101
Suh D, Lee JW, Choi S, Lee Y. Recent Applications of Deep Learning Methods on Evolution- and Contact-Based Protein Structure Prediction. Int J Mol Sci 2021; 22:6032. [PMID: 34199677 PMCID: PMC8199773 DOI: 10.3390/ijms22116032] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/16/2021] [Revised: 05/29/2021] [Accepted: 05/29/2021] [Indexed: 01/23/2023] Open
Abstract
Recent advances in deep learning methods have influenced many aspects of scientific research, including the study of protein systems. The prediction of proteins' 3D structural components is now heavily dependent on machine learning techniques that interpret how protein sequences and their homology govern inter-residue contacts and structural organization. In particular, methods employing deep neural networks have had a significant impact on the recent CASP13 and CASP14 competitions. Here, we explore the recent applications of deep learning methods in the protein structure prediction area. We also look at the potential opportunities for deep learning methods to identify as yet undiscovered protein structures and functions and to help guide the study of drug-target interactions. Although significant problems still need to be addressed, we expect these techniques to play crucial roles in protein structural bioinformatics as well as in drug discovery in the near future.
Affiliation(s)
- Donghyuk Suh
- Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea
- Jai Woo Lee
- Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea
- Sun Choi
- Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea
- Yoonji Lee
- College of Pharmacy, Chung-Ang University, Seoul 06974, Korea
102
Biomedical Image Classification in a Big Data Architecture Using Machine Learning Algorithms. Journal of Healthcare Engineering 2021; 2021:9998819. [PMID: 34122785 PMCID: PMC8191587 DOI: 10.1155/2021/9998819] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/06/2021] [Revised: 05/09/2021] [Accepted: 05/25/2021] [Indexed: 12/13/2022]
Abstract
In modern-day medicine, medical imaging has undergone immense advancements and can capture several biomedical images from patients. In the wake of this, to assist medical specialists, these images can be used and trained in an intelligent system in order to aid the determination of the different diseases that can be identified from analyzing these images. Classification plays an important role in this regard; it enhances the grouping of these images into categories of diseases and optimizes the next step of a computer-aided diagnosis system. The concept of classification in machine learning deals with the problem of identifying to which set of categories a new population belongs. When category membership is known, the classification is done on the basis of a training set of data containing observations. The goal of this paper is to perform a survey of classification algorithms for biomedical images. The paper then describes how these algorithms can be applied to a big data architecture by using the Spark framework. This paper further proposes the classification workflow based on the observed optimal algorithms, Support Vector Machine and Deep Learning as drawn from the literature. The algorithm for the feature extraction step during the classification process is presented and can be customized in all other steps of the proposed classification workflow.
103
Chandrasekaran S, Danos N, George UZ, Han JP, Quon G, Müller R, Tsang Y, Wolgemuth C. The Axes of Life: A roadmap for understanding dynamic multiscale systems. Integr Comp Biol 2021; 61:2011-2019. [PMID: 34048574 DOI: 10.1093/icb/icab114] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/17/2022] Open
Abstract
The biological challenges facing humanity are complex, multi-factorial, and are intimately tied to the future of our health, welfare, and stewardship of the Earth. Tackling problems in diverse areas, such as agriculture, ecology, and health care require linking vast data sets that encompass numerous components and spatio-temporal scales. Here, we provide a new framework and a road map for using experiments and computation to understand dynamic biological systems that span multiple scales. We discuss theories that can help understand complex biological systems and highlight the limitations of existing methodologies and recommend data generation practices. The advent of new technologies such as big data analytics and artificial intelligence can help bridge different scales and data types. We recommend ways to make such models transparent, compatible with existing theories of biological function, and to make biological data sets readable by advanced machine learning algorithms. Overall, the barriers for tackling pressing biological challenges are not only technological, but also sociological. Hence, we also provide recommendations for promoting interdisciplinary interactions between scientists.
Affiliation(s)
- Nicole Danos
- Department of Biology, University of San Diego, San Diego, CA, USA
- Uduak Z George
- Department of Mathematics & Statistics, San Diego State University, San Diego, CA, USA
- Jin-Ping Han
- IBM TJ Watson Research Center, Ossining, NY, USA
- Gerald Quon
- Department of Molecular and Cellular Biology, University of California-Davis, Davis, CA, USA
- Rolf Müller
- Department of Mechanical Engineering, Virginia Tech, Blacksburg, VA, USA
- Yinphan Tsang
- Department of Natural Resources and Environmental Management, University of Hawai'i at Mānoa, Honolulu, HI, USA
- Charles Wolgemuth
- Departments of Physics and Molecular and Cellular Biology, University of Arizona, Tucson, AZ, USA
104
Pakhrin SC, Shrestha B, Adhikari B, KC DB. Deep Learning-Based Advances in Protein Structure Prediction. Int J Mol Sci 2021; 22:5553. [PMID: 34074028 PMCID: PMC8197379 DOI: 10.3390/ijms22115553] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 04/02/2021] [Revised: 05/12/2021] [Accepted: 05/18/2021] [Indexed: 12/29/2022] Open
Abstract
Obtaining an accurate description of protein structure is a fundamental step toward understanding the underpinnings of biology. Although recent advances in experimental approaches have greatly enhanced our ability to determine protein structures experimentally, the gap between the number of protein sequences and the number of known protein structures is ever increasing. Computational protein structure prediction is one of the ways to fill this gap. Recently, the protein structure prediction field has witnessed many advances due to Deep Learning (DL)-based approaches, as evidenced by the success of AlphaFold2 in the most recent Critical Assessment of protein Structure Prediction (CASP14). In this article, we highlight important milestones and progress in the field of protein structure prediction due to DL-based methods as observed in CASP experiments. We describe advances in various steps of the protein structure prediction pipeline, viz. protein contact map prediction, protein distogram prediction, protein real-valued distance prediction, and Quality Assessment/refinement. We also highlight some end-to-end DL-based approaches for protein structure prediction. Additionally, as there have been some recent DL-based advances in protein structure determination using Cryo-Electron Microscopy (Cryo-EM), we also highlight some of the important progress in that field. Finally, we provide an outlook and possible future research directions for DL-based approaches in the protein structure prediction arena.
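The relationship between two of the pipeline steps named in this abstract, distogram prediction and contact map prediction, can be illustrated with a short sketch. It is not taken from the reviewed article: the function name, the bin layout, and the 8 Å cutoff (the contact definition conventionally used in CASP) are assumptions made here for illustration. A distogram assigns each residue pair a probability distribution over distance bins, and a contact probability is obtained by summing the probability mass of the bins that lie below the cutoff.

```python
from typing import Sequence


def contact_probability(bin_edges: Sequence[float],
                        bin_probs: Sequence[float],
                        cutoff: float = 8.0) -> float:
    """Collapse one residue pair's distogram into a contact probability.

    bin_edges holds the N+1 boundaries (in angstroms) of N distance bins,
    bin_probs holds the predicted probability of each bin. The 8 A cutoff
    is an assumed, conventional contact definition, not a value from the
    abstract above.
    """
    if len(bin_probs) != len(bin_edges) - 1:
        raise ValueError("expected one probability per distance bin")
    total = 0.0
    for upper_edge, prob in zip(bin_edges[1:], bin_probs):
        # Count a bin only if it lies entirely below the cutoff.
        if upper_edge <= cutoff:
            total += prob
    return total


# Hypothetical distogram for one residue pair: five 2 A wide bins from 2 to 12 A.
edges = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
probs = [0.10, 0.20, 0.30, 0.25, 0.15]
p_contact = contact_probability(edges, probs)
```

In this toy case the first three bins fall below the cutoff, so the pair would be called a contact under a 0.5 decision threshold.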
Affiliation(s)
- Subash C. Pakhrin
- Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260, USA
- Bikash Shrestha
- Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
- Badri Adhikari
- Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
- Dukka B. KC
- Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260, USA
105
Milchevskaya V, Nikitin AM, Lukshin SA, Filatov IV, Kravatsky YV, Tumanyan VG, Esipova NG, Milchevskiy YV. Structural coordinates: A novel approach to predict protein backbone conformation. PLoS One 2021; 16:e0239793. [PMID: 34014953 PMCID: PMC8136669 DOI: 10.1371/journal.pone.0239793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2020] [Accepted: 04/14/2021] [Indexed: 11/19/2022] Open
Abstract
Motivation: Local protein structure is usually described via classifying each peptide to a unique class from a set of pre-defined structures. These classifications may differ in the number of structural classes, the length of peptides, or class attribution criteria. Most methods that predict the local structure of a protein from its sequence first rely on some classification and only then proceed to the 3D conformation assessment. However, most classification methods rely on homologous proteins’ existence, unavoidably lose information by attributing a peptide to a single class or suffer from a suboptimal choice of the representative classes. Results: To alleviate the above challenges, we propose a method that constructs a peptide’s structural representation from the sequence, reflecting its similarity to several basic representative structures. For 5-mer peptides and 16 representative structures, we achieved the Q16 classification accuracy of 67.9%, which is higher than what is currently reported in the literature. Our prediction method does not utilize information about protein homologues but relies only on the amino acids’ physicochemical properties and the resolved structures’ statistics. We also show that the 3D coordinates of a peptide can be uniquely recovered from its structural coordinates, and show the required conditions under various geometric constraints.
Affiliation(s)
- Vladislava Milchevskaya
- Institute of Medical Statistics and Bioinformatics, Faculty of Medicine, University of Cologne, Cologne, Germany
- Ivan V. Filatov
- Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Yury V. Milchevskiy
- Engelhardt Institute of Molecular Biology, Moscow, Russia
106
Remodelling structure-based drug design using machine learning. Emerg Top Life Sci 2021; 5:13-27. [PMID: 33825834 DOI: 10.1042/etls20200253] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/28/2021] [Revised: 03/17/2021] [Accepted: 03/30/2021] [Indexed: 12/13/2022]
Abstract
To keep up with the pace of rapid discoveries in biomedicine, a plethora of research endeavors had been directed toward Rational Drug Development that slowly gave way to Structure-Based Drug Design (SBDD). In the past few decades, SBDD played a stupendous role in identification of novel drug-like molecules that are capable of altering the structures and/or functions of the target macromolecules involved in different disease pathways and networks. Unfortunately, post-delivery drug failures due to adverse drug interactions have constrained the use of SBDD in biomedical applications. However, recent technological advancements, along with parallel surge in clinical research have led to the concomitant establishment of other powerful computational techniques such as Artificial Intelligence (AI) and Machine Learning (ML). These leading-edge tools with the ability to successfully predict side-effects of a wide range of drugs have eventually taken over the field of drug design. ML, a subset of AI, is a robust computational tool that is capable of data analysis and analytical model building with minimal human intervention. It is based on powerful algorithms that use huge sets of 'training data' as inputs to predict new output values, which improve iteratively through experience. In this review, along with a brief discussion on the evolution of the drug discovery process, we have focused on the methodologies pertaining to the technological advancements of machine learning. This review, with specific examples, also emphasises the tremendous contributions of ML in the field of biomedicine, while exploring possibilities for future developments.
107
Vedithi SC, Malhotra S, Acebrón-García-de-Eulate M, Matusevicius M, Torres PHM, Blundell TL. Structure-Guided Computational Approaches to Unravel Druggable Proteomic Landscape of Mycobacterium leprae. Front Mol Biosci 2021; 8:663301. [PMID: 34026836 PMCID: PMC8138464 DOI: 10.3389/fmolb.2021.663301] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/02/2021] [Accepted: 04/12/2021] [Indexed: 02/02/2023] Open
Abstract
Leprosy, caused by Mycobacterium leprae (M. leprae), is treated with a multidrug regimen comprising Dapsone, Rifampicin, and Clofazimine. These drugs exhibit bacteriostatic, bactericidal and anti-inflammatory properties, respectively, and control the dissemination of infection in the host. However, the current treatment is not cost-effective, does not favor patient compliance due to its long duration (12 months) and does not protect against the incumbent nerve damage, which is a severe leprosy complication. The chronic infectious peripheral neuropathy associated with the disease is primarily due to the bacterial components infiltrating the Schwann cells that protect neuronal axons, thereby inducing a demyelinating phenotype. There is a need to discover novel/repurposed drugs that can act as short duration and effective alternatives to the existing treatment regimens, preventing nerve damage and consequent disability associated with the disease. Mycobacterium leprae is an obligate pathogen resulting in experimental intractability to cultivate the bacillus in vitro and limiting drug discovery efforts to repositioning screens in mouse footpad models. The dearth of knowledge related to structural proteomics of M. leprae, coupled with emerging antimicrobial resistance to all the three drugs in the multidrug therapy, poses a need for concerted novel drug discovery efforts. A comprehensive understanding of the proteomic landscape of M. leprae is indispensable to unravel druggable targets that are essential for bacterial survival and predilection of human neuronal Schwann cells. Of the 1,614 protein-coding genes in the genome of M. leprae, only 17 protein structures are available in the Protein Data Bank. In this review, we discussed efforts made to model the proteome of M. leprae using a suite of software for protein modeling that has been developed in the Blundell laboratory. 
Precise template selection by employing sequence-structure homology recognition software, multi-template modeling of the monomeric models and accurate quality assessment are the hallmarks of the modeling process. Tools that map interfaces and enable building of homo-oligomers are discussed in the context of interface stability. Other software is described to determine the druggable proteome by using information related to the chokepoint analysis of the metabolic pathways, gene essentiality, homology to human proteins, functional sites, druggable pockets and fragment hotspot maps.
Affiliation(s)
- Sundeep Chaitanya Vedithi
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Sony Malhotra
- Rutherford Appleton Laboratory, Science and Technology Facilities Council, Oxon, United Kingdom
- Pedro Henrique Monteiro Torres
- Laboratório de Modelagem e Dinâmica Molecular, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Tom L. Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
108
Nallasamy V, S M. Bingham deep neural and oppositional fish swarm optimized protein structure prediction. J Biomol Struct Dyn 2021; 40:8706-8724. [PMID: 33955323 DOI: 10.1080/07391102.2021.1915181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
Abstract
It is well known that essential proteins take part in managing cellular activities in living organisms. Moreover, predicting protein structure from the amino acid sequence aids the understanding of cellular functions. Several essential protein prediction methods have been proposed previously. However, those existing prediction methods were not satisfactory because of their low sensitivity to imbalance characteristics. To address this issue, this paper presents a novel protein secondary structure prediction method called Bingham Deep Convolutional-based Oppositional Artificial Fish Optimized (BDC-OAFO). First, a protein structure identification framework called Bingham Distributed Deep Convolutional (BDDC) is designed to identify the essential proteins by eliminating the imbalanced learning issue. Next, a secondary structure prediction framework called Oppositional Artificial Fish Swarm Optimization is proposed to obtain precise prediction results. Secondary protein structure is then predicted by emulating three biological behaviors of artificial fishes (foraging, following, and swarming), in which a proximal count, an oppositional function, and a Gaussian function are utilized. To evaluate the performance of the BDC-OAFO method, we conduct experiments on the Protein Data Bank dataset. The experimental results show that BDC-OAFO achieves better performance in identifying essential proteins and more precise prediction than several other well-known prediction methods, which confirms the significance of BDC-OAFO. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Malarvizhi S
- Department of Computer Science, Thiruvalluvar Government Arts College, Namakkal, Tamil Nadu, India
109
Vatansever S, Schlessinger A, Wacker D, Kaniskan HÜ, Jin J, Zhou M, Zhang B. Artificial intelligence and machine learning-aided drug discovery in central nervous system diseases: State-of-the-arts and future directions. Med Res Rev 2021; 41:1427-1473. [PMID: 33295676 PMCID: PMC8043990 DOI: 10.1002/med.21764] [Citation(s) in RCA: 158] [Impact Index Per Article: 39.5] [Received: 07/23/2020] [Revised: 10/30/2020] [Accepted: 11/20/2020] [Indexed: 01/11/2023]
Abstract
Neurological disorders significantly outnumber diseases in other therapeutic areas. However, developing drugs for central nervous system (CNS) disorders remains the most challenging area in drug discovery, accompanied with the long timelines and high attrition rates. With the rapid growth of biomedical data enabled by advanced experimental technologies, artificial intelligence (AI) and machine learning (ML) have emerged as an indispensable tool to draw meaningful insights and improve decision making in drug discovery. Thanks to the advancements in AI and ML algorithms, now the AI/ML-driven solutions have an unprecedented potential to accelerate the process of CNS drug discovery with better success rate. In this review, we comprehensively summarize AI/ML-powered pharmaceutical discovery efforts and their implementations in the CNS area. After introducing the AI/ML models as well as the conceptualization and data preparation, we outline the applications of AI/ML technologies to several key procedures in drug discovery, including target identification, compound screening, hit/lead generation and optimization, drug response and synergy prediction, de novo drug design, and drug repurposing. We review the current state-of-the-art of AI/ML-guided CNS drug discovery, focusing on blood-brain barrier permeability prediction and implementation into therapeutic discovery for neurological diseases. Finally, we discuss the major challenges and limitations of current approaches and possible future directions that may provide resolutions to these difficulties.
Affiliation(s)
- Sezen Vatansever
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Daniel Wacker
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- H. Ümit Kaniskan
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Jian Jin
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Ming‐Ming Zhou
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
110
One-Dimensional Convolutional Neural Network with Adaptive Moment Estimation for Modelling of the Sand Retention Test. Applied Sciences-Basel 2021. [DOI: 10.3390/app11093802] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/05/2023]
Abstract
Stand-alone screens (SASs) are active sand control methods where compatible screens and slot sizes are selected through the sand retention test (SRT) to filter an unacceptable amount of sand produced from oil and gas wells. SRTs have been modelled in the laboratory using computer simulation to replicate experimental conditions and ensure that the selected screens are suitable for selected reservoirs. However, the SRT experimental setups and result analyses are not standardized. A few changes made to the experimental setup can cause a huge variation in results, leading to different plugging performance and sand retention analysis. Besides, conducting many laboratory experiments is expensive and time-consuming. Since the application of CNN in the petroleum industry attained promising results for both classification and regression problems, this method is proposed on SRT to reduce the time, cost, and effort to run the laboratory test by predicting the plugging performance and sand production. The application of deep learning has yet to be imposed in SRT. Therefore, in this study, a deep learning model using a one-dimensional convolutional neural network (1D-CNN) with adaptive moment estimation is developed to model the SRT with the aim of classifying plugging sign (screen plug, the screen does not plug) as well as to predict sand production and retained permeability using a varying sand distribution, SAS, screen slot size, and sand concentration as inputs. The performance of the proposed 1D-CNN model for the slurry test shows that the prediction of retained permeability and the classification of plugging sign achieved robust accuracy with more than a 90% value of R2, while the prediction of sand production achieved 77% accuracy. In addition, the model for the sand pack test achieved 84% accuracy in predicting sand production. 
For comparative model performance, gradient boosting (GB), K-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM) were also modelled on the same datasets. The results showed that the proposed 1D-CNN model outperforms the other four machine learning models for both SRT tests in terms of prediction accuracy.
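For readers unfamiliar with the optimizer named in this title, adaptive moment estimation (Adam), the following is a minimal sketch of its standard update rule applied to a toy one-dimensional quadratic. It is not the authors' 1D-CNN or their training setup: the function name, the objective, and the hyperparameter values are assumptions chosen for illustration only.

```python
import math
from typing import Callable


def adam_minimize(grad: Callable[[float], float],
                  x0: float,
                  lr: float = 0.1,
                  beta1: float = 0.9,
                  beta2: float = 0.999,
                  eps: float = 1e-8,
                  steps: int = 1000) -> float:
    """Minimize a scalar function with the standard Adam update rule."""
    x = x0
    m = 0.0  # running estimate of the first moment (mean) of the gradient
    v = 0.0  # running estimate of the second moment (uncentered variance)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction compensates for m and v being initialized at zero.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

In a neural network the same update is applied independently to every weight, with the gradient supplied by backpropagation instead of a closed-form expression.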
111
Kyrilis FL, Belapure J, Kastritis PL. Detecting Protein Communities in Native Cell Extracts by Machine Learning: A Structural Biologist's Perspective. Front Mol Biosci 2021; 8:660542. [PMID: 33937337 PMCID: PMC8082361 DOI: 10.3389/fmolb.2021.660542] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/29/2021] [Accepted: 03/18/2021] [Indexed: 11/13/2022] Open
Abstract
Native cell extracts hold great promise for understanding the molecular structure of ordered biological systems at high resolution. This is because higher-order biomolecular interactions, dubbed as protein communities, may be retained in their (near-)native state, in contrast to extensively purifying or artificially overexpressing the proteins of interest. The distinct machine-learning approaches are applied to discover protein-protein interactions within cell extracts, reconstruct dedicated biological networks, and report on protein community members from various organisms. Their validation is also important, e.g., by the cross-linking mass spectrometry or cell biology methods. In addition, the cell extracts are amenable to structural analysis by cryo-electron microscopy (cryo-EM), but due to their inherent complexity, sorting structural signatures of protein communities derived by cryo-EM comprises a formidable task. The application of image-processing workflows inspired by machine-learning techniques would provide improvements in distinguishing structural signatures, correlating proteomic and network data to structural signatures and subsequently reconstructed cryo-EM maps, and, ultimately, characterizing unidentified protein communities at high resolution. In this review article, we summarize recent literature in detecting protein communities from native cell extracts and identify the remaining challenges and opportunities. We argue that the progress in, and the integration of, machine learning, cryo-EM, and complementary structural proteomics approaches would provide the basis for a multi-scale molecular description of protein communities within native cell extracts.
Affiliation(s)
- Fotis L. Kyrilis
- Interdisciplinary Research Center HALOmem, Charles Tanford Protein Center, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Institute of Biochemistry and Biotechnology, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Jaydeep Belapure
- Interdisciplinary Research Center HALOmem, Charles Tanford Protein Center, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Panagiotis L. Kastritis
- Interdisciplinary Research Center HALOmem, Charles Tanford Protein Center, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Institute of Biochemistry and Biotechnology, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Biozentrum, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
112
Billings WM, Morris CJ, Della Corte D. The whole is greater than its parts: ensembling improves protein contact prediction. Sci Rep 2021; 11:8039. [PMID: 33850214 PMCID: PMC8044223 DOI: 10.1038/s41598-021-87524-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/18/2021] [Accepted: 03/29/2021] [Indexed: 11/30/2022] Open
Abstract
The prediction of amino acid contacts from protein sequence is an important problem, as protein contacts are a vital step towards the prediction of folded protein structures. We propose that a powerful concept from deep learning, called ensembling, can increase the accuracy of protein contact predictions by combining the outputs of different neural network models. We show that ensembling the predictions made by different groups at the recent Critical Assessment of Protein Structure Prediction (CASP13) outperforms all individual groups. Further, we show that contacts derived from the distance predictions of three additional deep neural networks (AlphaFold, trRosetta, and ProSPr) can be substantially improved by ensembling all three networks. We also show that ensembling these recent deep neural networks with the best CASP13 group creates a superior contact prediction tool. Finally, we demonstrate that two ensembled networks can successfully differentiate between the folds of two highly homologous sequences. In order to build further on these findings, we propose the creation of a better protein contact benchmark set and additional open-source contact prediction methods.
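The idea of combining the outputs of different models can be sketched as follows. The abstract does not state how the authors combine predictions, so this example assumes the simplest common scheme, an unweighted average of per-pair contact probabilities followed by a 0.5 threshold; the function names and the toy 3-residue maps are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def ensemble_contact_maps(maps: List[Matrix]) -> Matrix:
    """Average several L x L contact probability maps element by element."""
    if not maps:
        raise ValueError("need at least one contact map")
    n = len(maps[0])
    for m in maps:
        if len(m) != n or any(len(row) != n for row in m):
            raise ValueError("all maps must be square and the same size")
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(n)]
            for i in range(n)]


def call_contacts(prob_map: Matrix, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return residue pairs (i < j) whose probability reaches the threshold."""
    n = len(prob_map)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if prob_map[i][j] >= threshold]


# Three hypothetical models scoring the same 3-residue protein.
model_a = [[0.0, 0.9, 0.2], [0.9, 0.0, 0.4], [0.2, 0.4, 0.0]]
model_b = [[0.0, 0.7, 0.1], [0.7, 0.0, 0.7], [0.1, 0.7, 0.0]]
model_c = [[0.0, 0.8, 0.3], [0.8, 0.0, 0.1], [0.3, 0.1, 0.0]]

averaged = ensemble_contact_maps([model_a, model_b, model_c])
contacts = call_contacts(averaged)
```

Here model_b alone would call the pair (1, 2) a contact, but the averaged probability falls below the threshold, so the ensemble keeps only the pair (0, 1) on which all three models agree.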
Affiliation(s)
- Wendy M Billings
- Department of Physics and Astronomy, Brigham Young University, Provo, UT, USA
- Connor J Morris
- Department of Physics and Astronomy, Brigham Young University, Provo, UT, USA
- Dennis Della Corte
- Department of Physics and Astronomy, Brigham Young University, Provo, UT, USA
113
Auslander N, Gussow AB, Koonin EV. Incorporating Machine Learning into Established Bioinformatics Frameworks. Int J Mol Sci 2021; 22:2903. [PMID: 33809353 PMCID: PMC8000113 DOI: 10.3390/ijms22062903] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 02/15/2021] [Revised: 03/08/2021] [Accepted: 03/10/2021] [Indexed: 12/23/2022] Open
Abstract
The exponential growth of biomedical data in recent years has urged the application of numerous machine learning techniques to address emerging problems in biology and clinical research. By enabling the automatic feature extraction, selection, and generation of predictive models, these methods can be used to efficiently study complex biological systems. Machine learning techniques are frequently integrated with bioinformatic methods, as well as curated databases and biological networks, to enhance training and validation, identify the best interpretable features, and enable feature and model investigation. Here, we review recently developed methods that incorporate machine learning within the same framework with techniques from molecular evolution, protein structure analysis, systems biology, and disease genomics. We outline the challenges posed for machine learning, and, in particular, deep learning in biomedicine, and suggest unique opportunities for machine learning techniques integrated with established bioinformatics approaches to overcome some of these challenges.
Affiliation(s)
| | | | - Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA;
| |
|
114
|
Sabbih GO, Korsah MA, Jeevanandam J, Danquah MK. Biophysical analysis of SARS-CoV-2 transmission and theranostic development via N protein computational characterization. Biotechnol Prog 2021; 37:e3096. [PMID: 33118327 PMCID: PMC7645878 DOI: 10.1002/btpr.3096] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2020] [Revised: 10/22/2020] [Accepted: 10/26/2020] [Indexed: 01/01/2023]
Abstract
Recently, SARS-CoV-2 has been identified as the causative agent of the viral infection COVID-19. The virus belongs to the zoonotic beta coronavirus family, which is known to cause respiratory disorders or viral pneumonia, followed by an extensive attack on organs that express angiotensin-converting enzyme II (ACE2). Human transmission of this virus occurs via respiratory droplets from symptomatic and asymptomatic patients, which are released into the environment after sneezing or coughing. These droplets are capable of staying in the air as aerosols or on surfaces and can be transmitted to persons through inhalation or contact with contaminated surfaces. Thus, there is an urgent need for advanced theranostic solutions to control the spread of COVID-19 infection. The development of such fit-for-purpose technologies hinges on a proper understanding of the transmission, incubation, and structural characteristics of the virus in the external environment and within the host. Hence, this article describes the development of an intrinsic model to describe the incubation characteristics of the virus under varying environmental factors. It also discusses the evaluation of SARS-CoV-2 structural nucleocapsid protein properties via computational approaches to generate high-affinity binding probes for effective diagnosis and targeted treatment applications by specific targeting of viruses. In addition, this article provides useful insights into the transmission behavior of the virus and creates new opportunities for theranostics development.
Affiliation(s)
- Godfred O. Sabbih
- Department of Chemical Engineering, University of Tennessee, Chattanooga, Tennessee, USA
| | - Maame A. Korsah
- Department of Mathematics, University of Tennessee, Chattanooga, Tennessee, USA
| | - Jaison Jeevanandam
- CQM ‐ Centro de Química da Madeira, MMRG, Universidade da Madeira, Campus da Penteada, Funchal, Portugal
| | - Michael K. Danquah
- Department of Chemical Engineering, University of Tennessee, Chattanooga, Tennessee, USA
| |
|
115
|
Daley SK, Cordell GA. Natural Products, the Fourth Industrial Revolution, and the Quintuple Helix. Nat Prod Commun 2021. [DOI: 10.1177/1934578x211003029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open
Abstract
The profound interconnectedness of the sciences and technologies embodied in the Fourth Industrial Revolution is discussed in terms of the global role of natural products, and how that interplays with the development of sustainable and climate-conscious practices of cyberecoethnopharmacolomics within the Quintuple Helix for the promotion of a healthier planet and society.
Affiliation(s)
| | - Geoffrey A. Cordell
- Natural Products Inc., Evanston, IL, USA
- Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, FL, USA
| |
|
116
|
Que-Salinas U, Ramírez-González PE, Torres-Carbajal A. Determination of thermodynamic state variables of liquids from their microscopic structures using an artificial neural network. SOFT MATTER 2021; 17:1975-1984. [PMID: 33427848 DOI: 10.1039/d0sm02127j] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
In this work we implement a machine learning method to predict the thermodynamic state of a liquid using only its microscopic structure provided by the radial distribution function (RDF). The main goal is to determine the equation of state of the system. The goal is achieved by predicting the density, temperature or both at the same time using only the RDF. We implement and train a machine learning feed forward artificial neural network (ANN) to address the different cases of interest where single or simultaneous predictions are done. Due to its versatility, in this study the Lennard-Jones (LJ) fluid is used as the reference system. The ANN is trained in a wide range of densities and temperatures, covering the liquid-vapour coexistence, liquid phase and supercritical states. We show that the overall percentage relative error of most of the predictions in different cases of study is around 3%. As a practical case of study we use the ANN predictions to determine the pressure equation of state for different isotherms and we found a very good agreement with respect to the exact results. Our ANN implementation is a versatile and useful tool to predict thermodynamic state variables when some information is unknown and, consequently, to enhance the thermodynamic description of liquids.
Affiliation(s)
- Ulices Que-Salinas
- Instituto de Física "Manuel Sandoval Vallarta", Universidad Autónoma de San Luis Potosí, Álvaro Obregón 64, 78000 San Luis Potosí, SLP, Mexico.
| | - Pedro E Ramírez-González
- CONACYT-Instituto de Física "Manuel Sandoval Vallarta", Universidad Autónoma de San Luis Potosí, Álvaro Obregón 64, 78000 San Luis Potosí, SLP, Mexico
| | - Alexis Torres-Carbajal
- Instituto de Física "Manuel Sandoval Vallarta", Universidad Autónoma de San Luis Potosí, Álvaro Obregón 64, 78000 San Luis Potosí, SLP, Mexico.
| |
|
118
|
Katsimpouras C, Stephanopoulos G. Enzymes in biotechnology: Critical platform technologies for bioprocess development. Curr Opin Biotechnol 2021; 69:91-102. [PMID: 33422914 DOI: 10.1016/j.copbio.2020.12.003] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2020] [Revised: 11/09/2020] [Accepted: 12/08/2020] [Indexed: 01/02/2023]
Abstract
Enzymes are core elements of biosynthetic pathways employed in the synthesis of numerous bioproducts. Here, we review enzyme promiscuity, enzyme engineering, enzyme immobilization, and cell-free systems as fundamental strategies of bioprocess development. Initially, promiscuous enzymes are the first candidates in the quest for new activities to power new, artificial, or bypass pathways that expand substrate range and catalyze the production of new products. If the activity or regulation of available enzymes is unsuitable for a process, protein engineering can be applied to improve them to the required level. When cell toxicity and low productivity cannot be engineered away, cell-free systems are an attractive option, especially in combination with enzyme immobilization that allows extended enzyme use. Overall, the above methods support powerful platforms for bioprocess development and optimization.
Affiliation(s)
- Constantinos Katsimpouras
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, 02139 MA, USA
| | - Gregory Stephanopoulos
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, 02139 MA, USA.
| |
|
119
|
Runthala A. Probabilistic divergence of a template-based modelling methodology from the ideal protocol. J Mol Model 2021; 27:25. [PMID: 33411019 DOI: 10.1007/s00894-020-04640-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2020] [Accepted: 12/09/2020] [Indexed: 12/27/2022]
Abstract
Protein structural information is essential for the detailed mapping of a functional protein network. For a higher modelling accuracy and quicker implementation, template-based algorithms have been extensively deployed and redefined. The methods only assess the predicted structure against its native state/template and do not estimate the accuracy for each modelling step. A divergence measure is therefore postulated to estimate the modelling accuracy against its theoretical optimal benchmark. By freezing the domain boundaries, the divergence measures are predicted for the most crucial steps of a modelling algorithm. To precisely refine the score using weighting constants, big data analysis could further be deployed.
Affiliation(s)
- Ashish Runthala
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, 522502, India.
| |
|
120
|
Dybowski R. Artificial Intelligence in Medicine: Biochemical 3D Modeling and Drug Discovery. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_318-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
121
|
Prihoda D, Maritz JM, Klempir O, Dzamba D, Woelk CH, Hazuda DJ, Bitton DA, Hannigan GD. The application potential of machine learning and genomics for understanding natural product diversity, chemistry, and therapeutic translatability. Nat Prod Rep 2021; 38:1100-1108. [PMID: 33245088 DOI: 10.1039/d0np00055h] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Abstract
Covering: up to the end of 2020. The machine learning field can be defined as the study and application of algorithms that perform classification and prediction tasks through pattern recognition instead of explicitly defined rules. Among other areas, machine learning has excelled in natural language processing. As such methods have excelled at understanding written languages (e.g. English), they are also being applied to biological problems to better understand the "genomic language". In this review we focus on recent advances in applying machine learning to natural products and genomics, and how those advances are improving our understanding of natural product biology, chemistry, and drug discovery. We discuss machine learning applications in genome mining (identifying biosynthetic signatures in genomic data), predictions of what structures will be created from those genomic signatures, and the types of activity we might expect from those molecules. We further explore the application of these approaches to data derived from complex microbiomes, with a focus on the human microbiome. We also review challenges in leveraging machine learning approaches in the field, and how the availability of other "omics" data layers provides value. Finally, we provide insights into the challenges associated with interpreting machine learning models and the underlying biology and promises of applying machine learning to natural product drug discovery. We believe that the application of machine learning methods to natural product research is poised to accelerate the identification of new molecular entities that may be used to treat a variety of disease indications.
Affiliation(s)
- David Prihoda
- R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic and Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology, Prague, Czech Republic
| | - Julia M Maritz
- Exploratory Science Center, Merck & Co., Inc., Cambridge, MA, USA.
| | - Ondrej Klempir
- R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic
| | - David Dzamba
- R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic
| | | | - Daria J Hazuda
- Exploratory Science Center, Merck & Co., Inc., Cambridge, MA, USA.
| | - Danny A Bitton
- R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic
| | | |
|
122
|
Huang TC, Fischer WB. Sequence–function correlation of the transmembrane domains in NS4B of HCV using a computational approach. AIMS BIOPHYSICS 2021. [DOI: 10.3934/biophy.2021013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
|
123
|
Gao W, Mahajan SP, Sulam J, Gray JJ. Deep Learning in Protein Structural Modeling and Design. PATTERNS (NEW YORK, N.Y.) 2020; 1:100142. [PMID: 33336200 PMCID: PMC7733882 DOI: 10.1016/j.patter.2020.100142] [Citation(s) in RCA: 100] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
Deep learning is catalyzing a scientific revolution fueled by big data, accessible toolkits, and powerful computational resources, impacting many fields, including protein structural modeling. Protein structural modeling, such as predicting structure from amino acid sequence and evolutionary information, designing proteins toward desirable functionality, or predicting properties or behavior of a protein, is critical to understand and engineer biological systems at the molecular level. In this review, we summarize the recent advances in applying deep learning techniques to tackle problems in protein structural modeling and design. We dissect the emerging approaches using deep learning techniques for protein structural modeling and discuss advances and challenges that must be addressed. We argue for the central importance of structure, following the "sequence → structure → function" paradigm. This review is directed to help both computational biologists to gain familiarity with the deep learning methods applied in protein modeling, and computer scientists to gain perspective on the biologically meaningful problems that may benefit from deep learning techniques.
Affiliation(s)
- Wenhao Gao
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Sai Pooja Mahajan
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeremias Sulam
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeffrey J. Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| |
|
124
|
Nafees S, Rice SH, Wakeman CA. Analyzing genomic data using tensor-based orthogonal polynomials with application to synthetic RNAs. NAR Genom Bioinform 2020; 2:lqaa101. [PMID: 33575645 PMCID: PMC7731874 DOI: 10.1093/nargab/lqaa101] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2020] [Revised: 11/06/2020] [Accepted: 11/27/2020] [Indexed: 02/06/2023] Open
Abstract
An important goal in molecular biology is to quantify both the patterns across a genomic sequence and the relationship between phenotype and underlying sequence. We propose a multivariate tensor-based orthogonal polynomial approach to characterize nucleotides or amino acids in a given sequence and map corresponding phenotypes onto the sequence space. We have applied this method to a previously published case of small transcription activating RNAs. Covariance patterns along the sequence showcased strong correlations between nucleotides at the ends of the sequence. However, when the phenotype is projected onto the sequence space, this pattern does not emerge. When doing second order analysis and quantifying the functional relationship between the phenotype and pairs of sites along the sequence, we identified sites with high regressions spread across the sequence, indicating potential intramolecular binding. In addition to quantifying interactions between different parts of a sequence, the method quantifies sequence–phenotype interactions at first and higher order levels. We discuss the strengths and constraints of the method and compare it to computational methods such as machine learning approaches. An accompanying command line tool to compute these polynomials is provided. We show proof of concept of this approach and demonstrate its potential application to other biological systems.
Affiliation(s)
- Saba Nafees
- Department of Biological Sciences, Texas Tech University, 2901 Main St, Lubbock, TX 79409, USA
| | - Sean H Rice
- Department of Biological Sciences, Texas Tech University, 2901 Main St, Lubbock, TX 79409, USA
| | - Catherine A Wakeman
- Department of Biological Sciences, Texas Tech University, 2901 Main St, Lubbock, TX 79409, USA
| |
|
125
|
Hameduh T, Haddad Y, Adam V, Heger Z. Homology modeling in the time of collective and artificial intelligence. Comput Struct Biotechnol J 2020; 18:3494-3506. [PMID: 33304450 PMCID: PMC7695898 DOI: 10.1016/j.csbj.2020.11.007] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2020] [Revised: 11/04/2020] [Accepted: 11/04/2020] [Indexed: 12/12/2022] Open
Abstract
Homology modeling is a method for building protein 3D structures using protein primary sequence and utilizing prior knowledge gained from structural similarities with other proteins. The homology modeling process is done in sequential steps where sequence/structure alignment is optimized, then a backbone is built and later, side-chains are added. Once the low-homology loops are modeled, the whole 3D structure is optimized and validated. In the past three decades, a few collective and collaborative initiatives allowed for continuous progress in both homology and ab initio modeling. Critical Assessment of protein Structure Prediction (CASP) is a worldwide community experiment that has historically recorded the progress in this field. Folding@Home and Rosetta@Home are examples of crowd-sourcing initiatives where the community is sharing computational resources, whereas RosettaCommons is an example of an initiative where a community is sharing a codebase for the development of computational algorithms. Foldit is another initiative where participants compete with each other in a protein folding video game to predict 3D structure. In the past few years, deep machine learning of contact maps was introduced to the 3D structure prediction process, adding more information and increasing the accuracy of models significantly. In this review, we will take the reader on a journey of exploration from the beginnings to the most recent turnabouts, which have revolutionized the field of homology modeling. Moreover, we discuss the new trends emerging in this rapidly growing field.
Affiliation(s)
- Tareq Hameduh
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
| | - Yazan Haddad
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
| | - Vojtech Adam
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
| | - Zbynek Heger
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
| |
|
126
|
Skolnick J, Gao M. The role of local versus nonlocal physicochemical restraints in determining protein native structure. Curr Opin Struct Biol 2020; 68:1-8. [PMID: 33129066 DOI: 10.1016/j.sbi.2020.10.008] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2020] [Revised: 10/03/2020] [Accepted: 10/05/2020] [Indexed: 12/15/2022]
Abstract
The tertiary structure of a native protein is dictated by the interplay of local secondary structure propensities, hydrogen bonding, and tertiary interactions. It is argued that the space of known protein topologies covers all single domain folds and results from the compactness of the native structure and excluded volume. Protein compactness combined with the chirality of the protein's side chains also yields native-like Ramachandran plots. It is the many-body, tertiary interactions among residues that collectively select for the global structure that a particular protein sequence adopts. This explains why the recent advances in deep-learning approaches that predict protein side-chain contacts, the distance matrix between residues, and sequence alignments are successful. They succeed because they implicitly learned the many-body interactions among protein residues.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, GA 30332, United States.
| | - Mu Gao
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, GA 30332, United States.
| |
|
127
|
de Brevern AG. Impact of protein dynamics on secondary structure prediction. Biochimie 2020; 179:14-22. [PMID: 32946990 DOI: 10.1016/j.biochi.2020.09.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2020] [Revised: 09/04/2020] [Accepted: 09/10/2020] [Indexed: 02/08/2023]
Abstract
Protein 3D structures support their biological functions. As the number of protein structures is negligible compared with the number of available protein sequences, prediction methodologies relying only on protein sequences are essential tools. In this field, protein secondary structure prediction (PSSP) is a mature area, and is considered to have reached a plateau. Nonetheless, proteins are highly dynamic macromolecules, a property that could impact PSSP methods. Indeed, in a previous study, the stability of local protein conformations was evaluated, demonstrating that some regions easily changed to another type of secondary structure. The protein sequences of this dataset were used with PSSP methods, and the results were compared with molecular dynamics to investigate the potential impact of protein dynamics on the quality of the secondary structure prediction. Interestingly, a direct link is observed between the quality of the prediction and the stability of the assignment to the secondary structure state. The more stable a local protein conformation is, the better the prediction will be. Taking the secondary structure assignment from the conformations observed during the dynamics, rather than from the crystallized structures, slightly increases the quality of the secondary structure prediction. These results show that evaluation of PSSP methods can be done differently, but also that the notion of dynamics can be included in the development of PSSP methods and other approaches such as de novo approaches.
Affiliation(s)
- Alexandre G de Brevern
- Biologie Intégrée Du Globule Rouge UMR_S1134, Inserm, Université de Paris, Univ. de la Réunion, Univ. des Antilles, F-75739, Paris, France; Laboratoire D'Excellence GR-Ex, F-75739, Paris, France; Institut National de la Transfusion Sanguine (INTS), F-75739, Paris, France; IBL, F-75015, Paris, France.
| |
|
128
|
Beg AZ, Khan AU. Motifs and interface amino acid-mediated regulation of amyloid biogenesis in microbes to humans: potential targets for intervention. Biophys Rev 2020; 12:1249-1256. [PMID: 32930961 DOI: 10.1007/s12551-020-00759-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2020] [Accepted: 09/04/2020] [Indexed: 02/08/2023] Open
Abstract
Amyloids are linked to many debilitating diseases in mammals. Some organisms produce amyloids that have a functional role in the maintenance of their biological processes. Microbes utilize functional bacterial amyloids (FuBA) for pathogenicity and infections. Amyloid biogenesis is regulated differentially in various systems to avoid its toxic accumulation. A familiar feature in the process of amyloid biogenesis from humans to microbes is its regulation by protein-protein interactions (PPI). The spatial arrangement of amino acid residues in proteins generates topologies like flat interface and linear motif, which participate in protein interactions. Motifs and interface residue-mediated interactions have a direct or an indirect impact on amyloid secretion and assembly. Some motifs undergo post-translational modifications (PTM), which affect interactions and dynamics of the amyloid biogenesis cascade. Interaction-induced local changes stimulate global conformational transitions in the PPI complex, which indirectly affect amyloid formation. Perturbation of such motifs and interface residues results in amyloid abolishment. Interface residues, motifs and their respective interactive protein partners could serve as potential targets for intervention to inhibit amyloid biogenesis.
Affiliation(s)
- Ayesha Z Beg
- Medical Microbiology and Molecular Biology, Interdisciplinary Biotechnology Unit, Aligarh Muslim University, Aligarh, 202002, India
| | - Asad U Khan
- Medical Microbiology and Molecular Biology, Interdisciplinary Biotechnology Unit, Aligarh Muslim University, Aligarh, 202002, India.
| |
|
129
|
A New Method for Extracting Laver Culture Carriers Based on Inaccurate Supervised Classification with FCN-CRF. JOURNAL OF MARINE SCIENCE AND ENGINEERING 2020. [DOI: 10.3390/jmse8040274] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Timely monitoring of marine aquaculture has considerable significance for marine ecological protection and maritime safety and security. Considering that supervised learning needs to rely on a large number of training samples and the characteristics of intensive and regular distribution of the laver aquaculture zone, in this paper, an inaccurate supervised classification model based on fully convolutional neural network and conditional random field (FCN-CRF) is designed for the study of a laver aquaculture zone in Lianyungang, Jiangsu Province. The proposed model can extract the aquaculture zone and calculate the area and quantity of laver aquaculture net simultaneously. The FCN is used to extract the laver aquaculture zone by roughly making the training label. Then, the CRF is used to extract the isolated laver aquaculture net with high precision. The results show that the kappa coefficient of the proposed model is 0.984, the F1 is 0.99, and the recognition effect is outstanding. For label production, the fault tolerance rate is high and does not affect the final classification accuracy, thereby saving more label production time. The findings provide a data basis for future aquaculture yield estimation and offshore resource planning as well as technical support for marine ecological supervision and marine traffic management.
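For readers unfamiliar with the two metrics quoted in this abstract, the sketch below shows how Cohen's kappa and the F1 score are conventionally computed from the four cells of a binary confusion matrix. The function name and the example counts are hypothetical; the cited paper does not report its confusion matrix here, so this does not reproduce its figures.

```python
def kappa_and_f1(tp, fp, fn, tn):
    """Return (Cohen's kappa, F1) for a binary classifier from confusion-matrix counts."""
    total = tp + fp + fn + tn

    # Observed agreement: fraction of samples where prediction and label agree.
    observed = (tp + tn) / total

    # Agreement expected by chance, from the marginal totals of each class.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2

    kappa = (observed - expected) / (1 - expected)

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return kappa, f1


# Hypothetical counts, chosen only to exercise the formulas.
kappa, f1 = kappa_and_f1(tp=40, fp=10, fn=10, tn=40)
```

Kappa discounts the agreement that would occur by chance, which matters when one class (for example, open water) dominates the image. F1 balances precision and recall for the class of interest.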
|
130
|
Kabir S, Islam RU, Hossain MS, Andersson K. An Integrated Approach of Belief Rule Base and Deep Learning to Predict Air Pollution. SENSORS (BASEL, SWITZERLAND) 2020; 20:E1956. [PMID: 32244380 PMCID: PMC7181062 DOI: 10.3390/s20071956] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/05/2020] [Revised: 03/25/2020] [Accepted: 03/27/2020] [Indexed: 11/16/2022]
Abstract
Sensor data are gaining increasing global attention due to the advent of the Internet of Things (IoT). Reasoning is applied to such sensor data in order to compute predictions. Generating a health warning based on the prediction of atmospheric pollution, or planning the timely evacuation of people from vulnerable areas based on the prediction of natural disasters, are use cases of sensor data streams where prediction is vital to protect people and assets. Thus, prediction accuracy is of paramount importance to take preventive steps and avert any untoward situation. Uncertainty in sensor data is a severe factor that hampers prediction accuracy. Belief Rule Based Expert System (BRBES), a knowledge-driven approach, is a widely employed prediction algorithm to deal with such uncertainties based on a knowledge base and an inference engine. In handling uncertainties, it offers higher accuracy than other such knowledge-driven techniques, e.g., fuzzy logic and Bayesian probability theory. In contrast, Deep Learning is a data-driven technique, which constitutes a part of Artificial Intelligence (AI). By applying analytics to huge amounts of data, Deep Learning learns the hidden representation of data. Thus, Deep Learning can infer predictions by reasoning over available data, such as historical data and sensor data streams. Combined application of BRBES and Deep Learning can compute predictions with improved accuracy by addressing sensor data uncertainties while utilizing the discovered data patterns. Hence, this paper proposes a novel predictive model that is based on the integrated approach of BRBES and Deep Learning. The uniqueness of this model lies in the development of a mathematical model to combine Deep Learning with BRBES and capture the nonlinear dependencies among the relevant variables. We optimized BRBES further by applying parameter and structure optimization to it.
Air pollution prediction has been taken as the use case of our proposed combined approach. This model has been evaluated against two different datasets. One dataset contains synthetic images with a corresponding label of PM2.5 concentrations. The other one contains real images, PM2.5 concentrations, and numerical weather data of Shanghai, China. We also used our proposed model to distinguish whether the haze in an image is caused by polluted air or by fog. Our approach has outperformed both BRBES alone and Deep Learning alone in terms of prediction accuracy.
Affiliation(s)
- Sami Kabir
- Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, SE-931 87 Skellefteå, Sweden
- Raihan Ul Islam
- Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, SE-931 87 Skellefteå, Sweden
- Karl Andersson
- Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, SE-931 87 Skellefteå, Sweden
|
131
|
Development of an Artificial Intelligence Powered TIG Welding Algorithm for the Prediction of Bead Geometry for TIG Welding Processes using Hybrid Deep Learning. METALS 2020. [DOI: 10.3390/met10040451] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/16/2022]
Abstract
Recent developments in artificial intelligence (AI) modeling tools make it possible to envisage that AI will remove elements of human mechanical effort from welding operations. This paper contributes to this development by proposing an AI tungsten inert gas (TIG) welding algorithm that can assist human welders in selecting desirable end factors to achieve good weld quality in the welding process. To demonstrate its feasibility, the proposed model has been tested with data from 27 experiments using current, arc length and welding speed as control parameters to predict weld bead width. A fuzzy deep neural network, which is a combination of fuzzy logic and deep neural network approaches, is applied in the algorithm. Simulations were carried out on an experimental test dataset with the AI TIG welding algorithm. The results showed 92.59% predictive accuracy (25 out of 27 correct answers) when compared with the experimental results. The performance of the algorithm at this nascent stage demonstrates the feasibility of the proposed method. It also suggests that, if predictive accuracy is improved in future work with human input and more data, the algorithm could reach a level of accuracy that would support human welders in the field and enhance the efficiency of the welding process. The findings are useful for industries in the welding trade and can serve as an educational tool.
|
132
|
Air Pollution Prediction Using Long Short-Term Memory (LSTM) and Deep Autoencoder (DAE) Models. SUSTAINABILITY 2020. [DOI: 10.3390/su12062570] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Indexed: 11/16/2022]
Abstract
Many countries worldwide have poor air quality due to the emission of particulate matter (i.e., PM10 and PM2.5), which has led to concerns about human health impacts in urban areas. In this study, we developed models to predict fine PM concentrations using long short-term memory (LSTM) and deep autoencoder (DAE) methods, and compared the model results in terms of root mean square error (RMSE). We applied the models to hourly air quality data from 25 stations in Seoul, South Korea, for the period from 1 January 2015 to 31 December 2018. Fine PM concentrations were predicted for the 10 days following this period, at an optimal learning rate of 0.01 for 100 epochs with a batch size of 32 for the LSTM model; the DAE model performed best with a batch size of 64. The proposed models effectively predicted fine PM concentrations, with the LSTM model showing slightly better performance. With our forecasting model, it is possible to provide reliable fine dust prediction information for the area where the user is located.
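The abstract compares the two models by root mean square error (RMSE). As a minimal sketch of that metric only, and not the authors' code, the following computes RMSE for two forecasts against the same observations. The function name `rmse`, the variable names, and all numeric values are assumptions invented for illustration; they are not data or results from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length numeric sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    # Mean of the squared differences, then the square root.
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical hourly fine PM values, invented for illustration only.
observed = [20.0, 25.0, 30.0, 35.0]
forecast_a = [22.0, 24.0, 29.0, 37.0]  # stand-in for one model's output
forecast_b = [24.0, 22.0, 33.0, 39.0]  # stand-in for another model's output

rmse_a = rmse(observed, forecast_a)
rmse_b = rmse(observed, forecast_b)
print(f"forecast_a RMSE: {rmse_a:.4f}")
print(f"forecast_b RMSE: {rmse_b:.4f}")
```

A lower RMSE indicates forecasts that are closer to the observations on average, which is the sense in which one model can be said to perform slightly better than another.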
|