1
Ryu J, Barkal S, Yu T, Jankowiak M, Zhou Y, Francoeur M, Phan QV, Li Z, Tognon M, Brown L, Love MI, Bhat V, Lettre G, Ascher DB, Cassa CA, Sherwood RI, Pinello L. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. Nat Genet 2024:10.1038/s41588-024-01726-6. [PMID: 38658794 DOI: 10.1038/s41588-024-01726-6] [Received: 09/07/2023] [Accepted: 03/21/2024] [Indexed: 04/26/2024]
Abstract
CRISPR base editing screens enable analysis of disease-associated variants at scale; however, variable efficiency and precision confound the assessment of variant-induced phenotypes. Here, we provide an integrated experimental and computational pipeline that improves estimation of variant effects in base editing screens. We use a reporter construct to measure guide RNA (gRNA) editing outcomes alongside their phenotypic consequences and introduce base editor screen analysis with activity normalization (BEAN), a Bayesian network that uses per-guide editing outcomes provided by the reporter and target site chromatin accessibility to estimate variant impacts. BEAN outperforms existing tools in variant effect quantification. We use BEAN to pinpoint common regulatory variants that alter low-density lipoprotein (LDL) uptake, implicating previously unreported genes. Additionally, through saturation base editing of LDLR, we accurately quantify missense variant pathogenicity that is consistent with measurements in UK Biobank patients and identify underlying structural mechanisms. This work provides a widely applicable approach to improve the power of base editing screens for disease-associated variant characterization.
Affiliation(s)
- Jayoung Ryu
  - Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Sam Barkal
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Tian Yu
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Martin Jankowiak
  - Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Yunzhuo Zhou
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Matthew Francoeur
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Quang Vinh Phan
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Zhijian Li
  - Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Manuel Tognon
  - Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Computer Science Department, University of Verona, Verona, Italy
- Lara Brown
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Michael I Love
  - Department of Genetics, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Vineel Bhat
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Guillaume Lettre
  - Montreal Heart Institute, Montréal, Quebec, Canada
  - Faculté de Médecine, Université de Montréal, Montréal, Quebec, Canada
- David B Ascher
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Christopher A Cassa
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Richard I Sherwood
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Luca Pinello
  - Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Gene Regulation Observatory, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Department of Pathology, Harvard Medical School, Boston, MA, USA
2
Myung Y, de Sá AGC, Ascher DB. Deep-PK: deep learning for small molecule pharmacokinetic and toxicity prediction. Nucleic Acids Res 2024:gkae254. [PMID: 38634808 DOI: 10.1093/nar/gkae254] [Received: 02/14/2024] [Revised: 03/20/2024] [Accepted: 04/10/2024] [Indexed: 04/19/2024] [Open access]
Abstract
Evaluating the pharmacokinetic properties of small molecules is considered a key step in most drug development and high-throughput screening processes. Pharmacokinetics, which describe the fate of drugs in the human body, are generally considered from four perspectives: absorption, distribution, metabolism and excretion, all of which are closely related to a fifth perspective, toxicity (together, ADMET). Because obtaining ADMET data from in vitro, in vivo or pre-clinical studies is time-consuming and expensive, many efforts have been made to predict ADMET properties computationally. However, most available methods are limited in their ability to provide pharmacokinetic and toxicity predictions for diverse targets, to ensure good overall accuracy, and to offer ease of use, interpretability and extensibility for further optimization. Here, we introduce Deep-PK, a deep learning-based platform for pharmacokinetic and toxicity prediction, analysis and optimization. We combined graph neural networks with graph-based signatures as graph-level features, which yielded the best predictive performance across 73 endpoints, including 64 ADMET and 9 general properties. With these models, Deep-PK supports molecular optimization and interpretation, helping users optimize and understand the pharmacokinetics and toxicity of given input molecules. Deep-PK is freely available at https://biosig.lab.uq.edu.au/deeppk/.
Affiliation(s)
- Yoochan Myung
  - School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- Alex G C de Sá
  - School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
  - Baker Department of Cardiometabolic Health, The University of Melbourne, Parkville, Victoria 3010, Australia
- David B Ascher
  - School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
  - Baker Department of Cardiometabolic Health, The University of Melbourne, Parkville, Victoria 3010, Australia
3
Soh CH, de Sá AGC, Potter E, Halabi A, Ascher DB, Marwick TH. Use of the energy waveform electrocardiogram to detect subclinical left ventricular dysfunction in patients with type 2 diabetes mellitus. Cardiovasc Diabetol 2024; 23:91. [PMID: 38448993 PMCID: PMC10918872 DOI: 10.1186/s12933-024-02141-1] [Received: 11/26/2023] [Accepted: 01/22/2024] [Indexed: 03/08/2024] [Open access]
Abstract
BACKGROUND Recent guidelines propose N-terminal pro-B-type natriuretic peptide (NT-proBNP) for recognition of asymptomatic left ventricular (LV) dysfunction (Stage B Heart Failure, SBHF) in type 2 diabetes mellitus (T2DM). Wavelet transform-based signal processing converts electrocardiogram (ECG) waveforms into an energy distribution waveform ECG (ewECG), providing frequency and energy features that machine learning can use as additional inputs to improve the identification of SBHF. Accordingly, we sought to determine whether a machine learning model based on ewECG features was superior to NT-proBNP, as well as to a conventional screening tool, the Atherosclerosis Risk in Communities (ARIC) HF risk score, for SBHF screening among patients with T2DM. METHODS Participants in two clinical trials of SBHF (defined as diastolic dysfunction [DD], reduced global longitudinal strain [GLS ≤ 18%] or LV hypertrophy [LVH]) in T2DM underwent 12-lead ECG with additional ewECG feature extraction, and echocardiography. Supervised machine learning was used to identify the optimal combination of ewECG-extracted features for SBHF screening in 178 participants from one trial, and the resulting model was tested in 97 participants from the other trial. The accuracy of the ewECG model for SBHF screening was compared with that of NT-proBNP and ARIC HF. RESULTS SBHF was identified in 128 (72%) participants in the training dataset (median age 72 years, 41% female) and 64 (66%) in the validation dataset (median age 70 years, 43% female). Fifteen ewECG features showed an area under the curve (AUC) of 0.81 (95% CI 0.787-0.794) in identifying SBHF, significantly better than both NT-proBNP (AUC 0.56, 95% CI 0.44-0.68, p < 0.001) and ARIC HF (AUC 0.67, 95% CI 0.56-0.79, p = 0.002). ewECG features also yielded robust models for screening for DD (AUC 0.74, 95% CI 0.73-0.74), reduced GLS (AUC 0.76, 95% CI 0.73-0.74) and LVH (AUC 0.90, 95% CI 0.88-0.89).
CONCLUSIONS Machine learning-based modelling using additional ewECG-extracted features is superior to NT-proBNP and ARIC HF for SBHF screening among patients with T2DM. It provides an alternative HF screening strategy for asymptomatic patients and could act as a guidance tool to determine which patients require echocardiography to confirm the diagnosis. Trial registration: LEAVE-DM, ACTRN 12619001393145 and Vic-ELF, ACTRN 12617000116325.
Affiliation(s)
- Cheng Hwee Soh
  - Imaging Research Laboratory, Baker Heart and Diabetes Institute, PO Box 6492, Melbourne, VIC, 3004, Australia
  - Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, Australia
- Alex G C de Sá
  - Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Australia
  - Systems and Computational Biology, Bio21 Institute, Parkville, Australia
- Elizabeth Potter
  - Imaging Research Laboratory, Baker Heart and Diabetes Institute, PO Box 6492, Melbourne, VIC, 3004, Australia
- Amera Halabi
  - Imaging Research Laboratory, Baker Heart and Diabetes Institute, PO Box 6492, Melbourne, VIC, 3004, Australia
- David B Ascher
  - Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Australia
  - Systems and Computational Biology, Bio21 Institute, Parkville, Australia
- Thomas H Marwick
  - Imaging Research Laboratory, Baker Heart and Diabetes Institute, PO Box 6492, Melbourne, VIC, 3004, Australia
  - Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, Australia
  - Menzies Institute for Medical Research, Hobart, Australia
4
Szot JO, Cuny H, Martin EM, Sheng DZ, Iyer K, Portelli S, Nguyen V, Gereis JM, Alankarage D, Chitayat D, Chong K, Wentzensen IM, Vincent-Delormé C, Lermine A, Burkitt-Wright E, Ji W, Jeffries L, Pais LS, Tan TY, Pitt J, Wise CA, Wright H, Andrews ID, Pruniski B, Grebe TA, Corsten-Janssen N, Bouman K, Poulton C, Prakash S, Keren B, Brown NJ, Hunter MF, Heath O, Lakhani SA, McDermott JH, Ascher DB, Chapman G, Bozon K, Dunwoodie SL. A metabolic signature for NADSYN1-dependent congenital NAD deficiency disorder. J Clin Invest 2024; 134:e174824. [PMID: 38357931 PMCID: PMC10866660 DOI: 10.1172/jci174824] [Received: 09/06/2023] [Accepted: 12/20/2023] [Indexed: 02/16/2024] [Open access]
Abstract
Nicotinamide adenine dinucleotide (NAD) is essential for embryonic development. To date, biallelic loss-of-function variants in 3 genes encoding nonredundant enzymes of the NAD de novo synthesis pathway (KYNU, HAAO, and NADSYN1) have been identified in humans with congenital malformations defined as congenital NAD deficiency disorder (CNDD). Here, we identified 13 further individuals with biallelic NADSYN1 variants predicted to be damaging, with phenotypes ranging from multiple severe malformations to the complete absence of malformation. Enzymatic assessment of variant deleteriousness in vitro revealed protein domain-specific perturbation, complemented by protein structure modeling in silico. We reproduced NADSYN1-dependent CNDD in mice and assessed various maternal NAD precursor supplementation strategies to prevent adverse pregnancy outcomes. While any B3 vitamer was suitable to raise NAD in Nadsyn1+/- mothers, preventing embryo loss and malformation, Nadsyn1-/- mothers required supplementation with amidated NAD precursors (nicotinamide or nicotinamide mononucleotide) that bypass their metabolic block. Analysis of the circulating NAD metabolome in mice and humans before and after NAD precursor supplementation revealed a consistent metabolic signature with utility for patient identification. Collectively, our data improve clinical diagnostics of NADSYN1-dependent CNDD, provide guidance for the therapeutic prevention of CNDD, and suggest an ongoing need to maintain NAD levels via amidated NAD precursor supplementation after birth.
Affiliation(s)
- Justin O. Szot
  - Victor Chang Cardiac Research Institute, Darlinghurst, Sydney, New South Wales, Australia
- Hartmut Cuny
  - Victor Chang Cardiac Research Institute, Darlinghurst, Sydney, New South Wales, Australia
  - School of Clinical Medicine, Faculty of Medicine and Health, Sydney, New South Wales, Australia
- Ella M.M.A. Martin
  - Victor Chang Cardiac Research Institute, Darlinghurst, Sydney, New South Wales, Australia
- Delicia Z. Sheng
  - Victor Chang Cardiac Research Institute, Darlinghurst, Sydney, New South Wales, Australia
- Kavitha Iyer
  - Victor Chang Cardiac Research Institute, Darlinghurst, Sydney, New South Wales, Australia
- Stephanie Portelli
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Vivien Nguyen
  - Victor Chang Cardiac Research Institute, Darlinghurst, Sydney, New South Wales, Australia
- Jessica M. Gereis
  - Victor Chang Cardiac Research Institute, Darlinghurst, Sydney, New South Wales, Australia
- Dimuthu Alankarage
  - Victor Chang Cardiac Research Institute, Darlinghurst, Sydney, New South Wales, Australia
- David Chitayat
  - Department of Pediatrics, Division of Clinical and Metabolic Genetics, The Hospital for Sick Children, and Prenatal Diagnosis and Medical Genetics Program, Department of Obstetrics and Gynecology, Mount Sinai Hospital, University of Toronto, Toronto, Ontario, Canada
- Karen Chong
  - Prenatal Diagnosis and Medical Genetics Program, Department of Obstetrics and Gynecology, Mount Sinai Hospital, University of Toronto, Toronto, Ontario, Canada
- Alban Lermine
  - Laboratoire de Biologie Médicale Multisites SeqOIA, FMG2025, Paris, France
- Emma Burkitt-Wright
  - Manchester Centre for Genomic Medicine, St. Mary’s Hospital, Manchester University Hospitals NHS Foundation Trust, Manchester, United Kingdom
- Weizhen Ji
  - Yale University School of Medicine, Pediatric Genomics Discovery Program, New Haven, Connecticut, USA
- Lauren Jeffries
  - Yale University School of Medicine, Pediatric Genomics Discovery Program, New Haven, Connecticut, USA
- Lynn S. Pais
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Tiong Y. Tan
  - Victorian Clinical Genetics Services, Murdoch Children’s Research Institute, Melbourne, Victoria, Australia
  - Department of Paediatrics, The University of Melbourne, Parkville, Victoria, Australia
- James Pitt
  - Department of Paediatrics, The University of Melbourne, Parkville, Victoria, Australia
  - Metabolic Laboratory, Victorian Clinical Genetics Services, Murdoch Children’s Research Institute, Melbourne, Victoria, Australia
- Cheryl A. Wise
  - Department of Diagnostic Genomics, PathWest Laboratory Medicine Western Australia, Nedlands, Perth, Western Australia, Australia
- Helen Wright
  - General Paediatric Department, Perth Children’s Hospital, Perth, Western Australia, Australia
  - Rural Clinical School, University of Western Australia, Perth, Western Australia, Australia
- Brianna Pruniski
  - Division of Genetics and Metabolism, Phoenix Children’s Hospital, Phoenix, Arizona, USA
- Theresa A. Grebe
  - Division of Genetics and Metabolism, Phoenix Children’s Hospital, Phoenix, Arizona, USA
- Nicole Corsten-Janssen
  - Department of Genetics, University Medical Centre Groningen, University of Groningen, Groningen, Netherlands
- Katelijne Bouman
  - Department of Genetics, University Medical Centre Groningen, University of Groningen, Groningen, Netherlands
- Cathryn Poulton
  - Genetic Services of Western Australia, King Edward Memorial Hospital, Perth, Western Australia, Australia
- Supraja Prakash
  - Division of Genetics and Metabolism, Phoenix Children’s Hospital, Phoenix, Arizona, USA
- Boris Keren
  - Département de Génétique, Groupe Hospitalier Pitié-Salpêtrière, Assistance Publique – Hôpitaux de Paris, Sorbonne Université, Paris, France
- Natasha J. Brown
  - Victorian Clinical Genetics Services, Murdoch Children’s Research Institute, Melbourne, Victoria, Australia
  - Department of Paediatrics, The University of Melbourne, Parkville, Victoria, Australia
- Matthew F. Hunter
  - Monash Genetics, Monash Health, Clayton, Victoria, Australia
  - Department of Paediatrics, Monash University, Clayton, Victoria, Australia
- Oliver Heath
  - Victorian Clinical Genetics Services, Murdoch Children’s Research Institute, Melbourne, Victoria, Australia
  - Department of Metabolic Medicine, The Royal Children’s Hospital, Melbourne, Victoria, Australia
- Saquib A. Lakhani
  - Yale University School of Medicine, Pediatric Genomics Discovery Program, New Haven, Connecticut, USA
- John H. McDermott
  - Manchester Centre for Genomic Medicine, St. Mary’s Hospital, Manchester University Hospitals NHS Foundation Trust, Manchester, United Kingdom
  - Division of Evolution, Infection and Genomics, School of Biological Sciences, University of Manchester, Manchester, United Kingdom
- David B. Ascher
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Gavin Chapman
  - Victor Chang Cardiac Research Institute, Darlinghurst, Sydney, New South Wales, Australia
  - School of Clinical Medicine, Faculty of Medicine and Health, Sydney, New South Wales, Australia
- Kayleigh Bozon
  - Victor Chang Cardiac Research Institute, Darlinghurst, Sydney, New South Wales, Australia
- Sally L. Dunwoodie
  - Victor Chang Cardiac Research Institute, Darlinghurst, Sydney, New South Wales, Australia
  - School of Clinical Medicine, Faculty of Medicine and Health, Sydney, New South Wales, Australia
  - Faculty of Science, University of New South Wales, Sydney, New South Wales, Australia
5
Velloso JPL, Kovacs AS, Pires DEV, Ascher DB. AI-driven GPCR analysis, engineering, and targeting. Curr Opin Pharmacol 2024; 74:102427. [PMID: 38219398 DOI: 10.1016/j.coph.2023.102427] [Received: 10/26/2023] [Revised: 12/12/2023] [Accepted: 12/13/2023] [Indexed: 01/16/2024]
Abstract
This article examines how recent advances in artificial intelligence (AI) are revolutionising the study of G protein-coupled receptors (GPCRs). AI has been applied to many areas of GPCR research, including machine learning (ML) for GPCR classification, prediction of GPCR activation levels, modelling of GPCR 3D structures and interactions, understanding G-protein selectivity, aiding the elucidation of GPCR structures, and drug design. Despite this progress, challenges remain in predicting GPCR structures and addressing the complex nature of GPCRs, providing avenues for future research and development.
Affiliation(s)
- João P L Velloso
  - Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Aaron S Kovacs
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Douglas E V Pires
  - Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- David B Ascher
  - Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
6
Serghini A, Portelli S, Troadec G, Song C, Pan Q, Pires DEV, Ascher DB. Characterizing and predicting ccRCC-causing missense mutations in Von Hippel-Lindau disease. Hum Mol Genet 2024; 33:224-232. [PMID: 37883464 PMCID: PMC10800015 DOI: 10.1093/hmg/ddad181] [Received: 08/20/2023] [Revised: 10/19/2023] [Accepted: 10/20/2023] [Indexed: 10/28/2023] [Open access]
Abstract
BACKGROUND Mutations within the Von Hippel-Lindau (VHL) tumor suppressor gene are known to cause VHL disease, which is characterized by the formation of cysts and tumors in multiple organs, particularly clear cell renal cell carcinoma (ccRCC). A major challenge in clinical practice is determining the tumor risk associated with a given mutation in the VHL gene. Previous efforts have been hindered by limited clinical data and technological constraints. METHODS To overcome this, we first manually curated the largest set of clinically validated VHL mutations to date, enabling a robust assessment of existing predictive tools on an independent test set. Additionally, we comprehensively characterized the effects of mutations within VHL using in silico biophysical tools describing changes in protein stability, dynamics and affinity to binding partners, to provide insights into the structure-phenotype relationship. These descriptive properties were used as molecular features to build a machine learning model designed to predict the risk of ccRCC development resulting from a VHL missense mutation. RESULTS Our model achieved an accuracy of 0.81 in identifying ccRCC-causing missense mutations and a Matthews correlation coefficient (MCC) of 0.44 on a non-redundant blind test set, a significant improvement over previously available approaches. CONCLUSION This work highlights the power of using protein 3D structure to fully explore the range of molecular and functional consequences of genomic variants. We believe this optimized model will facilitate clinical implementation and help guide patient risk stratification and management.
Affiliation(s)
- Adam Serghini
  - School of Chemistry and Molecular Biosciences, Chemistry Building 68, Cooper Road, The University of Queensland, St Lucia, QLD 4072, Queensland, Australia
- Stephanie Portelli
  - School of Chemistry and Molecular Biosciences, Chemistry Building 68, Cooper Road, The University of Queensland, St Lucia, QLD 4072, Queensland, Australia
- Guillaume Troadec
  - School of Computing and Information Systems, University of Melbourne, Melbourne, VIC 3010, Australia
- Catherine Song
  - School of Computing and Information Systems, University of Melbourne, Melbourne, VIC 3010, Australia
- Qisheng Pan
  - School of Chemistry and Molecular Biosciences, Chemistry Building 68, Cooper Road, The University of Queensland, St Lucia, QLD 4072, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, 75 Commercial Road, Melbourne, VIC 3004, Australia
- Douglas E V Pires
  - School of Computing and Information Systems, University of Melbourne, Melbourne, VIC 3010, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, 75 Commercial Road, Melbourne, VIC 3004, Australia
- David B Ascher
  - School of Chemistry and Molecular Biosciences, Chemistry Building 68, Cooper Road, The University of Queensland, St Lucia, QLD 4072, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, 75 Commercial Road, Melbourne, VIC 3004, Australia
7
Rodrigues CHM, Portelli S, Ascher DB. Exploring the effects of missense mutations on protein thermodynamics through structure-based approaches: findings from the CAGI6 challenges. Hum Genet 2024:10.1007/s00439-023-02623-4. [PMID: 38227011 DOI: 10.1007/s00439-023-02623-4] [Received: 06/15/2023] [Accepted: 11/18/2023] [Indexed: 01/17/2024]
Abstract
Missense mutations are known contributors to diverse genetic disorders through the subtle, single amino acid changes they impart on the resulting protein. Understanding the impact of these mutations on protein stability and function is therefore crucial for unravelling disease mechanisms and developing targeted therapies. The Critical Assessment of Genome Interpretation (CAGI) provides a valuable platform for benchmarking state-of-the-art computational methods in predicting the impact of disease-related mutations on protein thermodynamics. Here we report the performance of our comprehensive platform of structure-based computational approaches for evaluating mutations that affect protein structure and function in three CAGI6 challenges: Calmodulin, MAPK1 and MAPK3. Our stability predictors achieved correlations of up to 0.74 and AUCs of 1 when predicting stability changes (ΔΔG) for MAPK1 and MAPK3, respectively, and an AUC of up to 0.75 in the Calmodulin challenge. Overall, our study highlights the importance of structure-based approaches for understanding the effects of missense mutations on protein thermodynamics. The results obtained from the CAGI6 challenges contribute to ongoing efforts to deepen our understanding of disease mechanisms and to facilitate the development of personalised medicine approaches.
Affiliation(s)
- Carlos H M Rodrigues
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, 4072, Australia
- Stephanie Portelli
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, 4072, Australia
- David B Ascher
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, 4072, Australia
8
Li J, Mui JWY, da Silva BM, Pires DEV, Ascher DB, Madiedo Soler N, Goddard-Borger ED, Williams SJ. A Broad-Spectrum α-Glucosidase of Glycoside Hydrolase Family 13 from Marinovum sp., a Member of the Roseobacter Clade. Appl Biochem Biotechnol 2024:10.1007/s12010-023-04820-3. [PMID: 38180643 DOI: 10.1007/s12010-023-04820-3] [Accepted: 12/19/2023] [Indexed: 01/06/2024]
Abstract
Glycoside hydrolases (GHs) are a diverse group of enzymes that catalyze the hydrolysis of glycosidic bonds. The Carbohydrate-Active enZymes (CAZy) classification organizes GHs into families based on sequence data and function, with fewer than 1% of the predicted proteins characterized biochemically. Consideration of genomic context can provide clues to infer possible enzyme activities for proteins of unknown function. We used the MultiGeneBLAST tool to discover a gene cluster in Marinovum sp., a member of the marine Roseobacter clade, that encodes homologues of enzymes belonging to the sulfoquinovose monooxygenase pathway for sulfosugar catabolism. This cluster lacks a gene encoding a classical family GH31 sulfoquinovosidase candidate but instead includes an uncharacterized family GH13 protein (MsGH13) that we hypothesized could be a non-classical sulfoquinovosidase. Surprisingly, recombinant MsGH13 lacks sulfoquinovosidase activity and is instead a broad-spectrum α-glucosidase that is active on a diverse array of α-linked disaccharides, including maltose, sucrose, nigerose, trehalose, isomaltose, and kojibiose. A 3D model of MsGH13 constructed with AlphaFold predicted that its active site closely resembles that of an α-glucosidase from Halomonas sp. H11, a member of the same GH13 subfamily that shows narrower substrate specificity.
Affiliation(s)
- Jinling Li
- School of Chemistry and Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria, 3010, Australia
- Janice W-Y Mui
- School of Chemistry and Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria, 3010, Australia
- Bruna M da Silva
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia
- School of Computing and Information Systems, University of Melbourne, Parkville, Victoria, 3010, Australia
- Douglas E V Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia
- School of Computing and Information Systems, University of Melbourne, Parkville, Victoria, 3010, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, 4072, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, 4072, Australia
- Niccolay Madiedo Soler
- ACRF Chemical Biology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia
- Department of Medical Biology, University of Melbourne, Parkville, Victoria, 3052, Australia
- Ethan D Goddard-Borger
- ACRF Chemical Biology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia
- Department of Medical Biology, University of Melbourne, Parkville, Victoria, 3052, Australia
- Spencer J Williams
- School of Chemistry and Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria, 3010, Australia
9
Abstract
The greatest challenge in drug discovery remains the high rate of attrition across the different phases of the process, which costs the industry billions of dollars every year. While all phases remain crucial to ensure pharmaceutical-level safety, quality, and efficacy of the end product, streamlining these efforts toward compounds with success potential is pivotal for a more efficient and cost-effective process. The use of artificial intelligence (AI) within the pharmaceutical industry aims at just this, and has applications in preclinical screening for biological activity, optimization of pharmacokinetic properties for improved drug formulation, early toxicity prediction which reduces attrition, and pre-emptively screening for genetic changes in the biological target to improve therapeutic longevity. Here, we present a series of in silico tools that address these applications in small molecule development and describe how they can be embedded within the current pharmaceutical development pipeline.
Affiliation(s)
- Adam Serghini
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Stephanie Portelli
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, Australia
- David B Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
10
Pan Q, Portelli S, Nguyen TB, Ascher DB. Characterization on the oncogenic effect of the missense mutations of p53 via machine learning. Brief Bioinform 2023; 25:bbad428. [PMID: 38018912 PMCID: PMC10685404 DOI: 10.1093/bib/bbad428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 10/13/2023] [Accepted: 11/05/2023] [Indexed: 11/30/2023] Open
Abstract
Dysfunctions caused by missense mutations in the tumour suppressor p53 have been extensively shown to be a leading driver of many cancers. Unfortunately, it is time-consuming and labour-intensive to experimentally elucidate the effects of all possible missense variants. Recent works presented a comprehensive dataset and machine learning model to predict the functional outcome of mutations in p53. Despite the well-established dataset and precise predictions, this tool was trained on a complicated model with limited predictions on p53 mutations. In this work, we first used computational biophysical tools to investigate the functional consequences of missense mutations in p53, informing a bias of deleterious mutations with destabilizing effects. Combining these insights with experimental assays, we present two interpretable machine learning models leveraging both experimental assays and in silico biophysical measurements to accurately predict the functional consequences on p53 and validate their robustness on clinical data. Our final model based on nine features obtained comparable predictive performance with the state-of-the-art p53 specific method and outperformed other generalized, widely used predictors. Interpreting our models revealed that information on residue p53 activity, polar atom distances and changes in p53 stability were instrumental in the decisions, consistent with a bias of the properties of deleterious mutations. Our predictions have been computed for all possible missense mutations in p53, offering clinical diagnostic utility, which is crucial for patient monitoring and the development of personalized cancer treatment.
Affiliation(s)
- Qisheng Pan
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
- Stephanie Portelli
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
- Thanh Binh Nguyen
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
- David B Ascher
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
11
Rodrigues CHM, Ascher DB. CSM-Potential2: A comprehensive deep learning platform for the analysis of protein interacting interfaces. Proteins 2023. [PMID: 37870486 DOI: 10.1002/prot.26615] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/16/2023] [Revised: 10/04/2023] [Accepted: 10/05/2023] [Indexed: 10/24/2023]
Abstract
Proteins are molecular machinery that participate in virtually all essential biological functions within the cell, which are tightly related to their 3D structure. The importance of understanding protein structure-function relationship is highlighted by the exponential growth of experimental structures, which has been greatly expanded by recent breakthroughs in protein structure prediction, most notably RoseTTAFold and AlphaFold2. These advances have prompted the development of several computational approaches that leverage these data sources to explore potential biological interactions. However, most methods are generally limited to analysis of single types of interactions, such as protein-protein or protein-ligand interactions, and their complexity limits their usability to expert users. Here we report CSM-Potential2, a deep learning platform for the analysis of binding interfaces on protein structures. In addition to prediction of protein-protein interaction binding sites and classification of biological ligands, our new platform incorporates prediction of interactions with nucleic acids at the residue level and allows for ligand transplantation based on sequence and structure similarity to experimentally determined structures. We anticipate that our platform will be a valuable resource, giving expert and non-expert users easy access to a range of state-of-the-art methods for the study of biological interactions. Our tool is freely available as an easy-to-use web server and API available at https://biosig.lab.uq.edu.au/csm_potential.
Affiliation(s)
- Carlos H M Rodrigues
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
12
Al-Jarf R, Karmakar M, Myung Y, Ascher DB. Uncovering the Molecular Drivers of NHEJ DNA Repair-Implicated Missense Variants and Their Functional Consequences. Genes (Basel) 2023; 14:1890. [PMID: 37895239 PMCID: PMC10606680 DOI: 10.3390/genes14101890] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/20/2023] [Revised: 09/24/2023] [Accepted: 09/27/2023] [Indexed: 10/29/2023] Open
Abstract
Variants in non-homologous end joining (NHEJ) DNA repair genes are associated with various human syndromes, including microcephaly, growth delay, Fanconi anemia, and different hereditary cancers. However, very little has been done previously to systematically record the underlying molecular consequences of NHEJ variants and their link to phenotypic outcomes. In this study, a list of over 2983 missense variants of the principal components of the NHEJ system, including DNA Ligase IV, DNA-PKcs, Ku70/80 and XRCC4, reported in the clinical literature, was initially collected. The molecular consequences of variants were evaluated using in silico biophysical tools to quantitatively assess their impact on protein folding, dynamics, stability, and interactions. Cancer-causing and population variants within these NHEJ factors were statistically analyzed to identify molecular drivers. A comprehensive catalog of NHEJ variants from genes known to be mutated in cancer was curated, providing a resource for better understanding their role and molecular mechanisms in diseases. The variant analysis highlighted different molecular drivers among the distinct proteins, where cancer-driving variants in anchor proteins, such as Ku70/80, were more likely to affect key protein-protein interactions, whilst those in the enzymatic components, such as DNA-PKcs, were likely to be found in intolerant regions undergoing purifying selection. We believe that the information acquired in our database will be a powerful resource to better understand the role of non-homologous end-joining DNA repair in genetic disorders, and will serve as a source to inspire other investigations to understand the disease further, vital for the development of improved therapeutic strategies.
Affiliation(s)
- Raghad Al-Jarf
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville, VIC 3052, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, VIC 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- Malancha Karmakar
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville, VIC 3052, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, VIC 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- Yoochan Myung
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville, VIC 3052, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, VIC 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St. Lucia, QLD 4072, Australia
- David B. Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville, VIC 3052, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, VIC 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St. Lucia, QLD 4072, Australia
13
Ryu J, Barkal S, Yu T, Jankowiak M, Zhou Y, Francoeur M, Phan QV, Li Z, Tognon M, Brown L, Love MI, Lettre G, Ascher DB, Cassa CA, Sherwood RI, Pinello L. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. medRxiv 2023:2023.09.08.23295253. [PMID: 37732177 PMCID: PMC10508837 DOI: 10.1101/2023.09.08.23295253] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/22/2023]
Abstract
CRISPR base editing screens are powerful tools for studying disease-associated variants at scale. However, the efficiency and precision of base editing perturbations vary, confounding the assessment of variant-induced phenotypic effects. Here, we provide an integrated pipeline that improves the estimation of variant impact in base editing screens. We perform high-throughput ABE8e-SpRY base editing screens with an integrated reporter construct to measure the editing efficiency and outcomes of each gRNA alongside their phenotypic consequences. We introduce BEAN, a Bayesian network that accounts for per-guide editing outcomes and target site chromatin accessibility to estimate variant impacts. We show this pipeline attains superior performance compared to existing tools in variant classification and effect size quantification. We use BEAN to pinpoint common variants that alter LDL uptake, implicating novel genes. Additionally, through saturation base editing of LDLR, we enable accurate quantitative prediction of the effects of missense variants on LDL-C levels, which aligns with measurements in UK Biobank individuals, and identify structural mechanisms underlying variant pathogenicity. This work provides a widely applicable approach to improve the power of base editor screens for disease-associated variant characterization.
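Why per-guide editing outcomes matter can be illustrated with a toy model. This is not BEAN's Bayesian network; it is a minimal sketch assuming that guide i edits a fraction p_i of cells (as read out by a reporter), that the variant shifts the phenotype by β only in edited cells, and that the observed per-guide shift is y_i ≈ p_i·β plus noise. Under those assumptions, naively averaging y_i underestimates β whenever editing is incomplete, while the least-squares estimate Σ p_i·y_i / Σ p_i² recovers it. The function names and numbers are hypothetical.

```python
from typing import Sequence


def naive_effect(shifts: Sequence[float]) -> float:
    """Average observed phenotype shift, ignoring editing efficiency."""
    return sum(shifts) / len(shifts)


def activity_normalized_effect(shifts: Sequence[float],
                               efficiencies: Sequence[float]) -> float:
    """Least-squares estimate of beta under the toy model y_i = p_i * beta + noise.

    Hypothetical helper for illustration only; it is not part of BEAN.
    """
    if len(shifts) != len(efficiencies):
        raise ValueError("one editing efficiency is needed per guide")
    num = sum(p * y for p, y in zip(efficiencies, shifts))
    den = sum(p * p for p in efficiencies)
    if den == 0:
        raise ValueError("at least one guide must have nonzero editing efficiency")
    return num / den


if __name__ == "__main__":
    true_beta = 2.0
    efficiencies = [0.9, 0.5, 0.1]                  # made-up reporter editing fractions
    shifts = [p * true_beta for p in efficiencies]  # noise-free toy observations
    print(round(naive_effect(shifts), 3))                              # 1.0
    print(round(activity_normalized_effect(shifts, efficiencies), 3))  # 2.0
```

Weighting each guide by its editing fraction means poorly editing guides, which carry little information about β, contribute little to the estimate instead of diluting it.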
Affiliation(s)
- Jayoung Ryu
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Sam Barkal
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Tian Yu
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Yunzhuo Zhou
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Matthew Francoeur
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Quang Vinh Phan
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Zhijian Li
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Manuel Tognon
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Computer Science Department, University of Verona, Verona, Italy
- Lara Brown
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Michael I. Love
- Department of Genetics, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Guillaume Lettre
- Montreal Heart Institute, Montréal, QC H1T 1C8, Canada
- Faculté de Médecine, Université de Montréal, Montréal, QC H3T 1J4, Canada
- David B. Ascher
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Christopher A. Cassa
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Richard I. Sherwood
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Luca Pinello
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
14
Portelli S, Heaton R, Ascher DB. Identifying Innate Resistance Hotspots for SARS-CoV-2 Antivirals Using In Silico Protein Techniques. Genes (Basel) 2023; 14:1699. [PMID: 37761839 PMCID: PMC10531314 DOI: 10.3390/genes14091699] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/09/2023] [Revised: 08/02/2023] [Accepted: 08/22/2023] [Indexed: 09/29/2023] Open
Abstract
The development and approval of antivirals against SARS-CoV-2 have further equipped clinicians with treatment strategies against the COVID-19 pandemic, reducing deaths post-infection. Extensive clinical use of antivirals, however, can impart additional selective pressure, leading to the emergence of antiviral resistance. While we have previously characterized possible effects of circulating SARS-CoV-2 missense mutations on proteome function and stability, their direct effects on the novel antivirals remain unexplored. To address this, we have computationally calculated the consequences of mutations in the antiviral targets, RNA-dependent RNA polymerase and main protease, on target stability and on interactions with their antivirals, nucleic acids, and other proteins. By analyzing circulating variants prior to antiviral approval, this work highlighted the inherent resistance potential of different genome regions. Namely, within the main protease binding site, missense mutations imparted a lower fitness cost, while the opposite was noted for the RNA-dependent RNA polymerase binding site. This suggests that resistance to nirmatrelvir/ritonavir combination treatment is more likely to occur and proliferate than that to molnupiravir. These insights are crucial both clinically in drug stewardship, and preclinically in the identification of less mutable targets for novel therapeutic design.
Affiliation(s)
- Stephanie Portelli
- School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, QLD 4072, Australia
- Baker Heart and Diabetes Institute, 75 Commercial Road, Melbourne, VIC 3004, Australia
- Ruby Heaton
- School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, QLD 4072, Australia
- David B. Ascher
- School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, QLD 4072, Australia
- Baker Heart and Diabetes Institute, 75 Commercial Road, Melbourne, VIC 3004, Australia
15
Myung Y, Pires DEV, Ascher DB. Understanding the complementarity and plasticity of antibody-antigen interfaces. Bioinformatics 2023:btad392. [PMID: 37382557 DOI: 10.1093/bioinformatics/btad392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Revised: 01/24/2023] [Accepted: 06/27/2023] [Indexed: 06/30/2023]
Abstract
MOTIVATION While antibodies have been ground-breaking therapeutic agents, the structural determinants for antibody binding specificity remain to be fully elucidated, which is compounded by the virtually unlimited repertoire of antigens they can recognise. Here, we have explored the structural landscapes of antibody-antigen interfaces to identify the structural determinants driving target recognition by assessing concavity and interatomic interactions. RESULTS We found that complementarity-determining regions utilised deeper concavity with their longer H3 loops, with nanobody H3 loops showing the deepest use of concavity. Of all amino acid residues found in complementarity-determining regions, tryptophan used deeper concavity, especially in nanobodies, making it suitable for leveraging concave antigen surfaces. Similarly, antigens utilised arginine to bind to deeper pockets of the antibody surface. Our findings fill a gap in knowledge about antibody specificity, binding affinity, and the nature of antibody-antigen interface features, which will lead to a better understanding of how antibodies can more effectively target druggable sites on antigen surfaces. AVAILABILITY The data and scripts are available at: https://github.com/YoochanMyung/scripts. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yoochan Myung
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD Australia
- Douglas E V Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, VIC Australia
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD Australia
16
Nguyen TB, de Sá AGC, Rodrigues CHM, Pires DEV, Ascher DB. LEGO-CSM: a tool for functional characterisation of proteins. Bioinformatics 2023:btad402. [PMID: 37382560 DOI: 10.1093/bioinformatics/btad402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 02/22/2023] [Accepted: 06/27/2023] [Indexed: 06/30/2023]
Abstract
MOTIVATION With the development of sequencing techniques, the discovery of new proteins significantly exceeds the human capacity and resources for experimentally characterising protein functions. LEGO-CSM is a comprehensive web-based resource that fills this gap by applying well-established, robust graph-based signatures in supervised learning models that use both protein sequence and structure information to accurately model protein function in terms of Subcellular Localisation, Enzyme Commission (EC) numbers and Gene Ontology (GO) terms. RESULTS We show our models perform as well as or better than alternative approaches, achieving Area Under the Receiver Operating Characteristic Curve (ROC AUC) of up to 0.93 for subcellular localisation, up to 0.93 for EC and up to 0.81 for GO terms on independent blind tests. AVAILABILITY LEGO-CSM's web server is freely available at https://biosig.lab.uq.edu.au/lego_csm. In addition, all datasets used to train and test LEGO-CSM's models can be downloaded at https://biosig.lab.uq.edu.au/lego_csm/data. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Thanh Binh Nguyen
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- Alex G C de Sá
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, Victoria 3010, Australia
- Carlos H M Rodrigues
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- School of Computing and Information Systems, University of Melbourne, Parkville, Victoria 3052, Australia
- David B Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, Victoria 3010, Australia
- School of Computing and Information Systems, University of Melbourne, Parkville, Victoria 3052, Australia
17
Jessen-Howard D, Pan Q, Ascher DB. Identifying the Molecular Drivers of Pathogenic Aldehyde Dehydrogenase Missense Mutations in Cancer and Non-Cancer Diseases. Int J Mol Sci 2023; 24:10157. [PMID: 37373306 DOI: 10.3390/ijms241210157] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/30/2023] [Revised: 06/07/2023] [Accepted: 06/08/2023] [Indexed: 06/29/2023] Open
Abstract
Human aldehyde dehydrogenases (ALDHs), comprising 19 isoenzymes, play a vital role in both endogenous and exogenous aldehyde metabolism. This NAD(P)-dependent catalytic process relies on the intact structural and functional activity of the cofactor binding, substrate interaction, and the oligomerization of ALDHs. Disruptions of the activity of ALDHs, however, could result in the accumulation of cytotoxic aldehydes, which have been linked with a wide range of diseases, including both cancers as well as neurological and developmental disorders. In our previous works, we have successfully characterised the structure-function relationships of the missense variants of other proteins. We, therefore, applied a similar analysis pipeline to identify potential molecular drivers of pathogenic ALDH missense mutations. Variant data were first carefully curated and labelled as cancer-risk, non-cancer diseases, and benign. We then leveraged various computational biophysical methods to describe the changes caused by missense mutations, informing a bias of detrimental mutations with destabilising effects. Combining these insights, several machine learning approaches were further utilised to investigate the combination of features, revealing the necessity of the conservation of ALDHs. Our work aims to provide important biological perspectives on pathogenic consequences of missense mutations of ALDHs, which could be invaluable resources in the development of cancer treatment.
Affiliation(s)
- Dana Jessen-Howard
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
- Qisheng Pan
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- David B Ascher
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
18
Zhou Y, Pan Q, Pires DEV, Rodrigues CHM, Ascher DB. DDMut: predicting effects of mutations on protein stability using deep learning. Nucleic Acids Res 2023:7191416. [PMID: 37283042 PMCID: PMC10320186 DOI: 10.1093/nar/gkad472] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 02/10/2023] [Revised: 05/11/2023] [Accepted: 05/18/2023] [Indexed: 06/08/2023] Open
Abstract
Understanding the effects of mutations on protein stability is crucial for variant interpretation and prioritisation, protein engineering, and biotechnology. Despite significant efforts, community assessments of predictive tools have highlighted ongoing limitations, including computational time, low predictive power, and biased predictions towards destabilising mutations. To fill this gap, we developed DDMut, a fast and accurate siamese network to predict changes in Gibbs Free Energy upon single and multiple point mutations, leveraging both forward and hypothetical reverse mutations to account for model anti-symmetry. Deep learning models were built by integrating graph-based representations of the localised 3D environment, with convolutional layers and transformer encoders. This combination better captured the distance patterns between atoms by extracting both short-range and long-range interactions. DDMut achieved Pearson's correlations of up to 0.70 (RMSE: 1.37 kcal/mol) on single point mutations, and 0.70 (RMSE: 1.84 kcal/mol) on double/triple mutants, outperforming most available methods across non-redundant blind test sets. Importantly, DDMut was highly scalable and demonstrated anti-symmetric performance on both destabilising and stabilising mutations. We believe DDMut will be a useful platform to better understand the functional consequences of mutations, and guide rational protein engineering. DDMut is freely available as a web server and API at https://biosig.lab.uq.edu.au/ddmut.
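Anti-symmetry here means that a well-behaved predictor should assign the reverse mutation the negated stability change: ΔΔG(B→A) ≈ −ΔΔG(A→B). A commonly used check (assumed here; the exact evaluation protocol of DDMut is not given in this abstract) is the Pearson correlation between forward and reverse predictions, ideally −1, together with the mean bias of (ΔΔG_fwd + ΔΔG_rev)/2, ideally 0. The sketch below computes those two quantities plus Pearson's r and RMSE against experimental values, the metrics quoted above. All function names and numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root-mean-square error between predicted and experimental ddG (kcal/mol)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


def antisymmetry_report(fwd: Sequence[float],
                        rev: Sequence[float]) -> Tuple[float, float]:
    """Return (r_fwd_rev, bias) for paired forward/reverse ddG predictions.

    An anti-symmetric predictor gives r close to -1 and bias close to 0,
    where bias is the mean of (ddG_fwd + ddG_rev) / 2.
    """
    bias = sum(f + r for f, r in zip(fwd, rev)) / (2 * len(fwd))
    return pearson_r(fwd, rev), bias


if __name__ == "__main__":
    fwd = [1.0, -0.5, 2.0]   # made-up forward predictions (kcal/mol)
    rev = [-1.0, 0.5, -2.0]  # made-up reverse predictions
    r, bias = antisymmetry_report(fwd, rev)
    print(round(r, 6), bias)                                  # -1.0 0.0
    print(round(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), 3))  # 1.155
```

A model biased towards destabilising predictions would show a positive or negative bias here even if its forward-only correlation looked good, which is why the forward and reverse sets are evaluated together.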
Affiliation(s)
- Yunzhuo Zhou
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Qisheng Pan
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Douglas E V Pires
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- Carlos H M Rodrigues
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- David B Ascher
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
19
da Silva BM, Ascher DB, Pires DEV. epitope1D: accurate taxonomy-aware B-cell linear epitope prediction. Brief Bioinform 2023; 24:7111720. [PMID: 37039696 DOI: 10.1093/bib/bbad114] [Received: 11/23/2022] [Revised: 01/30/2023] [Accepted: 03/07/2023] [Indexed: 04/12/2023]
Abstract
The ability to identify B-cell epitopes is an essential step in vaccine design, immunodiagnostic tests and antibody production. Several computational approaches have been proposed to identify, from an antigen protein or peptide sequence, which residues are more likely to be part of an epitope, but they have limited performance on relatively homogeneous data sets and lack interpretability, limiting the biological insights that could otherwise be obtained. To address these limitations, we have developed epitope1D, an explainable machine learning method capable of accurately identifying linear B-cell epitopes, leveraging two new descriptors: a graph-based signature representation of protein sequences, based on our well-established Cutoff Scanning Matrix algorithm, and Organism Ontology information. Our model achieved areas under the ROC curve of up to 0.935 on cross-validation and blind tests, demonstrating robust performance. A comprehensive comparison with alternative methods on distinct benchmark data sets was also performed, with our model outperforming state-of-the-art tools. epitope1D not only represents a significant advance in predictive performance but also allows biologically meaningful features to be combined and used for model interpretation. epitope1D has been made available as a user-friendly web server and application programming interface at https://biosig.lab.uq.edu.au/epitope1d/.
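The area under the ROC curve can be read as the probability that a randomly chosen positive (here, an epitope residue) is scored above a randomly chosen negative, with ties counted as one half. A minimal Python sketch of that pairwise (Mann-Whitney) formulation, using hypothetical scores and labels; it is not epitope1D's evaluation code, and its O(n_pos × n_neg) loop is meant for clarity rather than large data sets.

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for positives (e.g. epitope residues), 0 for negatives.
    A tie between a positive and a negative contributes 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical per-residue scores: 5 of the 6 positive/negative pairs are ordered correctly
scores = [0.9, 0.4, 0.35, 0.6, 0.2]
labels = [1, 1, 0, 0, 0]
print(roc_auc(scores, labels))  # ≈ 0.833
```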
Affiliation(s)
- Bruna Moreira da Silva
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- David B Ascher
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- The School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
20
Silk M, de Sá A, Olshansky M, Ascher DB. Insights from Spatial Measures of Intolerance to Identifying Pathogenic Variants in Developmental and Epileptic Encephalopathies. Int J Mol Sci 2023; 24:ijms24065114. [PMID: 36982187 PMCID: PMC10049344 DOI: 10.3390/ijms24065114] [Received: 10/27/2022] [Revised: 02/17/2023] [Accepted: 02/28/2023] [Indexed: 03/11/2023]
Abstract
Developmental and epileptic encephalopathies (DEEs) are a group of epilepsies with early onset and severe symptoms that sometimes lead to death. Although previous work successfully discovered several genes implicated in disease outcomes, disease heterogeneity makes it challenging to distinguish causative mutations within these genes from the background variation present in all individuals. Nevertheless, our ability to detect possible pathogenic variants has continued to improve as in silico predictors of deleteriousness have advanced. We investigated their use in prioritising likely pathogenic variants in the whole-exome sequences of patients with epileptic encephalopathy, and showed that including structure-based predictors of intolerance improved on previous attempts to demonstrate enrichment within epilepsy genes.
Affiliation(s)
- Michael Silk
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, VIC 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, VIC 3010, Australia
- Alex de Sá
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, VIC 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, VIC 3010, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, QLD 4072, Australia
- Moshe Olshansky
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, VIC 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, VIC 3010, Australia
- David B. Ascher
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, VIC 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, VIC 3010, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, QLD 4072, Australia
- Correspondence: Tel. +61-7-336-53891
21
Aljarf R, Tang S, Pires DEV, Ascher DB. embryoTox: Using Graph-Based Signatures to Predict the Teratogenicity of Small Molecules. J Chem Inf Model 2023; 63:432-441. [PMID: 36595441 DOI: 10.1021/acs.jcim.2c00824] [Indexed: 01/04/2023]
Abstract
Teratogenic drugs can cause severe fetal malformations and thus critically affect fetal health, yet the teratogenic risks associated with most approved drugs are unknown. Here, we propose a novel predictive tool, embryoTox, which uses a graph-based signature representation of the chemical structure of a small molecule to predict and classify molecules likely to be safe during pregnancy. embryoTox was trained and validated using in vitro bioactivity data for over 700 small molecules with characterized teratogenic effects. Our final model achieved an area under the receiver operating characteristic curve (AUC) of up to 0.96 on 10-fold cross-validation and 0.82 on nonredundant blind tests, outperforming alternative approaches. We believe that our predictive tool will provide a practical resource for optimizing screening libraries to identify effective and safe molecules to use during pregnancy. To provide a simple and integrated platform to rapidly screen for potentially safe molecules and their risk factors, we made embryoTox freely available online at https://biosig.lab.uq.edu.au/embryotox/.
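The 0.96 figure comes from 10-fold cross-validation, in which every molecule is held out for testing exactly once, whereas the 0.82 comes from separate nonredundant blind tests. A minimal Python sketch of a plain shuffled k-fold split, to show what a cross-validation estimate is averaged over; the abstract does not say how embryoTox's folds were built (for example, whether they were stratified by class), so the function name, seed and round-robin fold assignment are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Every index appears in exactly one test fold; fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)  # reproducible shuffle
    folds = [order[i::k] for i in range(k)]  # round-robin assignment to folds
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Example: 25 hypothetical molecules split into 10 folds
for train, test in k_fold_indices(25, k=10):
    assert not set(train) & set(test)  # no molecule is in both sets
    print(len(train), len(test))
```

Because each held-out fold overlaps the training data of other folds in chemical space, cross-validation scores are often higher than blind-test scores, which is consistent with the gap between the two numbers above.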
Affiliation(s)
- Raghad Aljarf
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- Simon Tang
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- School of Computing and Information Systems, University of Melbourne, Parkville 3052, Victoria, Australia
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia
22
Ascher DB, Kaminskas LM, Myung Y, Pires DEV. Using Graph-Based Signatures to Guide Rational Antibody Engineering. Methods Mol Biol 2023; 2552:375-397. [PMID: 36346604 DOI: 10.1007/978-1-0716-2609-2_21] [Indexed: 06/16/2023]
Abstract
Antibodies are essential experimental and diagnostic tools and as biotherapeutics have significantly advanced our ability to treat a range of diseases. With recent innovations in computational tools to guide protein engineering, we can now rationally design better antibodies with improved efficacy, stability, and pharmacokinetics. Here, we describe the use of the mCSM web-based in silico suite, which uses graph-based signatures to rapidly identify the structural and functional consequences of mutations, to guide rational antibody engineering to improve stability, affinity, and specificity.
Affiliation(s)
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Parkville, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Department of Biochemistry, Cambridge University, Cambridge, UK
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
- Lisa M Kaminskas
- School of Biological Sciences, University of Queensland, St Lucia, QLD, Australia
- Yoochan Myung
- Structural Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Parkville, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
- Douglas E V Pires
- Structural Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Parkville, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- School of Computing and Information Systems, University of Melbourne, Parkville, VIC, Australia
23
Boer JC, Pan Q, Holien JK, Nguyen TB, Ascher DB, Plebanski M. A bias of Asparagine to Lysine mutations in SARS-CoV-2 outside the receptor binding domain affects protein flexibility. Front Immunol 2022; 13:954435. [PMID: 36569921 PMCID: PMC9788125 DOI: 10.3389/fimmu.2022.954435] [Received: 05/27/2022] [Accepted: 11/14/2022] [Indexed: 12/14/2022]
Abstract
Introduction: The COVID-19 pandemic has threatened public health and economic development worldwide for over two years. Compared with the original SARS-CoV-2 strain reported in 2019, the Omicron variant (B.1.1.529.1) is more transmissible. This variant has 34 mutations in its Spike protein, 15 of which are in the Receptor Binding Domain (RBD); these facilitate viral internalization via binding to the angiotensin-converting enzyme 2 (ACE2) receptor on endothelial cells and promote increased immune evasion capacity.
Methods: Herein we compared SARS-CoV-2 proteins (including ORF3a, ORF7, ORF8, nucleoprotein (N), membrane protein (M) and Spike (S) proteins) from multiple ancestral strains. We included the currently designated original Variant of Concern (VOC) Omicron, its subsequently emerged variants BA.1, BA.2, BA.3, BA.4 and BA.5, and the two currently emerging variants BQ.1 and XBB.1, and compared these with the previously circulating VOCs Alpha, Beta, Gamma and Delta, to better understand the nature and potential impact of Omicron-specific mutations.
Results: Only in Omicron and its subvariants was a bias toward Asparagine to Lysine (N to K) mutations evident within the Spike protein, including regions outside the RBD, while no regions outside the Spike protein showed this mutational bias. Computational structural analysis revealed that three of these mutations, located in the central core region, contribute to a preference for altered conformations of the Spike protein. Several RBD mutations that have circulated across most Omicron subvariants were also analysed, and these showed greater potential for immune escape.
Conclusion: This study emphasizes the importance of understanding how specific N to K mutations outside the RBD affect SARS-CoV-2 conformational changes, and the need for neutralizing antibodies against Omicron to target a subset of conformationally dependent B cell epitopes.
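One simple way to expose a substitution bias such as the N to K excess is to tally wild-type/mutant residue pairs from mutation labels written in the usual one-letter form (e.g. N501Y: asparagine at position 501 replaced by tyrosine). A minimal Python sketch; the mutation list is hypothetical illustration input, not the Spike mutations analysed in the paper, and labels that are not simple substitutions (deletions, insertions) are skipped.

```python
import re
from collections import Counter

# One-letter wild-type residue, sequence position, one-letter mutant residue
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def substitution_counts(mutations):
    """Count (wild-type, mutant) residue pairs among simple point substitutions."""
    counts = Counter()
    for label in mutations:
        match = SUBSTITUTION.match(label)
        if match is None:
            continue  # skip deletions, insertions and malformed labels
        wild_type, _position, mutant = match.groups()
        counts[(wild_type, mutant)] += 1
    return counts


# Hypothetical mutation labels
example = ["N100K", "N200K", "D300G", "N400K", "del69-70"]
counts = substitution_counts(example)
print(counts[("N", "K")])  # 3
print(counts.most_common(1))  # [(('N', 'K'), 3)]
```

Raw counts only suggest a bias; comparing them against the number of asparagines available to mutate, or against other variants, is what turns a tally into evidence.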
Affiliation(s)
- Jennifer C. Boer
- School of Health and Biomedical Science, Royal Melbourne Institute of Technology, Melbourne, VIC, Australia
- Qisheng Pan
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Jessica K. Holien
- School of Science, Royal Melbourne Institute of Technology (RMIT) University, Melbourne, VIC, Australia
- Thanh-Binh Nguyen
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- David B. Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Magdalena Plebanski
- School of Health and Biomedical Science, Royal Melbourne Institute of Technology, Melbourne, VIC, Australia
- Correspondence: Magdalena Plebanski
24
Williams NP, Rodrigues CHM, Truong J, Ascher DB, Holien JK. DockNet: high-throughput protein-protein interface contact prediction. Bioinformatics 2022; 39:6885444. [PMID: 36484688 PMCID: PMC9825772 DOI: 10.1093/bioinformatics/btac797] [Received: 07/19/2022] [Revised: 10/27/2022] [Accepted: 12/08/2022] [Indexed: 12/13/2022]
Abstract
MOTIVATION: Over 300 000 protein-protein interaction (PPI) pairs have been identified in the human proteome, and targeting these is fast becoming the next frontier in drug design. Predicting PPI sites, however, is a challenging task that traditionally requires computationally expensive and time-consuming docking simulations. A major weakness of modern protein docking algorithms is their inability to account for protein flexibility, which ultimately leads to relatively poor results.
RESULTS: Here, we propose DockNet, an efficient Siamese graph-based neural network method that predicts contact residues between two interacting proteins. Unlike other methods that only use a protein's surface or treat the protein structure as a rigid body, DockNet incorporates the entire protein structure and places no limits on protein flexibility during an interaction. Predictions are modeled at the residue level, based on a diverse set of input node features including residue type, surface accessibility, residue depth, secondary structure, pharmacophore and torsional angles. DockNet is comparable to current state-of-the-art methods, achieving an area under the curve (AUC) of up to 0.84 on an independent test set (DB5); it can be applied to a variety of protein structures and used in situations where accurate unbound protein structures cannot be obtained.
AVAILABILITY AND IMPLEMENTATION: DockNet is available at https://github.com/npwilliams09/docknet and as an easy-to-use webserver at https://biosig.lab.uq.edu.au/docknet. All other data underlying this article are available in the article and in its online supplementary material.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
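DockNet predicts which residues of two proteins are in contact. Training and evaluating such a predictor needs ground-truth contact labels from a complex structure, and a common convention is to call two residues in contact when representative atoms lie within a distance cutoff. A minimal Python sketch of that labelling step; the 6 Å cutoff, the one-coordinate-per-residue simplification and the function name are assumptions for illustration, not DockNet's documented definition.

```python
import math


def interface_contacts(coords_a, coords_b, cutoff=6.0):
    """Return (i, j) index pairs of residues in contact across two chains.

    coords_a, coords_b: one (x, y, z) coordinate per residue, in angstroms
    (e.g. a C-beta atom); a pair counts as a contact if its distance <= cutoff.
    The cutoff value here is an assumption, not taken from the paper.
    """
    contacts = []
    for i, a in enumerate(coords_a):
        for j, b in enumerate(coords_b):
            if math.dist(a, b) <= cutoff:  # Euclidean distance (Python 3.8+)
                contacts.append((i, j))
    return contacts


# Hypothetical coordinates for two tiny chains
chain_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
chain_b = [(3.0, 4.0, 0.0), (20.0, 0.0, 0.0)]
print(interface_contacts(chain_a, chain_b))  # [(0, 0)]
```

The all-pairs loop is quadratic in chain length; for full proteins a spatial index (for example a k-d tree) is the usual way to keep this fast.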
Affiliation(s)
- Jia Truong
- STEM College, RMIT University, Melbourne, VIC, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
25
Parthasarathy S, Ruggiero SM, Gelot A, Soardi FC, Ribeiro BFR, Pires DEV, Ascher DB, Schmitt A, Rambaud C, Represa A, Xie HM, Lusk L, Wilmarth O, McDonnell PP, Juarez OA, Grace AN, Buratti J, Mignot C, Gras D, Nava C, Pierce SR, Keren B, Kennedy BC, Pena SDJ, Helbig I, Cuddapah VA. A recurrent de novo splice site variant involving DNM1 exon 10a causes developmental and epileptic encephalopathy through a dominant-negative mechanism. Am J Hum Genet 2022; 109:2253-2269. [PMID: 36413998 PMCID: PMC9748255 DOI: 10.1016/j.ajhg.2022.11.002] [Received: 05/31/2022] [Accepted: 11/01/2022] [Indexed: 11/23/2022]
Abstract
Heterozygous pathogenic variants in DNM1 cause developmental and epileptic encephalopathy (DEE) as a result of a dominant-negative mechanism impeding vesicular fission. Thus far, pathogenic variants in DNM1 have been studied with a canonical transcript that includes the alternatively spliced exon 10b. However, after performing RNA sequencing in 39 pediatric brain samples, we find the primary transcript expressed in the brain includes the downstream exon 10a instead. Using this information, we evaluated genotype-phenotype correlations of variants affecting exon 10a and identified a cohort of eleven previously unreported individuals. Eight individuals harbor a recurrent de novo splice site variant, c.1197-8G>A (GenBank: NM_001288739.1), which affects exon 10a and leads to DEE consistent with the classical DNM1 phenotype. We find this splice site variant leads to disease through an unexpected dominant-negative mechanism. Functional testing reveals an in-frame upstream splice acceptor causing insertion of two amino acids predicted to impair oligomerization-dependent activity. This is supported by neuropathological samples showing accumulation of enlarged synaptic vesicles adherent to the plasma membrane consistent with impaired vesicular fission. Two additional individuals with missense variants affecting exon 10a, p.Arg399Trp and p.Gly401Asp, had a similar DEE phenotype. In contrast, one individual with a missense variant affecting exon 10b, p.Pro405Leu, which is less expressed in the brain, had a correspondingly less severe presentation. Thus, we implicate variants affecting exon 10a as causing the severe DEE typically associated with DNM1-related disorders. We highlight the importance of considering relevant isoforms for disease-causing variants as well as the possibility of splice site variants acting through a dominant-negative mechanism.
Affiliation(s)
- Shridhar Parthasarathy
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19146, USA
- Sarah McKeown Ruggiero
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19146, USA
- Antoinette Gelot
- AP-HP, Hôpital Armand-Trousseau, Service d'Anatomie Pathologique, 75012 Paris, France; INMED INSERM U 901 Parc Scientifique de Luminy, 13273 Marseille, France; Centre de Recherche Clinique ConCer-LD, Paris, France
- Fernanda C Soardi
- GENE - Núcleo de Genética Médica, Belo Horizonte, MG, Brazil; Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil; Laboratório de Genômica Clínica, Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Douglas E V Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, VIC 3052, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, VIC 3053, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, VIC 3052, Australia; School of Chemistry and Molecular Biology, University of Queensland, St Lucia, QLD 4072, Australia
- Alain Schmitt
- INSERM U 1016, Institut Cochin, Paris, France; CNRS UMR 8104, Paris, France; Université Paris Descartes, Sorbonne Paris Cité, Paris, France
- Caroline Rambaud
- AP-HP, Hôpital Raymond-Poincaré, Laboratoire Anatomie Pathologique, Garches, France
- Alfonso Represa
- INMED, INSERM, Aix-Marseille Université, Campus de Luminy, 13009 Marseille, France
- Hongbo M Xie
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19146, USA
- Laina Lusk
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19146, USA
- Olivia Wilmarth
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Pamela Pojomovsky McDonnell
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Neurology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Olivia A Juarez
- Baylor College of Medicine Genetics Clinic, Children's Hospital of San Antonio, San Antonio, TX, USA
- Alexandra N Grace
- Baylor College of Medicine Genetics Clinic, Children's Hospital of San Antonio, San Antonio, TX, USA
- Julien Buratti
- AP-HP, Hôpital de la Pitié Salpêtrière, Département de Génétique, 75013 Paris, France
- Cyril Mignot
- AP-HP, Hôpital de la Pitié Salpêtrière, Département de Génétique, 75013 Paris, France; Sorbonne Universités, UPMC Univ Paris 06, UMR S 1127, INSERM U 1127, CNRS UMR 7225, ICM, 75013 Paris, France; AP-HP, Hôpital Robert Debré, Service de Neurologie Pediatrique et de Maladies Métaboliques, 75019 Paris, France
- Domitille Gras
- AP-HP, Hôpital Robert Debré, Service de Neurologie Pediatrique et de Maladies Métaboliques, 75019 Paris, France
- Caroline Nava
- AP-HP, Hôpital de la Pitié Salpêtrière, Département de Génétique, 75013 Paris, France; Sorbonne Universités, UPMC Univ Paris 06, UMR S 1127, INSERM U 1127, CNRS UMR 7225, ICM, 75013 Paris, France; AP-HP, Hôpital Robert Debré, Service de Neurologie Pediatrique et de Maladies Métaboliques, 75019 Paris, France
- Samuel R Pierce
- The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Physical Therapy, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Boris Keren
- AP-HP, Hôpital de la Pitié Salpêtrière, Département de Génétique, 75013 Paris, France; Sorbonne Universités, UPMC Univ Paris 06, UMR S 1127, INSERM U 1127, CNRS UMR 7225, ICM, 75013 Paris, France; AP-HP, Hôpital Robert Debré, Service de Neurologie Pediatrique et de Maladies Métaboliques, 75019 Paris, France
- Benjamin C Kennedy
- Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19146, USA; Department of Neurosurgery, The University of Pennsylvania, Philadelphia, PA 19104, USA
- Sergio D J Pena
- GENE - Núcleo de Genética Médica, Belo Horizonte, MG, Brazil; Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil; Laboratório de Genômica Clínica, Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Ingo Helbig
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19146, USA; Department of Neurology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Vishnu Anand Cuddapah
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
26
Zhou Y, Al‐Jarf R, Alavi A, Nguyen TB, Rodrigues CHM, Pires DEV, Ascher DB. kinCSM: Using graph-based signatures to predict small molecule CDK2 inhibitors. Protein Sci 2022; 31:e4453. [PMID: 36305769 PMCID: PMC9597374 DOI: 10.1002/pro.4453] [Received: 07/31/2022] [Revised: 09/14/2022] [Accepted: 09/15/2022] [Indexed: 11/20/2022]
Abstract
Protein phosphorylation acts as an essential on/off switch in many cellular signaling pathways, which has led to ongoing interest in targeting kinases for therapeutic intervention. Computer‐aided drug discovery has proven to be a useful and cost‐effective approach for prioritizing and enriching screening libraries, but limited effort has been devoted to providing insights into what makes a potent kinase inhibitor. To fill this gap, here we developed kinCSM, an integrative computational tool capable of accurately identifying potent cyclin‐dependent kinase 2 (CDK2) inhibitors, quantitatively predicting CDK2 ligand–kinase inhibition constants (pKi) and classifying different types of inhibitors based on their favorable binding modes. kinCSM predictive models were built using supervised learning and leveraged the concept of graph‐based signatures to capture both the physicochemical and geometric properties of small molecules. CDK2 inhibitors were accurately identified with Matthews correlation coefficients (MCC) of up to 0.74, and inhibition constants were predicted with Pearson's correlation of up to 0.76, with consistent performances of 0.66 and 0.68, respectively, on a nonredundant blind test. kinCSM was also able to identify the potential type of inhibition for a given molecule, achieving MCC of up to 0.80 on cross‐validation and 0.73 on the blind test. Analysis of the molecules' composition revealed chemical fragments enriched in CDK2 inhibitors and in different types of inhibitors, providing insights into the molecular mechanisms behind ligand–kinase interactions. kinCSM will be an invaluable tool to guide future kinase drug discovery. To aid the fast and accurate screening of CDK2 inhibitors, kinCSM is freely available at https://biosig.lab.uq.edu.au/kin_csm/.
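Two quantities in this abstract have standard definitions: the Matthews correlation coefficient, MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), which ranges from −1 to 1, and pKi = −log10(Ki) with Ki in mol/L. A minimal Python sketch of both; the confusion-matrix counts are hypothetical, and returning 0 when a marginal total is zero is a common convention rather than something the abstract specifies.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is taken as 0 when any row or column total is zero
    return (tp * tn - fp * fn) / denom if denom else 0.0


def pki_from_ki_nm(ki_nm):
    """pKi = -log10(Ki [M]) = 9 - log10(Ki [nM])."""
    if ki_nm <= 0:
        raise ValueError("Ki must be positive")
    return 9 - math.log10(ki_nm)


# Hypothetical classifier: 40 TP, 45 TN, 5 FP, 10 FN
print(round(mcc(tp=40, tn=45, fp=5, fn=10), 3))  # 0.704
print(pki_from_ki_nm(10))  # 8.0, i.e. a 10 nM inhibitor
```

Because MCC uses all four confusion-matrix cells, it stays informative when actives and inactives are imbalanced, which is typical of screening libraries; a higher pKi means a more potent inhibitor.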
Affiliation(s)
- Yunzhuo Zhou
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Raghad Al‐Jarf
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Azadeh Alavi
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Thanh Binh Nguyen
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Carlos H. M. Rodrigues
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Douglas E. V. Pires
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- David B. Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
27
Akdel M, Pires DEV, Pardo EP, Jänes J, Zalevsky AO, Mészáros B, Bryant P, Good LL, Laskowski RA, Pozzati G, Shenoy A, Zhu W, Kundrotas P, Serra VR, Rodrigues CHM, Dunham AS, Burke D, Borkakoti N, Velankar S, Frost A, Basquin J, Lindorff-Larsen K, Bateman A, Kajava AV, Valencia A, Ovchinnikov S, Durairaj J, Ascher DB, Thornton JM, Davey NE, Stein A, Elofsson A, Croll TI, Beltrao P. A structural biology community assessment of AlphaFold2 applications. Nat Struct Mol Biol 2022; 29:1056-1067. [PMID: 36344848 PMCID: PMC9663297 DOI: 10.1038/s41594-022-00849-w] [Citation(s) in RCA: 176] [Impact Index Per Article: 88.0] [Received: 10/25/2021] [Accepted: 09/20/2022] [Indexed: 11/09/2022]
Abstract
Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure prediction have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modeling of interactions; and modeling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modeled when compared with homology modeling, identifying structural features rarely seen in the Protein Data Bank. AF2-based predictions of protein disorder and complexes surpass dedicated tools, and AF2 models can be used across diverse applications as effectively as experimentally determined structures when the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life-science research.
Affiliation(s)
- Mehmet Akdel
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Douglas E V Pires
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- Eduard Porta Pardo
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Jürgen Jänes
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Arthur O Zalevsky
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russian Federation
- Patrick Bryant
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Lydia L Good
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Roman A Laskowski
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Gabriele Pozzati
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Aditi Shenoy
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Wensi Zhu
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Petras Kundrotas
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Carlos H M Rodrigues
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- Alistair S Dunham
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- David Burke
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Neera Borkakoti
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Sameer Velankar
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Adam Frost
- Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA
- Jérôme Basquin
- Department of Structural Cell Biology, Max Planck Institute of Biochemistry, Martinsried, Germany
- Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Andrey V Kajava
- Université de Montpellier, Centre de Recherche en Biologie Cellulaire de Montpellier (CRBM) CNRS, Montpellier, France
- Sergey Ovchinnikov
- Faculty of Arts and Sciences, Division of Science, Harvard University, Cambridge, MA, USA
- David B Ascher
- School of Chemistry and Molecular Biology, University of Queensland, Brisbane, Queensland, Australia
- Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Amelie Stein
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Arne Elofsson
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Tristan I Croll
- Cambridge Institute for Medical Research, Department of Haematology, The University of Cambridge, Cambridge, UK
- Pedro Beltrao
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
28
Iftkhar S, de Sá AGC, Velloso JPL, Aljarf R, Pires DEV, Ascher DB. cardioToxCSM: A Web Server for Predicting Cardiotoxicity of Small Molecules. J Chem Inf Model 2022; 62:4827-4836. [PMID: 36219164 DOI: 10.1021/acs.jcim.2c00822] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 11/28/2022]
Abstract
The design of novel, safe, and effective drugs to treat human diseases is a challenging venture, with toxicity being one of the main sources of attrition at later stages of development. Failure due to toxicity incurs a significant increase in costs and time to market, and multiple drugs have been withdrawn from the market because of their adverse effects. Cardiotoxicity, for instance, was responsible for the failure of drugs such as fenspiride, propoxyphene, and valdecoxib. While significant effort has been dedicated to mitigating this issue by developing computational approaches that aim to identify molecules likely to be toxic, including quantitative structure-activity relationship models and machine learning methods, current approaches offer limited performance and interpretability. To overcome these limitations, we propose a new web-based computational method, cardioToxCSM, which can efficiently and accurately predict six types of cardiac toxicity outcomes: arrhythmia, cardiac failure, heart block, hERG toxicity, hypertension, and myocardial infarction. cardioToxCSM was developed using the concepts of graph-based signatures, molecular descriptors, toxicophore matching, and molecular fingerprints, leveraging explainable machine learning, and was validated internally via different cross-validation schemes and externally via low-redundancy blind sets. The models presented robust performance, with areas under ROC curves of up to 0.898 on 5-fold cross-validation, consistent with metrics on blind tests. Additionally, our models provide interpretation of the predictions by identifying whether substructures that are commonly enriched in toxic compounds are present. We believe cardioToxCSM will provide valuable insight into the potential cardiotoxicity of small molecules early in drug screening efforts. The method is made freely available as a web server at https://biosig.lab.uq.edu.au/cardiotoxcsm.
Affiliation(s)
- Saba Iftkhar
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- Alex G C de Sá
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
- João P L Velloso
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- Raghad Aljarf
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; School of Computing and Information Systems, University of Melbourne, Parkville 3052, Victoria, Australia
- David B Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
29
Rodrigues CHM, Garg A, Keizer D, Pires DEV, Ascher DB. CSM-peptides: A computational approach to rapid identification of therapeutic peptides. Protein Sci 2022; 31:e4442. [PMID: 36173168 PMCID: PMC9518225 DOI: 10.1002/pro.4442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 11/25/2022]
Abstract
Peptides are attractive alternatives for the development of new therapeutic strategies due to their versatility and low complexity of synthesis. Increasing interest in these molecules has led to the creation of large collections of experimentally characterized therapeutic peptides, which have greatly contributed to the development of data‐driven computational approaches. Here we propose CSM‐peptides, a novel machine learning method for rapid identification of eight different types of therapeutic peptides: anti‐angiogenic, anti‐bacterial, anti‐cancer, anti‐inflammatory, anti‐viral, cell‐penetrating, quorum sensing, and surface binding. Our method was shown to outperform existing approaches, achieving an AUC of up to 0.92 on independent blind tests and consistent performance on cross‐validation. We anticipate CSM‐peptides will be of great value in helping screen large libraries to identify novel peptides with therapeutic potential, and we have made it freely available as a user‐friendly web server and Application Programming Interface at https://biosig.lab.uq.edu.au/csm_peptides.
Affiliation(s)
- Carlos H M Rodrigues
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
- Anjali Garg
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- David Keizer
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
30
Tichkule S, Myung Y, Naung MT, Ansell BRE, Guy AJ, Srivastava N, Mehra S, Cacciò SM, Mueller I, Barry AE, van Oosterhout C, Pope B, Ascher DB, Jex AR. VIVID: a web application for variant interpretation and visualisation in multidimensional analyses. Mol Biol Evol 2022; 39:6697981. [PMID: 36103257 PMCID: PMC9514033 DOI: 10.1093/molbev/msac196] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/13/2022]
Abstract
Large-scale comparative genomics and population genetic studies generate enormous amounts of polymorphism data in the form of DNA variants. Ultimately, the goal of many of these studies is to associate genetic variants with phenotypes or fitness. We introduce VIVID, an interactive, user-friendly web application that integrates a wide range of approaches for encoding genotypic to phenotypic information in any organism or disease, from an individual or population, in three-dimensional (3D) space. It allows mutation mapping and annotation, calculation of interactions and conservation scores, prediction of harmful effects, analysis of diversity and selection, and 3D visualization of genotypic information encoded in Variant Call Format on AlphaFold2 protein models. VIVID enables the rapid assessment of genes of interest in the study of adaptive evolution and genetic load, and it helps prioritize targets for experimental validation. We demonstrate the utility of VIVID by exploring the evolutionary genetics of the parasitic protist Plasmodium falciparum, revealing geographic variation in the signature of balancing selection in potential targets of functional antibodies.
Affiliation(s)
- Swapnil Tichkule
- Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, Australia
- Yoochan Myung
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes, Melbourne, Australia
- Myo T Naung
- Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, Australia
- Brendan R E Ansell
- Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Andrew J Guy
- School of Science, RMIT University, Melbourne, Australia
- Namrata Srivastava
- Department of Data Science and AI, Monash University, Melbourne, Australia
- Somya Mehra
- Life Sciences Discipline, Burnet Institute, Melbourne, Australia
- Simone M Cacciò
- Department of Infectious Disease, Istituto Superiore di Sanità, Rome, Italy
- Ivo Mueller
- Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Alyssa E Barry
- Life Sciences Discipline, Burnet Institute, Melbourne, Australia
- Institute of Mental and Physical Health and Clinical Translation (IMPACT) and School of Medicine, Deakin University, Geelong, Australia
- Cock van Oosterhout
- School of Environmental Sciences, University of East Anglia, Norwich Research Park, Norwich, UK
- Bernard Pope
- Melbourne Bioinformatics, University of Melbourne, Melbourne, Australia
- Australian BioCommons, Sydney, Australia
- Department of Clinical Pathology, University of Melbourne, Melbourne, Australia
- Department of Surgery (Royal Melbourne Hospital), University of Melbourne, Melbourne, Australia
- David B Ascher
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes, Melbourne, Australia
- Aaron R Jex
- Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, Australia
31
Ruff KM, Choi YH, Cox D, Ormsby AR, Myung Y, Ascher DB, Radford SE, Pappu RV, Hatters DM. Sequence grammar underlying the unfolding and phase separation of globular proteins. Mol Cell 2022; 82:3193-3208.e8. [PMID: 35853451 PMCID: PMC10846692 DOI: 10.1016/j.molcel.2022.06.024] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 08/19/2021] [Revised: 05/05/2022] [Accepted: 06/15/2022] [Indexed: 12/23/2022]
Abstract
Aberrant phase separation of globular proteins is associated with many diseases. Here, we use a model protein system to understand how the unfolded states of globular proteins drive phase separation and the formation of unfolded protein deposits (UPODs). We find that for UPODs to form, the concentrations of unfolded molecules must be above a threshold value. Additionally, unfolded molecules must possess appropriate sequence grammars to drive phase separation. While UPODs recruit molecular chaperones, their compositional profiles are also influenced by synergistic physicochemical interactions governed by the sequence grammars of unfolded proteins and cellular proteins. Overall, the driving forces for phase separation and the compositional profiles of UPODs are governed by the sequence grammars of unfolded proteins. Our studies highlight the need for uncovering the sequence grammars of unfolded proteins that drive UPOD formation and cause gain-of-function interactions whereby proteins are aberrantly recruited into UPODs.
Affiliation(s)
- Kiersten M Ruff
- Department of Biomedical Engineering, Center for Science & Engineering of Living Systems, Washington University in St. Louis, St. Louis, MO 63130, USA
- Yoon Hee Choi
- Department of Biochemistry and Pharmacology and Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Melbourne, VIC 3010, Australia
- Dezerae Cox
- Department of Biochemistry and Pharmacology and Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Melbourne, VIC 3010, Australia
- Angelique R Ormsby
- Department of Biochemistry and Pharmacology and Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Melbourne, VIC 3010, Australia
- Yoochan Myung
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia; Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, The University of Melbourne, Melbourne, VIC 3010, Australia; Systems and Computational Biology, Bio21 Institute, The University of Melbourne, Melbourne, VIC 3010, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia; Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, The University of Melbourne, Melbourne, VIC 3010, Australia; Systems and Computational Biology, Bio21 Institute, The University of Melbourne, Melbourne, VIC 3010, Australia
- Sheena E Radford
- Astbury Centre for Structural and Molecular Biology, School of Molecular and Cellular Biology, University of Leeds, Leeds LS2 9JT, UK
- Rohit V Pappu
- Department of Biomedical Engineering, Center for Science & Engineering of Living Systems, Washington University in St. Louis, St. Louis, MO 63130, USA
- Danny M Hatters
- Department of Biochemistry and Pharmacology and Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Melbourne, VIC 3010, Australia
32
de Sá AGC, Long Y, Portelli S, Pires DEV, Ascher DB. toxCSM: comprehensive prediction of small molecule toxicity profiles. Brief Bioinform 2022; 23:6673851. [PMID: 35998885 DOI: 10.1093/bib/bbac337] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/19/2022] [Revised: 07/17/2022] [Accepted: 07/23/2022] [Indexed: 01/29/2023]
Abstract
Drug discovery is a lengthy, costly and high-risk endeavour that is further complicated by high attrition rates in later development stages. Toxicity has been one of the main causes of failure during clinical trials, increasing drug development time and costs. To facilitate early identification and optimisation of toxicity profiles, several computational tools have emerged that aim to improve success rates through timely pre-screening of drug candidates. Despite these efforts, there is an increasing demand for platforms capable of assessing both environmental and human-based toxicity properties at large scale. Here, we present toxCSM, a comprehensive computational platform for the study and optimisation of toxicity profiles of small molecules. toxCSM leverages the well-established concepts of graph-based signatures, molecular descriptors and similarity scores to develop 36 models for predicting a range of toxicity properties, which can assist in developing safer drugs and agrochemicals. toxCSM achieved an Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) of up to 0.99 and Pearson's correlation coefficients of up to 0.94 on 10-fold cross-validation, with comparable performance on blind test sets, outperforming all alternative methods. toxCSM is freely available as a user-friendly web server and API at http://biosig.lab.uq.edu.au/toxcsm.
Affiliation(s)
- Alex G C de Sá
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland, 4072, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, 3052, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia; Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, Victoria, 3010, Australia
- Yangyang Long
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, 3052, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia; School of Computing and Information Systems, University of Melbourne, Parkville, Victoria, 3052, Australia
- Stephanie Portelli
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland, 4072, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, 3052, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, 3052, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia; School of Computing and Information Systems, University of Melbourne, Parkville, Victoria, 3052, Australia
- David B Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland, 4072, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, 3052, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia; Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, Victoria, 3010, Australia
33
Rodrigues CHM, Pires DEV, Blundell TL, Ascher DB. Structural landscapes of PPI interfaces. Brief Bioinform 2022; 23:bbac165. [PMID: 35656714 PMCID: PMC9294409 DOI: 10.1093/bib/bbac165] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/22/2021] [Revised: 03/10/2022] [Accepted: 04/13/2022] [Indexed: 02/07/2023]
Abstract
Proteins are capable of highly specific interactions and are responsible for a wide range of functions, making them attractive in the pursuit of new therapeutic options. Previous studies focusing on the overall geometry of protein-protein interaction (PPI) interfaces, however, concluded that these interfaces were generally flat. More recently, this idea has been challenged by their structural and thermodynamic characterisation, suggesting the existence of concave binding sites that are closer in character to traditional small-molecule binding sites than to completely flat surfaces. Here, we present a large-scale analysis of the binding geometry and physicochemical properties of all protein-protein interfaces available in the Protein Data Bank. In this review, we provide a comprehensive overview of the protein-protein interface landscape, including evidence that even larger, flatter interfaces that use discontinuous interacting regions make use of small and potentially druggable pockets at their binding sites.
Affiliation(s)
- Carlos H M Rodrigues
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria
- School of Chemistry and Molecular Biosciences, Bio21 Institute, University of Queensland, Brisbane, Queensland
- Douglas E V Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria
- School of Chemistry and Molecular Biosciences, Bio21 Institute, University of Queensland, Brisbane, Queensland
- Department of Biochemistry, University of Cambridge, Cambridge, UK
34
Aljarf R, Shen M, Pires DEV, Ascher DB. Understanding and predicting the functional consequences of missense mutations in BRCA1 and BRCA2. Sci Rep 2022; 12:10458. [PMID: 35729312 PMCID: PMC9213547 DOI: 10.1038/s41598-022-13508-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/05/2021] [Accepted: 05/25/2022] [Indexed: 11/21/2022]
Abstract
BRCA1 and BRCA2 are tumour suppressor genes that play a critical role in maintaining genomic stability via DNA repair. DNA repair defects caused by BRCA1 and BRCA2 missense variants increase the risk of developing breast and ovarian cancers. Accurate identification of these variants is therefore clinically relevant as a means to guide personalized patient management and early detection. Next-generation sequencing efforts have significantly increased data availability, but have also led to the discovery of variants of uncertain significance that need interpretation. Experimental approaches used to measure the molecular consequences of these variants, however, are usually costly and time-consuming. Computational tools have therefore emerged as faster alternatives for assisting in the interpretation of the clinical significance of newly discovered variants. To better understand and predict variant pathogenicity in BRCA1 and BRCA2, various machine learning algorithms have been proposed; however, they have shown limited performance. Here we present BRCA1- and BRCA2-specific models and a generic model for quantifying the functional impacts of single-point missense variants in these genes. Across tenfold cross-validation, our final models achieved a Matthews correlation coefficient (MCC) of up to 0.98, with comparable performance of up to 0.89 across independent, non-redundant blind tests, outperforming alternative approaches. We believe our predictive tool will be a valuable resource for understanding and interpreting the functional consequences of missense variants in these genes, for guiding the interpretation of newly discovered variants, and for prioritizing mutations for experimental validation.
Affiliation(s)
- Raghad Aljarf
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia; Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, 3010, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, VIC, 3052, Australia
- Mengyuan Shen
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia; Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, 3010, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, VIC, 3052, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, VIC, 3053, Australia
- Douglas E V Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia; Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, 3010, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, VIC, 3052, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, VIC, 3053, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia; Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, 3010, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, VIC, 3052, Australia; Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge, CB2 1GA, UK
35
Rezende PM, Xavier JS, Ascher DB, Fernandes GR, Pires DEV. Evaluating hierarchical machine learning approaches to classify biological databases. Brief Bioinform 2022; 23:6611916. [PMID: 35724625 PMCID: PMC9310517 DOI: 10.1093/bib/bbac216] [Received: 12/21/2021] [Revised: 04/29/2022] [Accepted: 05/09/2022] [Indexed: 12/04/2022]
Abstract
The rate of biological data generation has increased dramatically in recent years, which has heightened the importance of databases as resources to guide innovation and the generation of biological insights. Given the complexity and scale of these databases, automatic data classification is often required. Biological data sets are often hierarchical in nature, with varying degrees of complexity, posing different challenges for training, testing and validating accurate and generalizable classification models. While some approaches to classify hierarchical data have been proposed, no guidelines regarding their utility, applicability and limitations have been explored or implemented. These approaches include ‘Local’ approaches, which consider the hierarchy by building models per level or per node, and ‘Global’ hierarchical classification, which uses a flat classification approach. To fill this gap, here we have systematically contrasted the performance of ‘Local per Level’ and ‘Local per Node’ approaches with a ‘Global’ approach applied to two different hierarchical datasets: BioLip and CATH. The results show how different components of hierarchical data sets, such as variation coefficient and prediction by depth, can guide the choice of appropriate classification schemes. Finally, we provide guidelines to support this process when embarking on a hierarchical classification task, which will help optimize computational resources and predictive performance.
Affiliation(s)
- Pâmela M Rezende
- Universidade Federal de Minas Gerais; Instituto René Rachou, Fundação Oswaldo Cruz; Stilingue Inteligência Artificial
- Joicymara S Xavier
- Universidade Federal de Minas Gerais; Instituto René Rachou, Fundação Oswaldo Cruz; Institute of Agricultural Sciences, Universidade Federal dos Vales do Jequitinhonha e Mucuri
- David B Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland; Systems and Computational Biology, Bio21 Institute, University of Melbourne; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute; School of Computing and Information Systems, University of Melbourne
36
Rodrigues CHM, Ascher DB. CSM-Potential: mapping protein interactions and biological ligands in 3D space using geometric deep learning. Nucleic Acids Res 2022; 50:W204-W209. [PMID: 35609999 PMCID: PMC9252741 DOI: 10.1093/nar/gkac381] [Received: 03/24/2022] [Revised: 04/19/2022] [Accepted: 05/05/2022] [Indexed: 11/13/2022]
Abstract
Recent advances in protein structural modelling have enabled the accurate prediction of the holo 3D structures of almost any protein; however, protein function is intrinsically linked to the interactions it makes. While a number of computational approaches have been proposed to explore potential biological interactions, they have been limited to specific interactions and have not been readily accessible to non-experts or for use in bioinformatics pipelines. Here we present CSM-Potential, a geometric deep learning approach to identify regions of a protein surface that are likely to mediate protein-protein and protein-ligand interactions, in order to provide a link between 3D structure and biological function. Our method has shown robust performance, outperforming existing methods for both predictive tasks. By assessing the performance of CSM-Potential on independent blind tests, we show that our method was able to achieve ROC AUC values of up to 0.81 for the identification of potential protein-protein binding sites, and up to 0.96 accuracy on biological ligand classification. Our method is freely available as a user-friendly web server and API at http://biosig.unimelb.edu.au/csm_potential.
Affiliation(s)
- Carlos H M Rodrigues
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
37
Paiva VA, Mendonça MV, Silveira SA, Ascher DB, Pires DEV, Izidoro SC. GASS-Metal: identifying metal-binding sites on protein structures using genetic algorithms. Brief Bioinform 2022; 23:6590153. [PMID: 35595534 DOI: 10.1093/bib/bbac178] [Received: 01/24/2022] [Revised: 04/18/2022] [Accepted: 04/20/2022] [Indexed: 12/12/2022]
Abstract
Metals are present in >30% of proteins found in nature and help them perform important biological functions, including storage, transport, signal transduction and enzymatic activity. Traditional and experimental techniques for metal-binding site prediction are usually costly and time-consuming, making computational tools that can assist in these predictions particularly important. Here we present Genetic Active Site Search (GASS)-Metal, a new method for protein metal-binding site prediction. The method relies on a parallel genetic algorithm to find candidate metal-binding sites that are structurally similar to curated templates from M-CSA and MetalPDB. GASS-Metal was thoroughly validated using homologous proteins and conservative mutations of residues, showing robust performance. The ability of GASS-Metal to identify metal-binding sites was also compared with state-of-the-art methods; it outperformed similar methods, achieving an MCC of up to 0.57 and correctly detecting up to 96.1% of the sites. GASS-Metal is freely available at https://gassmetal.unifei.edu.br. The GASS-Metal source code is available at https://github.com/sandroizidoro/gassmetal-local.
Affiliation(s)
- Vinícius A Paiva
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa, Brazil
- Murillo V Mendonça
- Institute of Technological Sciences, Campus Theodomiro Carneiro Santiago, Universidade Federal de Itajubá, Itabira, Brazil
- Sabrina A Silveira
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa, Brazil
- David B Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, Victoria, Australia
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- Sandro C Izidoro
- Institute of Technological Sciences, Campus Theodomiro Carneiro Santiago, Universidade Federal de Itajubá, Itabira, Brazil
38
Stephenson SE, Costain G, Blok LE, Silk MA, Nguyen TB, Dong X, Alhuzaimi DE, Dowling JJ, Walker S, Amburgey K, Hayeems RZ, Rodan LH, Schwartz MA, Picker J, Lynch SA, Gupta A, Rasmussen KJ, Schimmenti LA, Klee EW, Niu Z, Agre KE, Chilton I, Chung WK, Revah-Politi A, Au PB, Griffith C, Racobaldo M, Raas-Rothschild A, Ben Zeev B, Barel O, Moutton S, Morice-Picard F, Carmignac V, Cornaton J, Marle N, Devinsky O, Stimach C, Wechsler SB, Hainline BE, Sapp K, Willems M, Bruel AL, Dias KR, Evans CA, Roscioli T, Sachdev R, Temple SE, Zhu Y, Baker JJ, Scheffer IE, Gardiner FJ, Schneider AL, Muir AM, Mefford HC, Crunk A, Heise EM, Millan F, Monaghan KG, Person R, Rhodes L, Richards S, Wentzensen IM, Cogné B, Isidor B, Nizon M, Vincent M, Besnard T, Piton A, Marcelis C, Kato K, Koyama N, Ogi T, Goh ESY, Richmond C, Amor DJ, Boyce JO, Morgan AT, Hildebrand MS, Kaspi A, Bahlo M, Friðriksdóttir R, Katrínardóttir H, Sulem P, Stefánsson K, Björnsson HT, Mandelstam S, Morleo M, Mariani M, Scala M, Accogli A, Torella A, Capra V, Wallis M, Jansen S, Waisfisz Q, de Haan H, Sadedin S, Lim SC, White SM, Ascher DB, Schenck A, Lockhart PJ, Christodoulou J, Tan TY. Germline variants in tumor suppressor FBXW7 lead to impaired ubiquitination and a neurodevelopmental syndrome. Am J Hum Genet 2022; 109:601-617. [PMID: 35395208 DOI: 10.1016/j.ajhg.2022.03.002] [Received: 10/28/2021] [Accepted: 02/28/2022] [Indexed: 11/01/2022]
Abstract
Neurodevelopmental disorders are highly heterogeneous conditions resulting from abnormalities of brain architecture and/or function. FBXW7 (F-box and WD-repeat-domain-containing 7), a recognized developmental regulator and tumor suppressor, has been shown to regulate cell-cycle progression and cell growth and survival by targeting substrates including CYCLIN E1/2 and NOTCH for degradation via the ubiquitin proteasome system. We used a genotype-first approach and global data-sharing platforms to identify 35 individuals harboring de novo and inherited FBXW7 germline monoallelic chromosomal deletions and nonsense, frameshift, splice-site, and missense variants associated with a neurodevelopmental syndrome. The FBXW7 neurodevelopmental syndrome is distinguished by global developmental delay, borderline to severe intellectual disability, hypotonia, and gastrointestinal issues. Brain imaging revealed variable underlying structural abnormalities affecting the cerebellum, corpus callosum, and white matter. A crystal-structure model of FBXW7 predicted that missense variants were clustered at the substrate-binding surface of the WD40 domain and that these might reduce FBXW7 substrate binding affinity. Expression of recombinant FBXW7 missense variants in cultured cells demonstrated impaired CYCLIN E1 and CYCLIN E2 turnover. Pan-neuronal knockdown of the Drosophila ortholog, archipelago, impaired learning and neuronal function. Collectively, the data presented herein provide compelling evidence of an F-box protein-related, phenotypically variable neurodevelopmental disorder associated with monoallelic variants in FBXW7.
Affiliation(s)
- John Christodoulou
- Murdoch Children's Research Institute, Melbourne, VIC 3052, Australia; Department of Paediatrics, University of Melbourne, Melbourne, VIC 3052, Australia; Victorian Clinical Genetics Services, Melbourne, VIC 3052, Australia
- Tiong Yang Tan
- Murdoch Children's Research Institute, Melbourne, VIC 3052, Australia; Department of Paediatrics, University of Melbourne, Melbourne, VIC 3052, Australia; Victorian Clinical Genetics Services, Melbourne, VIC 3052, Australia
39
Pan Q, Nguyen TB, Ascher DB, Pires DEV. Systematic evaluation of computational tools to predict the effects of mutations on protein stability in the absence of experimental structures. Brief Bioinform 2022; 23:bbac025. [PMID: 35189634 PMCID: PMC9155634 DOI: 10.1093/bib/bbac025] [Received: 07/19/2021] [Revised: 01/13/2022] [Accepted: 01/30/2022] [Indexed: 12/26/2022]
Abstract
Changes in protein sequence can have dramatic effects on how proteins fold, their stability and dynamics. Over the last 20 years, pioneering methods have been developed to try to estimate the effects of missense mutations on protein stability, leveraging growing availability of protein 3D structures. These, however, have been developed and validated using experimentally derived structures and biophysical measurements. A large proportion of protein structures remain to be experimentally elucidated and, while many studies have based their conclusions on predictions made using homology models, there has been no systematic evaluation of the reliability of these tools in the absence of experimental structural data. We have, therefore, systematically investigated the performance and robustness of ten widely used structural methods when presented with homology models built using templates at a range of sequence identity levels (from 15% to 95%) and contrasted performance with sequence-based tools, as a baseline. We found there is indeed performance deterioration on homology models built using templates with sequence identity below 40%, where sequence-based tools might become preferable. This was most marked for mutations in solvent exposed residues and stabilizing mutations. As structure prediction tools improve, the reliability of these predictors is expected to follow, however we strongly suggest that these factors should be taken into consideration when interpreting results from structure-based predictors of mutation effects on protein stability.
Affiliation(s)
- Qisheng Pan
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, Victoria 3052, Australia
- Thanh Binh Nguyen
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, Victoria 3052, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, Victoria 3052, Australia
- Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge CB2 1GA, UK
- Douglas E V Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, Victoria 3052, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria 3053, Australia
40
Pires DEV, Stubbs KA, Mylne JS, Ascher DB. cropCSM: designing safe and potent herbicides with graph-based signatures. Brief Bioinform 2022; 23:6535680. [PMID: 35211724 PMCID: PMC9155605 DOI: 10.1093/bib/bbac042] [Received: 12/09/2021] [Revised: 01/26/2022] [Accepted: 01/27/2022] [Indexed: 12/11/2022]
Abstract
Herbicides have revolutionised weed management, increased crop yields and improved profitability, allowing for an increase in worldwide food security. Their widespread use, however, has also led to a rise in resistance and concerns about their environmental impact. Despite the need for potent and safe herbicidal molecules, no herbicide with a new mode of action has reached the market in 30 years. Although computational approaches have proven invaluable in guiding rational drug discovery pipelines, leading to higher hit rates and lower attrition due to poor toxicity, comparatively little has been done for herbicide design. To fill this gap, we have developed cropCSM, a computational platform to help identify new, potent, nontoxic and environmentally safe herbicides. By using a knowledge-based approach, we identified physicochemical properties and substructures enriched in safe herbicides. By representing the small molecules as graphs, we leveraged these insights to guide the development of predictive models trained and tested on the largest data set collected to date of molecules with experimentally characterised herbicidal profiles (over 4500 compounds). In addition, we developed six new environmental and human toxicity predictors, spanning five different species, to assist in molecule prioritisation. cropCSM was able to correctly identify 97% of herbicides currently available commercially, while predicting toxicity profiles with accuracies of up to 92%. We believe cropCSM will be an essential tool for enriching screening libraries and guiding the development of potent and safe herbicides. We have made the method freely available through a user-friendly webserver at http://biosig.unimelb.edu.au/crop_csm.
Affiliation(s)
- Douglas E V Pires
- School of Computing and Information Systems at the University of Melbourne
- Keith A Stubbs
- School of Molecular Sciences at the University of Western Australia
- Joshua S Mylne
- Curtin University and Deputy Director of the Centre for Crop and Disease Management
- David B Ascher
- University of Queensland, and head of Computational Biology and Clinical Informatics at the Baker Institute and Systems
41
Abrusán G, Ascher DB, Inouye M. Known allosteric proteins have central roles in genetic disease. PLoS Comput Biol 2022; 18:e1009806. [PMID: 35139069 PMCID: PMC10138267 DOI: 10.1371/journal.pcbi.1009806] [Received: 07/30/2021] [Revised: 04/27/2023] [Accepted: 01/05/2022] [Indexed: 12/15/2022]
Abstract
Allostery is a form of protein regulation, where ligands that bind sites located apart from the active site can modify the activity of the protein. The molecular mechanisms of allostery have been extensively studied, because allosteric sites are less conserved than active sites, and drugs targeting them are more specific than drugs binding the active sites. Here we quantify the importance of allostery in genetic disease. We show that 1) known allosteric proteins are central in disease networks, contribute to genetic disease and comorbidities much more than non-allosteric proteins, and there is an association between being allosteric and involvement in disease; 2) they are enriched in many major disease types like hematopoietic diseases, cardiovascular diseases, cancers, diabetes, or diseases of the central nervous system; 3) variants from cancer genome-wide association studies are enriched near allosteric proteins, indicating their importance to polygenic traits; and 4) the importance of allosteric proteins in disease is due, at least partly, to their central positions in protein-protein interaction networks, and less due to their dynamical properties.
Affiliation(s)
- György Abrusán
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, School of Medicine, University of Cambridge, Cambridge, United Kingdom
- David B. Ascher
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Structural Biology and Bioinformatics, Department of Biochemistry, Bio21 Institute, University of Melbourne, Melbourne, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
- Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, School of Medicine, University of Cambridge, Cambridge, United Kingdom
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Australia
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, United Kingdom
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, United Kingdom
- The Alan Turing Institute, London, United Kingdom
42
Myung Y, Pires DEV, Ascher DB. CSM-AB: graph-based antibody-antigen binding affinity prediction and docking scoring function. Bioinformatics 2022; 38:1141-1143. [PMID: 34734992 DOI: 10.1093/bioinformatics/btab762] [Received: 05/03/2021] [Revised: 10/18/2021] [Accepted: 11/01/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Understanding antibody-antigen interactions is key to improving their binding affinities and specificities. While experimental approaches are fundamental for developing new therapeutics, computational methods can provide quick assessment of binding landscapes, guiding experimental design. Despite this, little effort has been devoted to accurately predicting the binding affinity between antibodies and antigens and to developing tailored docking scoring functions for this type of interaction. Here, we developed CSM-AB, a machine learning method capable of predicting antibody-antigen binding affinity by modelling interaction interfaces as graph-based signatures.
RESULTS CSM-AB outperformed alternative methods, achieving a Pearson's correlation of up to 0.64 on blind tests. We also show that CSM-AB can accurately rank near-native poses, working effectively as a docking scoring function. We believe CSM-AB will be an invaluable tool to assist in the development of new immunotherapies.
AVAILABILITY AND IMPLEMENTATION CSM-AB is freely available as a user-friendly web interface and API at http://biosig.unimelb.edu.au/csm_ab/datasets.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yoochan Myung
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia
- Douglas E V Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, VIC, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, Australia
43
Karmakar M, Ragonnet R, Ascher DB, Trauer JM, Denholm JT. Estimating tuberculosis drug resistance amplification rates in high-burden settings. BMC Infect Dis 2022; 22:82. [PMID: 35073862 PMCID: PMC8785585 DOI: 10.1186/s12879-022-07067-1] [Received: 09/29/2021] [Accepted: 01/11/2022] [Indexed: 11/20/2022]
Abstract
Background Antimicrobial resistance develops following the accrual of mutations in the bacterial genome, and may variably impact organism fitness and, hence, transmission risk. Classical representation of tuberculosis (TB) dynamics using a single- or two-strain (DS/MDR-TB) model typically does not capture elements of this important aspect of TB epidemiology. To understand and estimate the likelihood of resistance spreading in high drug-resistant TB incidence settings, we used epidemiological data to develop a mathematical model of Mycobacterium tuberculosis (Mtb) transmission.
Methods A four-strain (drug-susceptible (DS), isoniazid mono-resistant (INH-R), rifampicin mono-resistant (RIF-R) and multidrug-resistant (MDR)) compartmental deterministic Mtb transmission model was developed to explore the progression from DS- to MDR-TB in the Philippines and Viet Nam. The models were calibrated using data from national tuberculosis prevalence (NTP) surveys and drug resistance surveys (DRS). An adaptive Metropolis algorithm was used to estimate the risks of drug resistance amplification among unsuccessfully treated individuals.
Results The estimated proportion of INH-R amplification among failing treatments was 0.84 (95% CI 0.79–0.89) for the Philippines and 0.77 (95% CI 0.71–0.84) for Viet Nam. The proportion of RIF-R amplification among failing treatments was 0.05 (95% CI 0.04–0.07) for the Philippines and 0.011 (95% CI 0.010–0.012) for Viet Nam.
Conclusion The risk of resistance amplification due to treatment failure was dramatically higher for INH than for RIF. We observed that RIF-R strains were more likely to be transmitted than acquired through amplification, while both mechanisms of acquisition were important contributors in the case of INH-R. These findings highlight the complexity of drug resistance dynamics in high-incidence settings, and emphasize the importance of prioritizing testing algorithms that allow for early detection of INH-R.
Supplementary Information The online version contains supplementary material available at 10.1186/s12879-022-07067-1.
44
Karmakar M, Cicaloni V, Rodrigues CH, Spiga O, Santucci A, Ascher DB. HGDiscovery: An online tool providing functional and phenotypic information on novel variants of homogentisate 1,2- dioxigenase. Curr Res Struct Biol 2022; 4:271-277. [PMID: 36118553 PMCID: PMC9471331 DOI: 10.1016/j.crstbi.2022.08.001] [Received: 12/29/2021] [Revised: 07/28/2022] [Accepted: 08/23/2022] [Indexed: 11/28/2022]
Abstract
Alkaptonuria (AKU), a rare genetic disorder, is characterized by the accumulation of homogentisic acid (HGA) in the body. Affected individuals lack functional levels of an enzyme required to break down HGA. Mutations in the homogentisate 1,2-dioxygenase (HGD) gene cause AKU; they are responsible for deficient levels of functional HGD, which, in turn, leads to excess levels of HGA. Although HGA is rapidly cleared from the body by the kidneys, in the long term it starts accumulating in various tissues, especially cartilage. Over time (rarely before adulthood), it eventually changes the color of affected tissue to slate blue or black. Here we report a comprehensive mutation analysis of 111 pathogenic and 190 non-pathogenic HGD missense mutations using protein structural information. Using our comprehensive suite of graph-based signature methods, mCSM, complemented with sequence-based tools, we studied the functional and molecular consequences of each mutation on protein stability, interaction and evolutionary conservation. The scores generated from the structure- and sequence-based tools were used to train a supervised machine learning classifier that achieved 89% accuracy. The empirical classifier was used to generate the variant phenotype for novel HGD missense mutations. All this information is deployed as a user-friendly, freely available web server called HGDiscovery (https://biosig.lab.uq.edu.au/hgdiscovery/).
Highlights
- Functional and phenotypic consequences of HGD non-synonymous variations.
- Biophysical, structural and evolutionary analysis of novel and known clinical variants.
- Pathogenic mutations affected protein stability and conformational flexibility.
- Pathogenic mutations associated with deleterious scores for sequence-based features.
- HGDiscovery (http://biosig.unimelb.edu.au/hgdiscovery/): webserver.
Affiliation(s)
- Malancha Karmakar
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Vittoria Cicaloni
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Carlos H.M. Rodrigues
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Ottavia Spiga
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- Annalisa Santucci
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, Siena, Italy
- David B. Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Corresponding author. Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
45
Nguyen TB, Pires DEV, Ascher DB. CSM-carbohydrate: protein-carbohydrate binding affinity prediction and docking scoring function. Brief Bioinform 2021; 23:6457169. [PMID: 34882232 DOI: 10.1093/bib/bbab512] [Received: 07/23/2021] [Revised: 11/06/2021] [Accepted: 11/08/2021] [Indexed: 12/29/2022]
Abstract
Protein-carbohydrate interactions are crucial for many cellular processes but can be challenging to characterise biologically. To improve our understanding and ability to model these molecular interactions, we used a carefully curated set of 370 protein-carbohydrate complexes with experimental structural and biophysical data to train and validate a new tool, cutoff scanning matrix (CSM)-carbohydrate, which uses machine learning algorithms to accurately predict binding affinity and to rank docking poses as a scoring function. Information on both protein and carbohydrate complementarity, in terms of shape and chemistry, was captured using graph-based structural signatures. Performance was comparable across training and independent test sets, with Pearson's correlations of 0.72 under cross-validation [root mean square error (RMSE) of 1.58 kcal/mol] and 0.67 on the independent test (RMSE of 1.72 kcal/mol), providing confidence in the generalisability and robustness of the final model. Similar performance was obtained across mono-, di- and oligosaccharides, further highlighting the applicability of this approach to the study of larger complexes. CSM-carbohydrate significantly outperformed previous approaches. We have implemented our method, and made all data freely available, through both a user-friendly web interface and an application programming interface for programmatic access, at http://biosig.unimelb.edu.au/csm_carbohydrate/. We believe CSM-carbohydrate will be an invaluable tool for assessing docking poses and the effects of mutations on protein-carbohydrate affinity, unravelling important aspects that drive binding recognition.
Affiliation(s)
- Thanh Binh Nguyen
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Douglas E V Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Department of Biochemistry, University of Cambridge, Cambridge, UK
46
Uthayopas K, de Sá AGC, Alavi A, Pires DEV, Ascher DB. TSMDA: Target and symptom-based computational model for miRNA-disease-association prediction. Mol Ther Nucleic Acids 2021; 26:536-546. [PMID: 34631283 PMCID: PMC8479276 DOI: 10.1016/j.omtn.2021.08.016] [Received: 05/14/2021] [Accepted: 08/19/2021] [Indexed: 02/06/2023]
Abstract
The emergence of high-throughput sequencing techniques has revealed a central role for microRNAs (miRNAs) in a wide range of diseases, including cancers and neurodegenerative disorders. Understanding novel relationships between miRNAs and diseases can potentially unveil complex pathogenic mechanisms, leading to effective diagnosis and treatment. The investigation of novel miRNA-disease associations, however, is currently costly and time-consuming. Over the years, several computational models have been proposed to prioritize potential miRNA-disease associations, but they have had limited usability or predictive capability. To fill this gap, we introduce TSMDA, a novel machine-learning method that leverages target and symptom information, together with negative sample selection, to predict miRNA-disease associations. TSMDA significantly outperforms similar methods, achieving an area under the receiver operating characteristic (ROC) curve (AUC) of 0.989 under 5-fold cross-validation and 0.982 on a blind test. As case studies, we also demonstrate the capability of the method to uncover potential miRNA-disease associations in breast, prostate, and lung cancers. We believe TSMDA will be an invaluable tool for the community to explore and prioritize potentially new miRNA-disease associations for further experimental characterization. The method is freely available through a user-friendly web interface at http://biosig.unimelb.edu.au/tsmda/.
Affiliation(s)
- Korawich Uthayopas
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, VIC, Australia
- Alex G C de Sá
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, VIC, Australia
- Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, VIC, Australia
- Azadeh Alavi
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, VIC, Australia
- Douglas E V Pires
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, VIC, Australia
- School of Computing and Information Systems, University of Melbourne, Parkville 3052, VIC, Australia
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, VIC, Australia
- Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, VIC, Australia
- Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge CB2 1GA, UK
47
Abstract
Protein-protein interactions (PPIs) are promising sites for the development of selective drugs; however, they have generally been viewed as challenging targets. Molecules targeting PPIs tend to be larger and more lipophilic than other drug-like molecules, mimicking the properties of the interacting interfaces. Here, we propose pdCSM-PPI, a machine learning approach that uses graph-based representations of small molecules to guide the identification of inhibitors that modulate PPIs. We applied this approach to 21 different PPI targets, developing interaction-specific models that accurately identified active compounds, achieving Matthews correlation coefficient (MCC) and F1 scores of up to 1 and Pearson's correlations of up to 0.87, and outperforming previous approaches. Using insights from these individual models, we developed a generic PPI modulator predictive model, which accurately predicted IC50 with a Pearson's correlation of 0.64 on a low-redundancy blind test. Importantly, we were able to accurately distinguish active from inactive compounds, achieving an AUC of 0.77 and a sensitivity and specificity of 76% and 78%, respectively. We believe pdCSM-PPI will be an important tool to help guide more efficient screening of new PPI inhibitors; it is freely available as an easy-to-use web server and API at http://biosig.unimelb.edu.au/pdcsm_ppi.
Affiliation(s)
- Carlos H M Rodrigues
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia
- School of Computing and Information Systems, University of Melbourne, Parkville 3052, Victoria, Australia
- David B Ascher
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia
48
Nguyen TB, Myung Y, de Sá AGC, Pires DEV, Ascher DB. mmCSM-NA: accurately predicting effects of single and multiple mutations on protein-nucleic acid binding affinity. NAR Genom Bioinform 2021; 3:lqab109. [PMID: 34805992 PMCID: PMC8600011 DOI: 10.1093/nargab/lqab109] [Received: 04/11/2021] [Revised: 09/20/2021] [Accepted: 10/27/2021] [Indexed: 02/02/2023]
Abstract
While protein-nucleic acid interactions are pivotal for many crucial biological processes, limited experimental data have made it challenging to develop computational approaches to characterise these interactions. Consequently, most approaches to understanding the effects of missense mutations on protein-nucleic acid affinity have focused on single-point mutations and have shown limited performance on independent data sets. To overcome this, we have curated the largest dataset to date of experimentally measured effects of mutations on nucleic acid binding affinity, encompassing 856 single-point mutations and 141 multiple-point mutations across 155 experimentally solved complexes. This was used in combination with an optimized version of our graph-based signatures to develop mmCSM-NA (http://biosig.unimelb.edu.au/mmcsm_na), the first scalable method capable of quantitatively and accurately predicting the effects of multiple-point mutations on nucleic acid binding affinities. mmCSM-NA obtained a Pearson's correlation of up to 0.67 (RMSE of 1.06 kcal/mol) on single-point mutations under cross-validation, and up to 0.65 (RMSE of 1.12 kcal/mol) on independent non-redundant datasets of multiple-point mutations, outperforming similar tools. mmCSM-NA is freely available as an easy-to-use web server and API. We believe it will be an invaluable tool to shed light on the role of mutations affecting protein-nucleic acid interactions in diseases.
Affiliation(s)
- Thanh Binh Nguyen
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Yoochan Myung
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Alex G C de Sá
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- David B Ascher
- To whom correspondence should be addressed. Tel: +61 90354794
49
Velloso JPL, Ascher DB, Pires DEV. pdCSM-GPCR: predicting potent GPCR ligands with graph-based signatures. Bioinform Adv 2021; 1:vbab031. [PMID: 34901870 PMCID: PMC8651072 DOI: 10.1093/bioadv/vbab031] [Received: 05/27/2021] [Revised: 09/30/2021] [Accepted: 11/02/2021] [Indexed: 01/26/2023]
Abstract
MOTIVATION: G protein-coupled receptors (GPCRs) can selectively bind many types of ligands, ranging from light-sensitive compounds to ions, hormones, pheromones and neurotransmitters, thereby modulating cell physiology. Given their role in many essential cellular processes, they are one of the most targeted protein families, with over a third of all approved drugs modulating GPCR signalling. Despite this, the large diversity of receptors and their multipass transmembrane architectures make the identification and development of novel, specific and safe GPCR ligands challenging. While computational approaches have the potential to assist GPCR drug development, they have shown limited performance and generalization capabilities. Here, we explored the use of graph-based signatures to develop pdCSM-GPCR, a method capable of rapidly and accurately screening potential GPCR ligands.
RESULTS: We curated bioactivity data (IC50, EC50, Ki and Kd) for individual GPCRs and used them to develop predictive models for 36 major GPCR targets across 4 classes (A, B, C and F). Our models constitute the most comprehensive computational resource for GPCR bioactivity prediction to date. Across stratified 10-fold cross-validation and blind tests, our approach achieved Pearson's correlations of up to 0.89, significantly outperforming previous methods. Interpreting our results, we identified common important features of potent GPCR ligands, which tend to have bicyclic rings, leading to higher levels of aromaticity. We believe pdCSM-GPCR will be an invaluable tool to assist screening efforts, enriching compound libraries and ranking candidates for further experimental validation.
AVAILABILITY AND IMPLEMENTATION: The pdCSM-GPCR predictive models and the datasets used have been made available via a freely accessible and easy-to-use web server at http://biosig.unimelb.edu.au/pdcsm_gpcr/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- João Paulo L Velloso
- Fundação Oswaldo Cruz, Instituto René Rachou, Belo Horizonte 30190-009, Brazil
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne 3052, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia
- Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne 3052, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia
- Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Melbourne 3052, Australia
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- To whom correspondence should be addressed.
- Douglas E V Pires
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne 3052, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne 3053, Australia
- To whom correspondence should be addressed.
50
Lai CY, Tsai IJ, Chiu PC, Ascher DB, Chien YH, Huang YH, Lin YL, Hwu WL, Lee NC. A novel deep intronic variant strongly associates with Alkaptonuria. NPJ Genom Med 2021; 6:89. [PMID: 34686677 PMCID: PMC8536767 DOI: 10.1038/s41525-021-00252-2] [Received: 02/07/2021] [Accepted: 10/04/2021] [Indexed: 11/08/2022]
Abstract
Alkaptonuria is a rare autosomal recessive disorder of tyrosine metabolism that causes ochronosis, arthropathy, cardiac valvular calcification, and urolithiasis. The epidemiology of alkaptonuria in East Asia is unclear. In this study, we reviewed patients diagnosed with alkaptonuria from January 2010 to June 2020 and compared their clinical and molecular features with those of patients from other countries. Three patients were found to have alkaptonuria. Mutation analyses of the homogentisate 1,2-dioxygenase gene (HGD) revealed four novel variants, c.16-2063A>C, p.(Thr196Ile), p.(Gly344AspfsTer25), and p.(Gly362Arg), which accounted for five of the six mutated alleles (83.3%). RNA sequencing revealed that c.16-2063A>C activates a cryptic exon, causing the protein truncation p.(Tyr5_Ile6insValTer17). A literature search identified another 6 patients with alkaptonuria in East Asia; including our cases, 13 of the 18 mutated alleles have not been reported elsewhere in the world. Alkaptonuria is rare in Taiwan and East Asia, and its HGD variants are mostly novel and private.
Affiliation(s)
- Chien-Yi Lai
- Department of Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
- Department of Pediatrics, National Taiwan University Children Hospital, Taipei, Taiwan
- Department of Pediatrics, National Taiwan University Hospital Hsin-Chu Branch, Hsin-Chu, Taiwan
- I-Jung Tsai
- Department of Pediatrics, National Taiwan University Children Hospital, Taipei, Taiwan
- Pao-Chin Chiu
- Department of Pediatrics, Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Yin-Hsiu Chien
- Department of Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
- Department of Pediatrics, National Taiwan University Children Hospital, Taipei, Taiwan
- Yu-Hsuan Huang
- Department of Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
- Yi-Lin Lin
- Department of Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
- Wuh-Liang Hwu
- Department of Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
- Department of Pediatrics, National Taiwan University Children Hospital, Taipei, Taiwan
- Ni-Chung Lee
- Department of Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
- Department of Pediatrics, National Taiwan University Children Hospital, Taipei, Taiwan