1
MacGowan SA, Madeira F, Britto-Borges T, Barton GJ. A unified analysis of evolutionary and population constraint in protein domains highlights structural features and pathogenic sites. Commun Biol 2024; 7:447. [PMID: 38605212 PMCID: PMC11009406 DOI: 10.1038/s42003-024-06117-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 03/27/2024] [Indexed: 04/13/2024]
Abstract
Protein evolution is constrained by structure and function, creating patterns in residue conservation that are routinely exploited to predict structure and other features. Similar constraints should affect variation across individuals, but it is only with the growth of human population sequencing that this has been tested at scale. Now, human population constraint has established applications in pathogenicity prediction, but it has not yet been explored for structural inference. Here, we map 2.4 million population variants to 5885 protein families and quantify residue-level constraint with a new Missense Enrichment Score (MES). Analysis of 61,214 structures from the PDB spanning 3661 families shows that missense depleted sites are enriched in buried residues or those involved in small-molecule or protein binding. MES is complementary to evolutionary conservation and a combined analysis allows a new classification of residues according to a conservation plane. This approach finds functional residues that are evolutionarily diverse, which can be related to specificity, as well as family-wide conserved sites that are critical for folding or function. We also find a possible contrast between lethal and non-lethal pathogenic sites, and a surprising clinical variant hot spot at a subset of missense enriched positions.
Affiliation(s)
- Stuart A MacGowan
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
- Fábio Madeira
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Thiago Britto-Borges
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
  - Section of Bioinformatics and Systems Cardiology, Department of Internal Medicine III and Klaus Tschira Institute for Integrative Computational Cardiology, Heidelberg University Hospital, Heidelberg, Germany
- Geoffrey J Barton
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
2
Mohamed AS, Salama AF, Sabaa MA, Toraih E, Elshazli RM. GEMIN4 Variants: Risk Profiling, Bioinformatics, and Dynamic Simulations Uncover Susceptibility to Bladder Carcinoma. Arch Med Res 2024; 55:102970. [PMID: 38401326 DOI: 10.1016/j.arcmed.2024.102970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 01/11/2024] [Accepted: 02/13/2024] [Indexed: 02/26/2024]
Abstract
BACKGROUND The relationship between GEMIN4 genetic variants and cancer, especially bladder carcinoma (BLCA), has been explored without conclusive results. This study aims to elucidate the link between GEMIN4 polymorphisms and BLCA susceptibility through genetic analyses, bioinformatics, and molecular dynamics (MD) simulations. METHODS A cohort of 249 participants (121 BLCA patients and 128 unrelated controls) was enrolled. PCR was employed for allelic discrimination of GEMIN4 variants, followed by subgroup stratification, haplotype analyses, structural prediction using the AlphaFold2 prediction tool, subsequent MD simulations, structural analysis, and residue interaction mapping using Desmond, UCSF ChimeraX, and Cytoscape software. RESULTS The rs.2740348*G variant demonstrated a protective role against BLCA in allelic (OR = 0.55, p = 0.002) and recessive (OR = 0.54, p = 0.017) models, whereas the rs.7813*T variant increased BLCA risk under the recessive model (OR = 1.90, p = 0.019). Haplotype analysis revealed a significant association of the GEMIN4 haplotype (rs.2740348*C/rs.7813*T) with increased BLCA risk (OR = 2.01, p = 0.004). Univariate analysis revealed associations of the variants with albumin levels and absolute neutrophil count in BLCA patients. Pathogenicity evaluation categorized p.Gln450Glu as neutral and p.Arg1033Cys as deleterious. MD simulations revealed structural alterations and conformational shifts in the GEMIN4 protein induced by the Glu450 and Cys1033 mutations. CONCLUSIONS The study highlights the dual role of GEMIN4 variants in BLCA susceptibility, with rs.2740348 conferring protection and rs.7813 increasing risk. The Glu450 residue positively impacted protein stability, while Cys1033 had a detrimental effect on protein function. These findings underscore the significance of GEMIN4 variants in BLCA susceptibility and pave the way for future diagnostic and therapeutic initiatives.
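The odds ratios quoted in the RESULTS compare the odds of carrying a variant among cases with the odds among controls. A minimal Python sketch of that calculation follows, assuming an unadjusted 2x2 count table and a Wald 95% confidence interval. The function names and the example counts are hypothetical illustrations, not the study's data, and the study itself may have used adjusted models.

```python
import math


def odds_ratio(case_exposed, case_unexposed, control_exposed, control_unexposed):
    """Unadjusted odds ratio from a 2x2 table of counts (all counts must be > 0)."""
    return (case_exposed * control_unexposed) / (case_unexposed * control_exposed)


def wald_ci(case_exposed, case_unexposed, control_exposed, control_unexposed, z=1.96):
    """Wald confidence interval for the odds ratio, computed on the log scale."""
    log_or = math.log(
        odds_ratio(case_exposed, case_unexposed, control_exposed, control_unexposed)
    )
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se = math.sqrt(
        1 / case_exposed
        + 1 / case_unexposed
        + 1 / control_exposed
        + 1 / control_unexposed
    )
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the functions.
    counts = (30, 70, 50, 50)
    print("OR:", odds_ratio(*counts))
    print("95% CI:", wald_ci(*counts))
```

An odds ratio below 1 with an interval that excludes 1 would be read as a protective association, and one above 1 as increased risk, which is how the abstract interprets its values.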
Affiliation(s)
- Abdallah S Mohamed
  - Biochemistry Division, Department of Chemistry, Faculty of Science, Tanta University, Tanta, Egypt
- Afrah F Salama
  - Biochemistry Division, Department of Chemistry, Faculty of Science, Tanta University, Tanta, Egypt
- Magdy A Sabaa
  - Department of Urology, Faculty of Medicine, Tanta University, Tanta, Egypt
- Eman Toraih
  - Endocrine and Oncology Division, Department of Surgery, Tulane University School of Medicine, New Orleans, LA, USA; Genetics Unit, Department of Histology and Cell Biology, Faculty of Medicine, Suez Canal University, Ismailia, Egypt
- Rami M Elshazli
  - Biochemistry and Molecular Genetics Unit, Department of Basic Sciences, Faculty of Physical Therapy, Horus University - Egypt, New Damietta, Egypt
3
Helbawi E, Abd El-Latif SA, Toson MA, Banach A, Mohany M, Al-Rejaie SS, Elwan H. Impacts of Biosynthesized Manganese Dioxide Nanoparticles on Antioxidant Capacity, Hematological Parameters, and Antioxidant Protein Docking in Broilers. ACS Omega 2024; 9:9396-9409. [PMID: 38434868 PMCID: PMC10905714 DOI: 10.1021/acsomega.3c08775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Revised: 01/26/2024] [Accepted: 01/31/2024] [Indexed: 03/05/2024]
Abstract
Using green tomato extract, a green approach was used to synthesize manganese dioxide nanoparticles (MnO2NPs). The synthesis of MnO2NPs (20.93-36.85 nm) was confirmed by energy-dispersive X-ray (EDX), scanning and transmission electron microscopy (SEM and TEM), Fourier transform infrared spectroscopy (FTIR), and UV-visible spectroscopy (UV-vis) analyses. A total of 150 day-old Arbor Acres broiler chicks were randomly divided into five groups. The control group received a diet containing 60 mg Mn/kg (100% NRC broiler recommendation). The other four groups received Mn from either bulk MnO2 or green synthesized MnO2NPs at 66 or 72 mg/kg (110% and 120% of the standard level, respectively). Each group comprised 30 birds, in three replicates of 10 birds each. Generally, the study's results indicate that incorporating MnO2NPs as a feed additive had no negative effects on broiler chick growth, antioxidant status, or overall physiological responses. The addition of MnO2NPs, whether at 66 or 72 mg/kg, led to enhanced superoxide dismutase (SOD) activity in both serum and liver tissues of the broiler chicks. Notably, the 72 mg MnO2NPs group displayed significantly higher SOD activity compared to the other groups. The findings were further supported through docking. High throughput targeted docking was performed for proteins GHS, GST, and SOD with MnO2. SOD showed an effective binding affinity of -2.3 kcal/mol. This research sheds light on the potential of MnO2NPs as a safe and effective feed additive for broiler chicks. Further studies are required to explore the underlying mechanisms and long-term effects of incorporating MnO2NPs into broiler feed, to optimize broiler production and promote broiler welfare.
Affiliation(s)
- Esraa S. Helbawi
  - Animal and Poultry Production Department, Faculty of Agriculture, Minia University, 61519 EL-Minya, Egypt
- S. A. Abd El-Latif
  - Animal and Poultry Production Department, Faculty of Agriculture, Minia University, 61519 EL-Minya, Egypt
- Mahmoud A. Toson
  - Animal and Poultry Production Department, Faculty of Agriculture, Minia University, 61519 EL-Minya, Egypt
- Artur Banach
  - Department of Biology and Biotechnology of Microorganisms, Institute of Biological Sciences, Faculty of Medicine, The John Paul II Catholic University of Lublin, 20-708 Lublin, Poland
- Mohamed Mohany
  - Department of Pharmacology and Toxicology, College of Pharmacy, King Saud University, Riyadh 11451, Saudi Arabia
- Salim S. Al-Rejaie
  - Department of Pharmacology and Toxicology, College of Pharmacy, King Saud University, Riyadh 11451, Saudi Arabia
- Hamada Elwan
  - Animal and Poultry Production Department, Faculty of Agriculture, Minia University, 61519 EL-Minya, Egypt
4
Ge F, Arif M, Yan Z, Alahmadi H, Worachartcheewan A, Shoombuatong W. Review of Computational Methods and Database Sources for Predicting the Effects of Coding Frameshift Small Insertion and Deletion Variations. ACS Omega 2024; 9:2032-2047. [PMID: 38250421 PMCID: PMC10795160 DOI: 10.1021/acsomega.3c07662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 11/30/2023] [Accepted: 12/04/2023] [Indexed: 01/23/2024]
Abstract
Genetic variations (including substitutions, insertions, and deletions) exert a profound influence on DNA sequences. These variations are systematically classified as synonymous, nonsynonymous, and nonsense, each manifesting distinct effects on proteins. The implementation of high-throughput sequencing has significantly augmented our comprehension of the intricate interplay between gene variations and protein structure and function, as well as their ramifications in the context of diseases. Frameshift variations, particularly small insertions and deletions (indels), disrupt protein coding and are instrumental in disease pathogenesis. This article presents a succinct review of computational methods, databases, current challenges, and future directions in predicting the consequences of coding frameshift small indel variations. We analyzed the predictive efficacy, reliability, and utilization of computational methods, and the variant account, reliability, and utilization of databases. We also compared the prediction methodologies on GOF/LOF pathogenic variation data. Addressing the challenges pertaining to prediction accuracy and cross-species generalizability, nascent technologies such as AI and deep learning harbor immense potential to enhance predictive capabilities. The importance of interdisciplinary research and collaboration cannot be overstated for devising effective diagnosis, treatment, and prevention strategies concerning diseases associated with coding frameshift indel variations.
Affiliation(s)
- Fang Ge
  - State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, 9 Wenyuan Road, Nanjing 210023, China
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Muhammad Arif
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
- Zihao Yan
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Hanin Alahmadi
  - College of Computer Science and Engineering, Taibah University, Madinah 344, Saudi Arabia
- Apilak Worachartcheewan
  - Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
5
Alamri SH, Haque S, Alghamdi BS, Tayeb HO, Azhari S, Farsi RM, Elmokadem A, Alamri TA, Harakeh S, Prakash A, Kumar V. Comprehensive mapping of mutations in TDP-43 and α-Synuclein that affect stability and binding. J Biomol Struct Dyn 2023:1-13. [PMID: 38126188 DOI: 10.1080/07391102.2023.2293258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Accepted: 11/11/2023] [Indexed: 12/23/2023]
Abstract
Abnormal aggregation and amyloid inclusions of TAR DNA-binding protein 43 (TDP-43) and α-Synuclein (α-Syn) are frequently co-observed in amyotrophic lateral sclerosis, Parkinson's disease, and Alzheimer's disease. Several reports have shown that the TDP-43 C-terminal domain (CTD) and α-Syn interact with each other and that aggregates of these two proteins colocalize in different cellular and animal models. Molecular dynamics simulation was conducted to elucidate the stability of the TDP-43 and α-Syn complex structure. Interfacial mutations in protein complexes change the stability and binding affinity of the proteins, which may cause disease. Here, we have utilized the computational saturation mutagenesis approach including structure-based stability and binding energy calculations to compute the systemic effects of missense mutations of TDP-43 CTD and α-Syn on protein stability and binding affinity. Most of the interfacial mutations of CTD and α-Syn were found to destabilize the protein and reduce the protein binding affinity. The results thus shed light on the functional consequences of missense mutations observed in TDP-43 associated proteinopathies and may provide the mechanisms of co-morbidities involving these two proteins. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Sultan H Alamri
  - Department of Family Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Shafiul Haque
  - Research and Scientific Studies Unit, College of Nursing and Allied Health Sciences, Jazan University, Jazan, Saudi Arabia
  - Gilbert and Rose-Marie Chagoury School of Medicine, Lebanese American University, Beirut, Lebanon
  - Centre of Medical and Bio-Allied Health Sciences Research, Ajman University, Ajman, United Arab Emirates
- Badra S Alghamdi
  - Department of Physiology, Neuroscience Unit, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
  - Pre-Clinical Research Unit, King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
- Haythum O Tayeb
  - The Mind and Brain Studies Initiative, Neuroscience Research Unit, Department of Neurology, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Shereen Azhari
  - Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
- Reem M Farsi
  - Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
- Abear Elmokadem
  - Department of Hematology/Pediatric Oncology, King Abdulaziz University Hospital, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Turki A Alamri
  - Family and Community Medicine Department, Faculty of Medicine in Rabigh, King Abdulaziz University, Jeddah, Saudi Arabia
- Steve Harakeh
  - King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
  - Yousef Abdul Latif Jameel Scientific Chair of Prophetic Medicine Application, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Amresh Prakash
  - Amity Institute of Integrative Sciences and Health (AIISH), Amity University Haryana, Gurgaon, India
- Vijay Kumar
  - Amity Institute of Neuropsychology & Neurosciences, Amity University, Noida, India
6
Rojas Velazquez MN, Therkelsen S, Pandey AV. Exploring Novel Variants of the Cytochrome P450 Reductase Gene (POR) from the Genome Aggregation Database by Integrating Bioinformatic Tools and Functional Assays. Biomolecules 2023; 13:1728. [PMID: 38136599 PMCID: PMC10741880 DOI: 10.3390/biom13121728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 11/22/2023] [Accepted: 11/27/2023] [Indexed: 12/24/2023]
Abstract
Cytochrome P450 oxidoreductase (POR) is an essential redox partner for steroid and drug-metabolizing cytochromes P450 located in the endoplasmic reticulum. Mutations in POR lead to metabolic disorders, including congenital adrenal hyperplasia, and affect the metabolism of steroids, drugs, and xenobiotics. In this study, we examined approximately 450 missense variants of the POR gene listed in the Genome Aggregation Database (gnomAD) using eleven different in silico prediction tools. We found that 64 novel variants were consistently predicted to be disease-causing by most tools. To validate our findings, we conducted a population analysis and selected two variations in POR for further investigation. The human POR wild type and the R268W and L577P variants were expressed in bacteria and subjected to enzyme kinetic assays using a model substrate. We also examined the activities of several cytochrome P450 proteins in the presence of POR (WT or variants) by combining P450 and reductase proteins in liposomes. We observed a decrease in enzymatic activities (ranging from 35% to 85%) of key drug-metabolizing enzymes, supported by POR variants R268W and L577P compared to WT-POR. These results validate our approach of curating a vast amount of data from genome projects and provide an updated and reliable reference for diagnosing POR deficiency.
Affiliation(s)
- Maria Natalia Rojas Velazquez
  - Division of Pediatric Endocrinology, Department of Pediatrics, University Children’s Hospital Bern, 3010 Bern, Switzerland
  - Translational Hormone Research, Department of Biomedical Research, University of Bern, 3010 Bern, Switzerland
  - Graduate School for Cellular and Biomedical Sciences, University of Bern, 3010 Bern, Switzerland
- Søren Therkelsen
  - Division of Pediatric Endocrinology, Department of Pediatrics, University Children’s Hospital Bern, 3010 Bern, Switzerland
  - Translational Hormone Research, Department of Biomedical Research, University of Bern, 3010 Bern, Switzerland
  - Department of Drug Design and Pharmacology, University of Copenhagen, 1172 Copenhagen, Denmark
- Amit V. Pandey
  - Division of Pediatric Endocrinology, Department of Pediatrics, University Children’s Hospital Bern, 3010 Bern, Switzerland
  - Translational Hormone Research, Department of Biomedical Research, University of Bern, 3010 Bern, Switzerland
7
Larrea-Sebal A, Jebari-Benslaiman S, Galicia-Garcia U, Jose-Urteaga AS, Uribe KB, Benito-Vicente A, Martín C. Predictive Modeling and Structure Analysis of Genetic Variants in Familial Hypercholesterolemia: Implications for Diagnosis and Protein Interaction Studies. Curr Atheroscler Rep 2023; 25:839-859. [PMID: 37847331 PMCID: PMC10618353 DOI: 10.1007/s11883-023-01154-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/15/2023] [Indexed: 10/18/2023]
Abstract
PURPOSE OF REVIEW Familial hypercholesterolemia (FH) is a hereditary condition characterized by elevated levels of low-density lipoprotein cholesterol (LDL-C), which increases the risk of cardiovascular disease if left untreated. This review aims to discuss the role of bioinformatics tools in evaluating the pathogenicity of missense variants associated with FH. Specifically, it highlights the use of predictive models based on protein sequence, structure, evolutionary conservation, and other relevant features in identifying genetic variants within LDLR, APOB, and PCSK9 genes that contribute to FH. RECENT FINDINGS In recent years, various bioinformatics tools have emerged as valuable resources for analyzing missense variants in FH-related genes. Tools such as REVEL, Varity, and CADD use diverse computational approaches to predict the impact of genetic variants on protein function. These tools consider factors such as sequence conservation, structural alterations, and receptor binding to aid in interpreting the pathogenicity of identified missense variants. While these predictive models offer valuable insights, the accuracy of predictions can vary, especially for proteins with unique characteristics that might not be well represented in the databases used for training. This review emphasizes the significance of utilizing bioinformatics tools for assessing the pathogenicity of FH-associated missense variants. Despite their contributions, a definitive diagnosis of a genetic variant necessitates functional validation through in vitro characterization or cascade screening. This step ensures the precise identification of FH-related variants, leading to more accurate diagnoses. Integrating genetic data with reliable bioinformatics predictions and functional validation can enhance our understanding of the genetic basis of FH, enabling improved diagnosis, risk stratification, and personalized treatment for affected individuals. 
The comprehensive approach outlined in this review promises to advance the management of this inherited disorder, potentially leading to better health outcomes for those affected by FH.
Affiliation(s)
- Asier Larrea-Sebal
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
  - Fundación Biofisika Bizkaia, 48940, Leioa, Spain
- Shifa Jebari-Benslaiman
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- Unai Galicia-Garcia
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- Ane San Jose-Urteaga
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Kepa B Uribe
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Asier Benito-Vicente
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- César Martín
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
8
Varadi M, Tsenkov M, Velankar S. Challenges in bridging the gap between protein structure prediction and functional interpretation. Proteins 2023. [PMID: 37850517 DOI: 10.1002/prot.26614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 09/26/2023] [Accepted: 10/04/2023] [Indexed: 10/19/2023]
Abstract
The rapid evolution of protein structure prediction tools has significantly broadened access to protein structural data. Although predicted structure models have the potential to accelerate and impact fundamental and translational research significantly, it is essential to note that they are not validated and cannot be considered the ground truth. Thus, challenges persist, particularly in capturing protein dynamics, predicting multi-chain structures, interpreting protein function, and assessing model quality. Interdisciplinary collaborations are crucial to overcoming these obstacles. Databases like the AlphaFold Protein Structure Database, the ESM Metagenomic Atlas, and initiatives like the 3D-Beacons Network provide FAIR access to these data, enabling their interpretation and application across a broader scientific community. Whilst substantial advancements have been made in protein structure prediction, further progress is required to address the remaining challenges. Developing training materials, nurturing collaborations, and ensuring open data sharing will be paramount in this pursuit. The continued evolution of these tools and methodologies will deepen our understanding of protein function and accelerate disease pathogenesis and drug development discoveries.
Affiliation(s)
- Mihaly Varadi
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Maxim Tsenkov
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Sameer Velankar
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
9
Hall MWJ, Shorthouse D, Alcraft R, Jones PH, Hall BA. Mutations observed in somatic evolution reveal underlying gene mechanisms. Commun Biol 2023; 6:753. [PMID: 37468606 PMCID: PMC10356810 DOI: 10.1038/s42003-023-05136-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/17/2023] [Accepted: 07/11/2023] [Indexed: 07/21/2023]
Abstract
Highly sensitive DNA sequencing techniques have allowed the discovery of large numbers of somatic mutations in normal tissues. Some mutations confer a competitive advantage over wild-type cells, generating expanding clones that spread through the tissue. Competition between mutant clones leads to selection. This process can be considered a large scale, in vivo screen for mutations increasing cell fitness. It follows that somatic missense mutations may offer new insights into the relationship between protein structure, function and cell fitness. We present a flexible statistical method for exploring the selection of structural features in data sets of somatic mutants. We show how this approach can evidence selection of specific structural features in key drivers in aged tissues. Finally, we show how drivers may be classified as fitness-enhancing and fitness-suppressing through different patterns of mutation enrichment. This method offers a route to understanding the mechanism of protein function through in vivo mutant selection.
Affiliation(s)
- David Shorthouse
  - Department of Medical Physics and Biomedical Engineering, Malet Place Engineering Building, University College London, Gower Street, London, WC1E 6BT, UK
- Rachel Alcraft
  - Advanced Research Computing, University College London, London, UK
- Philip H Jones
  - Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
  - Department of Oncology, University of Cambridge, Cambridge, CB2 0XZ, UK
- Benjamin A Hall
  - Department of Medical Physics and Biomedical Engineering, Malet Place Engineering Building, University College London, Gower Street, London, WC1E 6BT, UK
10
Mohseni Behbahani Y, Laine E, Carbone A. Deep Local Analysis deconstructs protein-protein interfaces and accurately estimates binding affinity changes upon mutation. Bioinformatics 2023; 39:i544-i552. [PMID: 37387162 DOI: 10.1093/bioinformatics/btad231] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/01/2023]
Abstract
MOTIVATION The spectacular recent advances in protein and protein complex structure prediction hold promise for reconstructing interactomes at large-scale and residue resolution. Beyond determining the 3D arrangement of interacting partners, modeling approaches should be able to unravel the impact of sequence variations on the strength of the association. RESULTS In this work, we report on Deep Local Analysis, a novel and efficient deep learning framework that relies on a strikingly simple deconstruction of protein interfaces into small locally oriented residue-centered cubes and on 3D convolutions recognizing patterns within cubes. Merely based on the two cubes associated with the wild-type and the mutant residues, DLA accurately estimates the binding affinity change for the associated complexes. It achieves a Pearson correlation coefficient of 0.735 on about 400 mutations on unseen complexes. Its generalization capability on blind datasets of complexes is higher than the state-of-the-art methods. We show that taking into account the evolutionary constraints on residues contributes to predictions. We also discuss the influence of conformational variability on performance. Beyond the predictive power on the effects of mutations, DLA is a general framework for transferring the knowledge gained from the available non-redundant set of complex protein structures to various tasks. For instance, given a single partially masked cube, it recovers the identity and physicochemical class of the central residue. Given an ensemble of cubes representing an interface, it predicts the function of the complex. AVAILABILITY AND IMPLEMENTATION Source code and models are available at http://gitlab.lcqb.upmc.fr/DLA/DLA.git.
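The headline figure in this abstract is a Pearson correlation coefficient between predicted and measured binding affinity changes. As a reading aid, here is a minimal Python sketch of how such a coefficient is computed. The function name and the sample values are hypothetical and are not taken from the paper or its code base.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Covariance and variances, left unnormalised because the 1/n factors cancel.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_m)


if __name__ == "__main__":
    # Hypothetical predicted vs. measured affinity changes (kcal/mol).
    predicted = [0.4, 1.2, -0.3, 2.1, 0.9]
    measured = [0.1, 1.5, 0.0, 1.8, 1.1]
    print("Pearson r:", pearson_r(predicted, measured))
```

A value of 1 means a perfect increasing linear relationship, so 0.735 indicates a strong but imperfect linear agreement between prediction and experiment.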
Affiliation(s)
- Yasser Mohseni Behbahani
  - Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Sorbonne Université, CNRS, IBPS, Paris 75005, France
- Elodie Laine
  - Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Sorbonne Université, CNRS, IBPS, Paris 75005, France
- Alessandra Carbone
  - Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Sorbonne Université, CNRS, IBPS, Paris 75005, France
11
David A, Sternberg MJE. Protein structure-based evaluation of missense variants: Resources, challenges and future directions. Curr Opin Struct Biol 2023; 80:102600. [PMID: 37126977 DOI: 10.1016/j.sbi.2023.102600] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/30/2023] [Revised: 03/30/2023] [Accepted: 03/31/2023] [Indexed: 05/03/2023]
Abstract
We provide an overview of the methods that can be used for protein structure-based evaluation of missense variants. The algorithms can be broadly divided into those that calculate the difference in free energy (ΔΔG) between the wild type and variant structures and those that use structural features to predict the damaging effect of a variant without providing a ΔΔG. A wide range of machine learning approaches have been employed to develop those algorithms. We also discuss challenges and opportunities for variant interpretation in view of the recent breakthrough in three-dimensional structural modelling using deep learning.
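For readers unfamiliar with the notation, the free energy difference mentioned above can be written out as follows. This is a sketch under one common convention, assumed here rather than taken from the review, in which the folding free energy is the folded state minus the unfolded state.

```latex
\[
\Delta G_{\text{fold}} = G_{\text{folded}} - G_{\text{unfolded}},
\qquad
\Delta\Delta G = \Delta G_{\text{fold}}^{\text{variant}} - \Delta G_{\text{fold}}^{\text{wild type}}
\]
```

Under this assumed convention a positive ΔΔG means the variant is less stable than the wild type. Some tools report the opposite sign, so the convention should be checked before comparing values across methods.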
Affiliation(s)
- Alessia David
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK.
| | - Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
| |
|
12
|
Pandey P, Ghimire S, Wu B, Alexov E. On the linkage of thermodynamics and pathogenicity. Curr Opin Struct Biol 2023; 80:102572. [PMID: 36965249 DOI: 10.1016/j.sbi.2023.102572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 02/16/2023] [Accepted: 02/21/2023] [Indexed: 03/27/2023]
Abstract
This review outlines the effect of disease-causing mutations on protein thermodynamics. Two major thermodynamic quantities that are essential for structural integrity, the folding and binding free energy changes caused by missense mutations, are considered. It is emphasized that in complex diseases the disease effect may originate from several mutations across several genes, while monogenic diseases are caused by mutations in a single gene. Nevertheless, in both cases it is shown that pathogenic mutations cause larger perturbations of these thermodynamic quantities than benign mutations do. Recent works demonstrating the effect of pathogenic mutations on these thermodynamic quantities, as well as on structural dynamics and allosteric pathways, are reviewed.
Affiliation(s)
- Preeti Pandey
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
| | - Sanjeev Ghimire
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
| | - Bohua Wu
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
| | - Emil Alexov
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA.
| |
|
13
|
Alkilani S, Sevimoglu T. In silico analysis of substitution mutations in the β-globin gene in Turkish population of β-thalassemia. J Biomol Struct Dyn 2023; 41:14028-14035. [PMID: 36752381 DOI: 10.1080/07391102.2023.2176924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Accepted: 01/30/2023] [Indexed: 02/09/2023]
Abstract
Beta-thalassemia is a genetic blood disorder characterized by anomalies in the production of hemoglobin's beta chain. Most hemoglobin defects are a result of mutations of the structural β-globin gene. Many diseases, including β-thalassemia, benefit from computational studies that aid researchers in investigating the association of genotype and phenotype. In this study, the alanine substitution mutations of the β-globin protein subunits in the Turkish population (Hb Ankara, Hb Siirt and Hb Izmir) and the effects of those mutations on the β-globin protein structure and performance are examined using molecular dynamics simulation. While the Hb Ankara variant showed a non-conservative mutation, Hb Siirt and Hb Izmir showed semi-conservative mutations. RMSF values of Hb Siirt, between residues 95 and 99, were higher than those of the wild-type and the other mutant proteins. The residues of Hb Ankara showed lower fluctuation compared to the other structures. The mean ROG values were 1.47 nm, 1.46 nm, 1.49 nm and 1.48 nm and the average numbers of hydrogen bonds were 92, 100, 99, and 89 for Hb Ankara, Hb Siirt and Hb Izmir, respectively. Moreover, a significant increase in overall motion in Hb Siirt was observed based on PCA. The Hb Siirt substitution mutation might affect β-globin proteins in a way that could impact protein function. This indicates a major role for the alanine at position 27 in the stability of the β-globin subunit. However, the Hb Ankara and Hb Izmir variants may act as silent mutations, since these two mutations did not show a large change in the dynamics of the protein. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Sima Alkilani
- Department of Bioengineering, Uskudar University, Uskudar, Istanbul, Türkiye
| | - Tuba Sevimoglu
- Department of Bioengineering, University of Health Sciences, Uskudar, Istanbul, Türkiye
| |
|
14
|
Koşaca M, Yılmazbilek İ, Karaca E. PROT-ON: A structure-based detection of designer PROTein interface MutatiONs. Front Mol Biosci 2023; 10:1063971. [PMID: 36936988 PMCID: PMC10018488 DOI: 10.3389/fmolb.2023.1063971] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/07/2022] [Accepted: 01/31/2023] [Indexed: 03/06/2023] Open
Abstract
The mutation-induced changes across protein-protein interfaces have often been observed to lead to severe diseases. Therefore, several computational tools have been developed to predict the impact of such mutations. Among these tools, FoldX and EvoEF1 stand out as fast and accurate alternatives. Expanding on the capabilities of these tools, we have developed the PROT-ON (PROTein-protein interface mutatiONs) framework, which aims at delivering the most critical protein interface mutations that can be used to design new protein binders. To realize this aim, PROT-ON takes the 3D coordinates of a protein dimer as an input. Then, it probes all possible interface mutations on the selected protein partner with EvoEF1 or FoldX. The calculated mutational energy landscape is statistically analyzed to find the most enriching and depleting mutations. Afterward, these extreme mutations are filtered out according to stability and optionally according to evolutionary criteria. The final remaining mutation list is presented to the user as the designer mutation set. Together with this set, PROT-ON provides several residue- and energy-based plots, portraying the synthetic energy landscape of the probed mutations. The stand-alone version of PROT-ON is deposited at https://github.com/CSB-KaracaLab/prot-on. The users can also use PROT-ON through our user-friendly web service http://proton.tools.ibg.edu.tr:8001/ (runs with EvoEF1 only). Considering its speed and the range of analysis provided, we believe that PROT-ON presents a promising means to estimate designer mutations.
Affiliation(s)
- Mehdi Koşaca
- Izmir Biomedicine and Genome Center, Dokuz Eylul Health Campus, Izmir, Türkiye
- Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Türkiye
| | - İrem Yılmazbilek
- Izmir Biomedicine and Genome Center, Dokuz Eylul Health Campus, Izmir, Türkiye
- Middle East Technical University, Ankara, Türkiye
| | - Ezgi Karaca
- Izmir Biomedicine and Genome Center, Dokuz Eylul Health Campus, Izmir, Türkiye
- Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Türkiye
- *Correspondence: Ezgi Karaca
| |
|
15
|
Unraveling the Structural Changes in the DNA-Binding Region of Tumor Protein p53 (TP53) upon Hotspot Mutation p53 Arg248 by Comparative Computational Approach. Int J Mol Sci 2022; 23:ijms232415499. [PMID: 36555140 PMCID: PMC9779389 DOI: 10.3390/ijms232415499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 11/09/2022] [Accepted: 11/16/2022] [Indexed: 12/13/2022] Open
Abstract
The vital tissue homeostasis regulator p53 forms a tetramer when it binds to DNA and regulates the genes that mediate essential biological processes such as cell-cycle arrest, senescence, DNA repair, and apoptosis. Missense mutations in the core DNA-binding domain (109-292) simultaneously cause the loss of p53 tumor suppressor function and accumulation of the mutant p53 proteins that are carcinogenic. The most common p53 hotspot mutation at codon 248 in the DNA-binding region, where arginine (R) is substituted by tryptophan (W), glycine (G), leucine (L), proline (P), and glutamine (Q), is reported in various cancers. However, it is unclear how the p53 Arg248 mutation with distinct amino acid substitution affects the structure, function, and DNA binding affinity. Here, we characterized the pathogenicity and protein stability of p53 hotspot mutations at codon 248 using computational tools PredictSNP, Align GVGD, HOPE, ConSurf, and iStable. We found R248W, R248G, and R248P mutations highly deleterious and destabilizing. Further, we subjected all five R248 mutant-p53-DNA and wt-p53-DNA complexes to molecular dynamics simulation to investigate the structural stability and DNA binding affinity. From the MD simulation analysis, we observed increased RMSD, RMSF, and Rg values and decreased protein-DNA intermolecular hydrogen bonds in the R248-p53-DNA than the wt-p53-DNA complexes. Likewise, due to high SASA values, we observed the shrinkage of proteins in R248W, R248G, and R248P mutant-p53-DNA complexes. Compared to other mutant p53-DNA complexes, the R248W, R248G, and R248P mutant-p53-DNA complexes showed more structural alteration. MM-PBSA analysis showed decreased binding energies with DNA in all five R248-p53-DNA mutants than the wt-p53-DNA complexes. 
Hence, we conclude that substituting arginine with any of the five other amino acids at codon 248 reduces the p53 protein's affinity for DNA and may disrupt cell division, resulting in a gain of p53 function. This study may inform the development of rationally designed molecular-targeted treatments that improve p53-based therapeutic outcomes in cancer.
|
16
|
Xue B, Li R, Ma H, Rahaman A, Kumar V. Comprehensive mapping of mutations in the C9ORF72 that affect folding and binding to SMCR8 protein. Process Biochem 2022. [DOI: 10.1016/j.procbio.2022.07.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
17
|
Liu Y, Yeung WSB, Chiu PCN, Cao D. Computational approaches for predicting variant impact: An overview from resources, principles to applications. Front Genet 2022; 13:981005. [PMID: 36246661 PMCID: PMC9559863 DOI: 10.3389/fgene.2022.981005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 08/08/2022] [Indexed: 11/13/2022] Open
Abstract
One objective of human genetics is to unveil the variants that contribute to human diseases. With the rapid development and wide use of next-generation sequencing (NGS), massive genomic sequence data have been created, making personal genetic information available. Conventional experimental evidence is critical in establishing the relationship between sequence variants and phenotype, but obtaining it is inefficient. Because comprehensive databases and resources presenting clinical and experimental evidence on genotype-phenotype relationships are lacking, and because variants found by NGS keep accumulating, computational tools that predict the impact of variants on phenotype have been extensively developed to bridge the gap. In this review, we present a brief introduction to and discussion of the computational approaches for variant impact prediction. We mainly focus on approaches for predicting the impact of non-synonymous variants (nsSNVs) and categorize them into six classes. Their underlying rationale and constraints, together with the concerns and remedies raised by comparative studies, are discussed. We also present how the predictive approaches are employed in different research settings. Although diverse constraints exist, computational predictive approaches are indispensable in exploring genotype-phenotype relationships.
Affiliation(s)
- Ye Liu
- Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
| | - William S. B. Yeung
- Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Department of Obstetrics and Gynaecology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Philip C. N. Chiu
- Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Department of Obstetrics and Gynaecology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- *Correspondence: Philip C. N. Chiu; Dandan Cao
| | - Dandan Cao
- Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- *Correspondence: Philip C. N. Chiu; Dandan Cao
| |
|
18
|
Baranwal M, Magner A, Saldinger J, Turali-Emre ES, Elvati P, Kozarekar S, VanEpps JS, Kotov NA, Violi A, Hero AO. Struct2Graph: a graph attention network for structure based predictions of protein–protein interactions. BMC Bioinformatics 2022; 23:370. [PMID: 36088285 PMCID: PMC9464414 DOI: 10.1186/s12859-022-04910-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/09/2022] [Accepted: 08/26/2022] [Indexed: 12/03/2022] Open
Abstract
Background Development of new methods for analysis of protein–protein interactions (PPIs) at molecular and nanometer scales gives insights into intracellular signaling pathways and will improve understanding of protein functions, as well as other nanoscale structures of biological and abiological origins. Recent advances in computational tools, particularly the ones involving modern deep learning algorithms, have been shown to complement experimental approaches for describing and rationalizing PPIs. However, most of the existing works on PPI predictions use protein-sequence information, and thus have difficulties in accounting for the three-dimensional organization of the protein chains. Results In this study, we address this problem and describe a PPI analysis based on a graph attention network, named Struct2Graph, for identifying PPIs directly from the structural data of folded protein globules. Our method is capable of predicting the PPI with an accuracy of 98.89% on the balanced set consisting of an equal number of positive and negative pairs. On the unbalanced set with the ratio of 1:10 between positive and negative pairs, Struct2Graph achieves a fivefold cross validation average accuracy of 99.42%. Moreover, Struct2Graph can potentially identify residues that likely contribute to the formation of the protein–protein complex. The identification of important residues is tested for two different interaction types: (a) Proteins with multiple ligands competing for the same binding area, (b) Dynamic protein–protein adhesion interaction. Struct2Graph identifies interacting residues with 30% sensitivity, 89% specificity, and 87% accuracy. Conclusions In this manuscript, we address the problem of prediction of PPIs using a first of its kind, 3D-structure-based graph attention network (code available at https://github.com/baranwa2/Struct2Graph). 
Furthermore, the novel mutual attention mechanism provides insights into likely interaction sites through its unsupervised knowledge selection process. This study demonstrates that a relatively low-dimensional feature embedding learned from graph structures of individual proteins outperforms other modern machine learning classifiers based on global protein features. In addition, through the analysis of single amino acid variations, the attention mechanism shows preference for disease-causing residue variations over benign polymorphisms, demonstrating that it is not limited to interface residues. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04910-9.
|
19
|
Pollet L, Lambourne L, Xia Y. Structural Determinants of Yeast Protein-Protein Interaction Interface Evolution at the Residue Level. J Mol Biol 2022; 434:167750. [PMID: 35850298 DOI: 10.1016/j.jmb.2022.167750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2021] [Revised: 06/09/2022] [Accepted: 07/12/2022] [Indexed: 12/01/2022]
Abstract
Interfaces of contact between proteins play important roles in determining the proper structure and function of protein-protein interactions (PPIs). Therefore, to fully understand PPIs, we need to better understand the evolutionary design principles of PPI interfaces. Previous studies have uncovered that interfacial sites are more evolutionarily conserved than other surface protein sites. Yet, little is known about the nature and relative importance of evolutionary constraints in PPI interfaces. Here, we explore constraints imposed by the structure of the microenvironment surrounding interfacial residues on residue evolutionary rate using a large dataset of over 700 structural models of baker's yeast PPIs. We find that interfacial residues are, on average, systematically more conserved than all other residues with a similar degree of total burial as measured by relative solvent accessibility (RSA). Besides, we find that RSA of the residue when the PPI is formed is a better predictor of interfacial residue evolutionary rate than RSA in the monomer state. Furthermore, we investigate four structure-based measures of residue interfacial involvement, including change in RSA upon binding (ΔRSA), number of residue-residue contacts across the interface, and distance from the center or the periphery of the interface. Integrated modeling for evolutionary rate prediction in interfaces shows that ΔRSA plays a dominant role among the four measures of interfacial involvement, with minor, but independent contributions from other measures. These results yield insight into the evolutionary design of interfaces, improving our understanding of the role that structure plays in the molecular evolution of PPIs at the residue level.
Affiliation(s)
- Léah Pollet
- Department of Bioengineering, Faculty of Engineering, McGill University, Montreal, QC, Canada
| | - Luke Lambourne
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA; Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA.
| | - Yu Xia
- Department of Bioengineering, Faculty of Engineering, McGill University, Montreal, QC, Canada.
| |
|
20
|
Ducich NH, Mears JA, Bedoyan JK. Solvent accessibility of E1α and E1β residues with known missense mutations causing pyruvate dehydrogenase complex (PDC) deficiency: Impact on PDC-E1 structure and function. J Inherit Metab Dis 2022; 45:557-570. [PMID: 35038180 PMCID: PMC9297371 DOI: 10.1002/jimd.12477] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/23/2021] [Revised: 01/11/2022] [Accepted: 01/12/2022] [Indexed: 11/08/2022]
Abstract
Pyruvate dehydrogenase complex deficiency is a major cause of primary lactic acidemia resulting in high morbidity and mortality, with limited therapeutic options. PDHA1 mutations are responsible for >82% of cases. The E1 component of PDC is a symmetric dimer of heterodimers (αβ/α'β') encoded by PDHA1 and PDHB. We measured solvent-accessible surface area (SASA), utilized nearest-neighbor analysis, incorporated sequence changes using the mutagenesis tool in PyMOL, and performed molecular modeling with SWISS-MODEL to investigate the impact of residues with disease-causing missense variants (DMVs) on E1 structure and function. We reviewed 166 and 13 genetically resolved cases due to PDHA1 and PDHB, respectively, from variant databases. We expanded on 102 E1α and 13 E1β nonduplicate DMVs. DMVs of the E1α Arg112-Arg224 stretch (exons 5-7) and of E1α Arg residues constituted 40% and 39% of cases, respectively, with invariant Arg349 accounting for 22% of arginine replacements. SASA analysis showed that 86% and 84% of residues with nonduplicate DMVs of E1α and E1β, respectively, are solvent inaccessible ("buried"). Furthermore, 30% of E1α buried residues with DMVs are deleterious through perturbation of subunit-subunit interface contact (SSIC), with 73% located in the Arg112-Arg224 stretch. E1α Arg349 represented 74% of buried E1α Arg residues involved in SSIC. Structural perturbations resulting from residue replacements in some matched neighboring pairs of amino acids on different subunits, involved in SSIC at interatomic distances of 2.9-4.0 Å, exhibit similar clinical phenotypes. Collectively, this work provides insight for future target-based advanced molecular modeling studies, with implications for development of novel therapeutics for specific recurrent DMVs of E1α.
Affiliation(s)
- Nicole H. Ducich
- Case Western Reserve University (CWRU) School of Medicine, Cleveland, Ohio, USA
| | - Jason A. Mears
- Department of Pharmacology, CWRU, Cleveland, Ohio, USA
- Center for Mitochondrial Diseases, CWRU, Cleveland, Ohio, USA
| | - Jirair K. Bedoyan
- Division of Genetic and Genomic Medicine, UPMC Children’s Hospital of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Department of Pediatrics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
| |
|
21
|
Backwell L, Marsh JA. Diverse Molecular Mechanisms Underlying Pathogenic Protein Mutations: Beyond the Loss-of-Function Paradigm. Annu Rev Genomics Hum Genet 2022; 23:475-498. [PMID: 35395171 DOI: 10.1146/annurev-genom-111221-103208] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Indexed: 11/09/2022]
Abstract
Most known disease-causing mutations occur in protein-coding regions of DNA. While some of these involve a loss of protein function (e.g., through premature stop codons or missense changes that destabilize protein folding), many act via alternative molecular mechanisms and have dominant-negative or gain-of-function effects. In nearly all cases, these non-loss-of-function mutations can be understood by considering interactions of the wild-type and mutant protein with other molecules, such as proteins, nucleic acids, or small ligands and substrates. Here, we review the diverse molecular mechanisms by which pathogenic mutations can have non-loss-of-function effects, including by disrupting interactions, increasing binding affinity, changing binding specificity, causing assembly-mediated dominant-negative and dominant-positive effects, creating novel interactions, and promoting aggregation and phase separation. We believe that increased awareness of these diverse molecular disease mechanisms will lead to improved diagnosis (and ultimately treatment) of human genetic disorders.
Affiliation(s)
- Lisa Backwell
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
| | - Joseph A Marsh
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
| |
|
22
|
Missense variants in human ACE2 strongly affect binding to SARS-CoV-2 Spike providing a mechanism for ACE2 mediated genetic risk in Covid-19: A case study in affinity predictions of interface variants. PLoS Comput Biol 2022; 18:e1009922. [PMID: 35235558 PMCID: PMC8920257 DOI: 10.1371/journal.pcbi.1009922] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/07/2021] [Revised: 03/14/2022] [Accepted: 02/13/2022] [Indexed: 12/19/2022] Open
Abstract
SARS-CoV-2 Spike (Spike) binds to human angiotensin-converting enzyme 2 (ACE2) and the strength of this interaction could influence parameters relating to virulence. To explore whether population variants in ACE2 influence Spike binding and hence infection, we selected 10 ACE2 variants based on affinity predictions and prevalence in gnomAD and measured their affinities and kinetics for Spike receptor binding domain through surface plasmon resonance (SPR) at 37°C. We discovered variants that reduce and enhance binding, including three ACE2 variants that strongly inhibited (p.Glu37Lys, ΔΔG = –1.33 ± 0.15 kcal mol⁻¹ and p.Gly352Val, predicted ΔΔG = –1.17 kcal mol⁻¹) or abolished (p.Asp355Asn) binding. We also identified two variants with distinct population distributions that enhanced affinity for Spike. ACE2 p.Ser19Pro (ΔΔG = 0.59 ± 0.08 kcal mol⁻¹) is predominant in the gnomAD African cohort (AF = 0.003) whilst p.Lys26Arg (ΔΔG = 0.26 ± 0.09 kcal mol⁻¹) is predominant in the Ashkenazi Jewish (AF = 0.01) and European non-Finnish (AF = 0.006) cohorts. We compared ACE2 variant affinities to published SARS-CoV-2 pseudotype infectivity data and confirmed that ACE2 variants with reduced affinity for Spike can protect cells from infection. The effect of variants with enhanced Spike affinity remains unclear, but we propose a mechanism whereby these alleles could cause greater viral spreading across tissues and cell types, as is consistent with emerging understanding regarding the interplay between receptor affinity and cell-surface abundance. Finally, we compared mCSM-PPI2 ΔΔG predictions against our SPR data to assess the utility of predictions in this system. We found that predictions of decreased binding were well-correlated with experiment and could be improved by calibration, but disappointingly, predictions of highly enhanced binding were unreliable.
Recalibrated predictions for all possible ACE2 missense variants at the Spike interface were calculated and used to estimate the overall burden of ACE2 variants on Covid-19.

One of the first things the SARS-CoV-2 virus does to invade human cells is bind to a cell surface receptor called angiotensin-converting enzyme 2 (ACE2). The virus attaches to this receptor through its Spike protein and knowledge from other viruses tells us that the strength of this interaction influences how infectious and/or virulent it is. We hypothesised that the Spike-ACE2 affinity might vary in people who have different amino acids in the part of ACE2 where Spike binds and consequently might be protected–or more at risk–from the virus. To test this idea, we measured the affinity of several ACE2 mutants, representing different versions found in humans, for the Spike protein and we found that some strengthened the interactions alongside others that weakened it. Most of these variants are rare, but two are present in over 1 in 1,000 individuals in certain populations and so might be important for the epidemiology of COVID-19. We then used computational methods to predict the affinity of even more ACE2 mutants than we could test in the lab and again found many that might alter this interaction. These data may help identify people who are at higher or lower risk from COVID-19.
|
23
|
Xiong D, Lee D, Li L, Zhao Q, Yu H. Implications of disease-related mutations at protein-protein interfaces. Curr Opin Struct Biol 2022; 72:219-225. [PMID: 34959033 PMCID: PMC8863207 DOI: 10.1016/j.sbi.2021.11.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/02/2021] [Revised: 11/01/2021] [Accepted: 11/18/2021] [Indexed: 02/03/2023]
Abstract
Protein-protein interfaces have been attracting great attention owing to their critical roles in protein-protein interactions and the fact that human disease-related mutations are generally enriched in them. Recently, substantial research progress has been made in this field, which has significantly promoted the understanding and treatment of various human diseases. For example, many studies have discovered the properties of disease-related mutations. Besides, as more large-scale experimental data become available, various computational approaches have been proposed to advance our understanding of disease mutations from the data. Here, we overview recent advances in characteristics of disease-related mutations at protein-protein interfaces, mutation effects on protein interactions, and investigation of mutations on specific diseases.
Affiliation(s)
- Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
| | - Dongjin Lee
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
| | - Le Li
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
| | - Qiuye Zhao
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
| | - Haiyuan Yu
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
| |
|
24
|
The properties of human disease mutations at protein interfaces. PLoS Comput Biol 2022; 18:e1009858. [PMID: 35120134 PMCID: PMC8849535 DOI: 10.1371/journal.pcbi.1009858] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/23/2021] [Revised: 02/16/2022] [Accepted: 01/24/2022] [Indexed: 12/27/2022] Open
Abstract
The assembly of proteins into complexes and their interactions with other biomolecules are often vital for their biological function. While it is known that mutations at protein interfaces have a high potential to be damaging and cause human genetic disease, there has been relatively little consideration for how this varies between different types of interfaces. Here we investigate the properties of human pathogenic and putatively benign missense variants at homomeric (isologous and heterologous), heteromeric, DNA, RNA and other ligand interfaces, and at different regions in proteins with respect to those interfaces. We find that different types of interfaces vary greatly in their propensity to be associated with pathogenic mutations, with homomeric heterologous and DNA interfaces being particularly enriched in disease. We also find that residues that do not directly participate in an interface, but are close in three-dimensional space, show a significant disease enrichment. Finally, we observe that mutations at different types of interfaces tend to have distinct property changes when undergoing amino acid substitutions associated with disease, and that this is linked to substantial variability in their identification by computational variant effect predictors.

Nearly all proteins interact with other molecules as part of their biological function. For example, proteins can interact with other copies of the same type of protein, with different proteins, with DNA, or with small ligand molecules. Many mutations at protein interfaces, the regions of proteins that interact with other molecules, are known to cause human genetic disease. In this study, we first investigate how different types of protein interfaces have different tendencies to be associated with disease. We also show that the closer a mutation is to an interface, the more likely it is to cause disease. Finally, we study how mutations at different types of interfaces tend to be associated with different changes in amino acid properties, which appears to influence our ability to computationally predict the effects of mutations. Ultimately, we hope that consideration of protein interface properties will eventually improve our ability to identify new disease-causing mutations.
|
25
|
Abstract
The biological significance of proteins attracted the scientific community in exploring their characteristics. The studies shed light on the interaction patterns and functions of proteins in a living body. Due to their practical difficulties, reliable experimental techniques pave the way for introducing computational methods in the interaction prediction. Automated methods reduced the difficulties but could not yet replace experimental studies as the field is still evolving. Interaction prediction problem being critical needs highly accurate results, but none of the existing methods could offer reliable performance that can parallel with experimental results yet. This article aims to assess the existing computational docking algorithms, their challenges, and future scope. Blind docking techniques are quite helpful when no information other than the individual structures are available. As more and more complex structures are being added to different databases, information-driven approaches can be a good alternative. Artificial intelligence, ruling over the major fields, is expected to take over this domain very shortly.
|
26
|
Shauli T, Brandes N, Linial M. Evolutionary and functional lessons from human-specific amino acid substitution matrices. NAR Genom Bioinform 2021; 3:lqab079. [PMID: 34541526 PMCID: PMC8445205 DOI: 10.1093/nargab/lqab079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/19/2021] [Revised: 08/02/2021] [Accepted: 09/14/2021] [Indexed: 12/26/2022] Open
Abstract
Human genetic variation in coding regions is fundamental to the study of protein structure and function. Most methods for interpreting missense variants consider substitution measures derived from homologous proteins across different species. In this study, we introduce human-specific amino acid (AA) substitution matrices that are based on genetic variations in the modern human population. We analyzed the frequencies of >4.8M single nucleotide variants (SNVs) at codon and AA resolution and compiled human-centric substitution matrices that are fundamentally different from classic cross-species matrices (e.g. BLOSUM, PAM). Our matrices are asymmetric, with some AA replacements showing significant directional preference. Moreover, these AA matrices are only partly predicted by nucleotide substitution rates. We further test the utility of our matrices in exposing functional signals of experimentally-validated protein annotations. A significant reduction in AA transition frequencies was observed across nine post-translational modification (PTM) types and four ion-binding sites. Our results propose a purifying selection signal in the human proteome across a diverse set of functional protein annotations and provide an empirical baseline for interpreting human genetic variation in coding regions.
Affiliation(s)
- Tair Shauli
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
- Nadav Brandes
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
- Michal Linial
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
|
27
|
Ozturk K, Carter H. Predicting functional consequences of mutations using molecular interaction network features. Hum Genet 2021; 141:1195-1210. [PMID: 34432150 PMCID: PMC8873243 DOI: 10.1007/s00439-021-02329-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/20/2021] [Accepted: 07/31/2021] [Indexed: 12/13/2022]
Abstract
Variant interpretation remains a central challenge for precision medicine. Missense variants are particularly difficult to understand as they change only a single amino acid in a protein sequence yet can have large and varied effects on protein activity. Numerous tools have been developed to identify missense variants with putative disease consequences from protein sequence and structure. However, biological function arises through higher order interactions among proteins and molecules within cells. We therefore sought to capture information about the potential of missense mutations to perturb protein interaction networks by integrating protein structure and interaction data. We developed 16 network-based annotations for missense mutations that provide orthogonal information to features classically used to prioritize variants. We then evaluated them in the context of a proven machine-learning framework for variant effect prediction across multiple benchmark datasets to demonstrate their potential to improve variant classification. Interestingly, network features resulted in larger performance gains for classifying somatic mutations than for germline variants, possibly due to different constraints on what mutations are tolerated at the cellular versus organismal level. Our results suggest that modeling variant potential to perturb context-specific interactome networks is a fruitful strategy to advance in silico variant effect prediction.
Affiliation(s)
- Kivilcim Ozturk
- Division of Medical Genetics, Department of Medicine, University of California San Diego, La Jolla, CA, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Hannah Carter
- Division of Medical Genetics, Department of Medicine, University of California San Diego, La Jolla, CA, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA; Moores Cancer Center, University of California San Diego, La Jolla, CA, USA
|
28
|
Li G, Panday SK, Peng Y, Alexov E. SAMPDI-3D: predicting the effects of protein and DNA mutations on protein-DNA interactions. Bioinformatics 2021; 37:3760-3765. [PMID: 34343273 DOI: 10.1093/bioinformatics/btab567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2021] [Revised: 06/28/2021] [Accepted: 07/31/2021] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION Mutations that alter protein-DNA interactions may be pathogenic and cause diseases. Therefore, it is extremely important to quantify the effect of mutations on protein-DNA binding free energy to reveal the molecular origin of diseases and to assist the development of treatments. Although several methods that predict the change of protein-DNA binding affinity upon mutations in the binding protein were developed, the effect of DNA mutations was not considered yet. RESULTS Here, we report a new version of SAMPDI, the SAMPDI-3D, which is a gradient boosting decision tree machine learning method to predict the change of the protein-DNA binding free energy caused by mutations in both the binding protein and the bases of the corresponding DNA. The method is shown to achieve Pearson correlation coefficient of 0.76 and 0.80 in a benchmarking test against experimentally determined change of the binding free energy caused by mutations in the binding protein or DNA, respectively. Furthermore, three datasets collected from literature were used to do blind benchmark for SAMPDI-3D and it is shown that it outperforms all existing state-of-the-art methods. The method is very fast allowing for genome-scale investigations. AVAILABILITY It is available as a web server and a stand-code at http://compbio.clemson.edu/SAMPDI-3D/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gen Li
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Yunhui Peng
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Emil Alexov
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
|
29
|
Mukherjee I, Chakrabarti S. Co-evolutionary landscape at the interface and non-interface regions of protein-protein interaction complexes. Comput Struct Biotechnol J 2021; 19:3779-3795. [PMID: 34285778 PMCID: PMC8271121 DOI: 10.1016/j.csbj.2021.06.039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/14/2021] [Revised: 06/22/2021] [Accepted: 06/22/2021] [Indexed: 11/16/2022] Open
Abstract
Proteins involved in interactions throughout the course of evolution tend to co-evolve and compensatory changes may occur in interacting proteins to maintain or refine such interactions. However, certain residue pair alterations may prove to be detrimental for functional interactions. Hence, determining co-evolutionary pairings that could be structurally or functionally relevant for maintaining the conservation of an inter-protein interaction is important. Inter-protein co-evolution analysis in several complexes utilizing multiple existing methodologies suggested that co-evolutionary pairings can occur in spatially proximal and distant regions in inter-protein interactions. Subsequently, the Co-Var (Correlated Variation) method based on mutual information and Bhattacharyya coefficient was developed, validated, and found to perform relatively better than CAPS and EV-complex. Interestingly, while applying the Co-Var measure and EV-complex program on a set of protein-protein interaction complexes, co-evolutionary pairings were obtained in interface and non-interface regions in protein complexes. The Co-Var approach involves determining high degree co-evolutionary pairings that include multiple co-evolutionary connections between particular co-evolved residue positions in one protein with multiple residue positions in the binding partner. Detailed analyses of high degree co-evolutionary pairings in protein-protein complexes involved in cancer metastasis suggested that most of the residue positions forming such co-evolutionary connections mainly occurred within functional domains of constituent proteins and substitution mutations were also common among these positions. The physiological relevance of these predictions suggested that Co-Var can predict residues that could be crucial for preserving functional protein-protein interactions. 
Finally, Co-Var web server (http://www.hpppi.iicb.res.in/ishi/covar/index.html) that implements this methodology identifies co-evolutionary pairings in intra and inter-protein interactions.
Affiliation(s)
- Ishita Mukherjee
- Structural Biology and Bioinformatics Division, Council for Scientific and Industrial Research (CSIR) - Indian Institute of Chemical Biology (IICB), Kolkata, West Bengal 700032, India
- Saikat Chakrabarti
- Structural Biology and Bioinformatics Division, Council for Scientific and Industrial Research (CSIR) - Indian Institute of Chemical Biology (IICB), Kolkata, West Bengal 700032, India
|
30
|
Das S, Scholes HM, Sen N, Orengo C. CATH functional families predict functional sites in proteins. Bioinformatics 2021; 37:1099-1106. [PMID: 33135053 PMCID: PMC8150129 DOI: 10.1093/bioinformatics/btaa937] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/05/2020] [Revised: 09/30/2020] [Accepted: 10/27/2020] [Indexed: 01/12/2023] Open
Abstract
MOTIVATION Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). RESULTS FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites. AVAILABILITY AND IMPLEMENTATION https://github.com/UCL/cath-funsite-predictor. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sayoni Das
- PrecisionLife Ltd., Long Hanborough, OX29 8LJ Oxford, UK
- Harry M Scholes
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Neeladri Sen
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
|
31
|
Petrosino M, Novak L, Pasquo A, Chiaraluce R, Turina P, Capriotti E, Consalvi V. Analysis and Interpretation of the Impact of Missense Variants in Cancer. Int J Mol Sci 2021; 22:ijms22115416. [PMID: 34063805 PMCID: PMC8196604 DOI: 10.3390/ijms22115416] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/30/2021] [Revised: 05/03/2021] [Accepted: 05/17/2021] [Indexed: 01/10/2023] Open
Abstract
Large scale genome sequencing allowed the identification of a massive number of genetic variations, whose impact on human health is still unknown. In this review we analyze, by an in silico-based strategy, the impact of missense variants on cancer-related genes, whose effect on protein stability and function was experimentally determined. We collected a set of 164 variants from 11 proteins to analyze the impact of missense mutations at structural and functional levels, and to assess the performance of state-of-the-art methods (FoldX and Meta-SNP) for predicting protein stability change and pathogenicity. The result of our analysis shows that a combination of experimental data on protein stability and in silico pathogenicity predictions allowed the identification of a subset of variants with a high probability of having a deleterious phenotypic effect, as confirmed by the significant enrichment of the subset in variants annotated in the COSMIC database as putative cancer-driving variants. Our analysis suggests that the integration of experimental and computational approaches may contribute to evaluate the risk for complex disorders and develop more effective treatment strategies.
Affiliation(s)
- Maria Petrosino
- Dipartimento Scienze Biochimiche “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Roma, Italy
- Leonore Novak
- Dipartimento Scienze Biochimiche “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Roma, Italy
- Alessandra Pasquo
- ENEA CR Frascati, Diagnostics and Metrology Laboratory FSN-TECFIS-DIM, 00044 Frascati, Italy
- Roberta Chiaraluce
- Dipartimento Scienze Biochimiche “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Roma, Italy
- Paola Turina
- Dipartimento di Farmacia e Biotecnologie (FaBiT), University of Bologna, 40126 Bologna, Italy
- Emidio Capriotti
- Dipartimento di Farmacia e Biotecnologie (FaBiT), University of Bologna, 40126 Bologna, Italy
- Correspondence: (E.C.); (V.C.)
- Valerio Consalvi
- Dipartimento Scienze Biochimiche “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Roma, Italy
- Correspondence: (E.C.); (V.C.)
|
32
|
Missense3D-DB web catalogue: an atom-based analysis and repository of 4M human protein-coding genetic variants. Hum Genet 2021; 140:805-812. [PMID: 33502607 PMCID: PMC8052235 DOI: 10.1007/s00439-020-02246-z] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 09/09/2020] [Accepted: 12/07/2020] [Indexed: 12/22/2022]
Abstract
The interpretation of human genetic variation is one of the greatest challenges of modern genetics. New approaches are urgently needed to prioritize variants, especially those that are rare or lack a definitive clinical interpretation. We examined 10,136,597 human missense genetic variants from GnomAD, ClinVar and UniProt. We were able to perform large-scale atom-based mapping and phenotype interpretation of 3,960,015 of these variants onto 18,874 experimental and 84,818 in house predicted three-dimensional coordinates of the human proteome. We demonstrate that 14% of amino acid substitutions from the GnomAD database that could be structurally analysed are predicted to affect protein structure (n = 568,548, of which 566,439 rare or extremely rare) and may, therefore, have a yet unknown disease-causing effect. The same is true for 19.0% (n = 6266) of variants of unknown clinical significance or conflicting interpretation reported in the ClinVar database. The results of the structural analysis are available in the dedicated web catalogue Missense3D-DB ( http://missense3d.bc.ic.ac.uk/ ). For each of the 4 M variants, the results of the structural analysis are presented in a friendly concise format that can be included in clinical genetic reports. A detailed report of the structural analysis is also available for the non-experts in structural biology. Population frequency and predictions from SIFT and PolyPhen are included for a more comprehensive variant interpretation. This is the first large-scale atom-based structural interpretation of human genetic variation and offers geneticists and the biomedical community a new approach to genetic variant interpretation.
|
33
|
Amengual-Rigo P, Fernández-Recio J, Guallar V. UEP: an open-source and fast classifier for predicting the impact of mutations in protein-protein complexes. Bioinformatics 2021; 37:334-341. [PMID: 32761082 DOI: 10.1093/bioinformatics/btaa708] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/07/2019] [Revised: 07/23/2020] [Accepted: 07/31/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Single protein residue mutations may reshape the binding affinity of protein-protein interactions. Therefore, predicting its effects is of great interest in biotechnology and biomedicine. Unfortunately, the availability of experimental data on binding affinity changes upon mutation is limited, which hampers the development of new and more precise algorithms. Here, we propose UEP, a classifier for predicting beneficial and detrimental mutations in protein-protein complexes trained on interactome data. RESULTS Regardless of the simplicity of the UEP algorithm, which is based on a simple three-body contact potential derived from interactome data, we report competitive results with the gold standard methods in this field with the advantage of being faster in terms of computational time. Moreover, we propose a consensus selection procedure by involving the combination of three predictors that showed higher classification accuracy in our benchmark: UEP, pyDock and EvoEF1/FoldX. Overall, we demonstrate that the analysis of interactome data allows predicting the impact of protein-protein mutations using UEP, a fast and reliable open-source code. AVAILABILITY AND IMPLEMENTATION UEP algorithm can be found at: https://github.com/pepamengual/UEP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pep Amengual-Rigo
- Department of Life Sciences, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Juan Fernández-Recio
- Department of Life Sciences, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain; Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC-Universidad de la Rioja-Gobierno de la Rioja, 26007 Logroño, Spain
- Victor Guallar
- Department of Life Sciences, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain; ICREA: Institució Catalana de Recerca i Estudis Avançats, 08010 Barcelona, Spain
|
34
|
Lam SD, Babu MM, Lees J, Orengo CA. Biological impact of mutually exclusive exon switching. PLoS Comput Biol 2021; 17:e1008708. [PMID: 33651795 PMCID: PMC7954323 DOI: 10.1371/journal.pcbi.1008708] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/01/2020] [Revised: 03/12/2021] [Accepted: 01/14/2021] [Indexed: 12/27/2022] Open
Abstract
Alternative splicing can expand the diversity of proteomes. Homologous mutually exclusive exons (MXEs) originate from the same ancestral exon and result in polypeptides with similar structural properties but altered sequence. Why would some genes switch homologous exons and what are their biological impact? Here, we analyse the extent of sequence, structural and functional variability in MXEs and report the first large scale, structure-based analysis of the biological impact of MXE events from different genomes. MXE-specific residues tend to map to single domains, are highly enriched in surface exposed residues and cluster at or near protein functional sites. Thus, MXE events are likely to maintain the protein fold, but alter specificity and selectivity of protein function. This comprehensive resource of MXE events and their annotations is available at: http://gene3d.biochem.ucl.ac.uk/mxemod/. These findings highlight how small, but significant changes at critical positions on a protein surface are exploited in evolution to alter function.
Affiliation(s)
- Su Datt Lam
- Institute of Structural and Molecular Biology, University College London, Darwin Building, Gower Street, London, United Kingdom
- Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
- * E-mail: (SDL); (JL); (CO)
- M. Madan Babu
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Department of Structural Biology and Center for Data Driven Discovery, St Jude Children’s Research Hospital, Memphis, Tennessee, United States of America
- Jonathan Lees
- Faculty of Health and Life Sciences, Oxford Brookes University, Oxford, United Kingdom
- * E-mail: (SDL); (JL); (CO)
- Christine A. Orengo
- Institute of Structural and Molecular Biology, University College London, Darwin Building, Gower Street, London, United Kingdom
- * E-mail: (SDL); (JL); (CO)
|
35
|
Sitani D, Giorgetti A, Alfonso-Prieto M, Carloni P. Robust principal component analysis-based prediction of protein-protein interaction hot spots. Proteins 2021; 89:639-647. [PMID: 33458895 DOI: 10.1002/prot.26047] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/25/2020] [Revised: 12/28/2020] [Accepted: 12/31/2020] [Indexed: 12/21/2022]
Abstract
Proteins often exert their function by binding to other cellular partners. The hot spots are key residues for protein-protein binding. Their identification may shed light on the impact of disease associated mutations on protein complexes and help design protein-protein interaction inhibitors for therapy. Unfortunately, current machine learning methods to predict hot spots, suffer from limitations caused by gross errors in the data matrices. Here, we present a novel data pre-processing pipeline that overcomes this problem by recovering a low rank matrix with reduced noise using Robust Principal Component Analysis. Application to existing databases shows the predictive power of the method.
Affiliation(s)
- Divya Sitani
- JARA-Institute: Molecular Neuroscience and Neuroimaging, Institute for Neuroscience and Medicine INM-11/JARA-BRAIN Institute JBI-2, Forschungszentrum Jülich GmbH, Jülich, Germany; Department of Biology, RWTH Aachen University, Aachen, Germany
- Alejandro Giorgetti
- Institute for Advanced Simulations IAS-5 / Institute for Neuroscience and Medicine INM-9, Computational Biomedicine, Forschungszentrum Jülich GmbH, Jülich, Germany; Department of Biotechnology, University of Verona, Verona, Italy
- Mercedes Alfonso-Prieto
- Institute for Advanced Simulations IAS-5 / Institute for Neuroscience and Medicine INM-9, Computational Biomedicine, Forschungszentrum Jülich GmbH, Jülich, Germany; Cécile and Oskar Vogt Institute for Brain Research, University Hospital Düsseldorf, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Paolo Carloni
- JARA-Institute: Molecular Neuroscience and Neuroimaging, Institute for Neuroscience and Medicine INM-11/JARA-BRAIN Institute JBI-2, Forschungszentrum Jülich GmbH, Jülich, Germany; Institute for Advanced Simulations IAS-5 / Institute for Neuroscience and Medicine INM-9, Computational Biomedicine, Forschungszentrum Jülich GmbH, Jülich, Germany; Department of Physics, RWTH Aachen University, Aachen, Germany; JARA-HPC, IAS-5/INM-9 Computational Biomedicine, Forschungszentrum Jülich GmbH, Jülich, Germany
|
36
|
Modi T, Campitelli P, Kazan IC, Ozkan SB. Protein folding stability and binding interactions through the lens of evolution: a dynamical perspective. Curr Opin Struct Biol 2020; 66:207-215. [PMID: 33388636 DOI: 10.1016/j.sbi.2020.11.007] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 09/17/2020] [Revised: 11/02/2020] [Accepted: 11/26/2020] [Indexed: 01/06/2023]
Abstract
While the function of a protein depends heavily on its ability to fold into a correct 3D structure, billions of years of evolution have tailored proteins from highly stable objects to flexible molecules as they adapted to environmental changes. Nature maintains the fine balance of protein folding and stability while still evolving towards new function through generations of fine-tuning necessary interactions with other proteins and small molecules. Here we focus on recent computational and experimental studies that shed light onto how evolution molds protein folding and the functional landscape from a conformational dynamics' perspective. Particularly, we explore the importance of dynamic allostery throughout protein evolution and discuss how the protein anisotropic network can give rise to allosteric and epistatic interactions.
Affiliation(s)
- Tushar Modi
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ 85287-1504, USA
- Paul Campitelli
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ 85287-1504, USA
- Ismail Can Kazan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ 85287-1504, USA
- Sefika Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ 85287-1504, USA
|
37
|
Bermúdez-Guzmán L, Jimenez-Huezo G, Arguedas A, Leal A. Mutational survivorship bias: The case of PNKP. PLoS One 2020; 15:e0237682. [PMID: 33332469 PMCID: PMC7746193 DOI: 10.1371/journal.pone.0237682] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/28/2020] [Accepted: 11/23/2020] [Indexed: 01/21/2023] Open
Abstract
The molecular function of a protein relies on its structure. Understanding how variants alter structure and function in multidomain proteins is key to elucidate the generation of a pathological phenotype. However, one may fall into the logical bias of assessing protein damage only based on the variants that are visible (survivorship bias), which can lead to partial conclusions. This is the case of PNKP, an important nuclear and mitochondrial DNA repair enzyme with both kinase and phosphatase function. Most variants in PNKP are confined to the kinase domain, leading to a pathological spectrum of three apparently distinct clinical entities. Since proteins and domains may have a different tolerability to variation, we evaluated whether variants in PNKP are under survivorship bias. Here, we provide the evidence that supports a higher tolerance in the kinase domain even when all variants reported are deleterious. Instead, the phosphatase domain is less tolerant due to its lower variant rates, a higher degree of sequence conservation, lower dN/dS ratios, and the presence of more disease-propensity hotspots. Together, our results support previous experimental evidence that demonstrated that the phosphatase domain is functionally more necessary and relevant for DNA repair, especially in the context of the development of the central nervous system. Finally, we propose the term "Wald’s domain" for future studies analyzing the possible survivorship bias in multidomain proteins.
Affiliation(s)
- Luis Bermúdez-Guzmán
- Section of Genetics and Biotechnology, School of Biology, University de Costa Rica, San Pedro, San José, Costa Rica
- Gabriel Jimenez-Huezo
- Section of Genetics and Biotechnology, School of Biology, University de Costa Rica, San Pedro, San José, Costa Rica
- Andrés Arguedas
- School of Statistics, University de Costa Rica, San Pedro, San José, Costa Rica
- Alejandro Leal
- Section of Genetics and Biotechnology, School of Biology, University de Costa Rica, San Pedro, San José, Costa Rica
|
38
|
Thirumal Kumar D, Udhaya Kumar S, Magesh R, George Priya Doss C. Investigating mutations at the hotspot position of the ERBB2 and screening for the novel lead compound to treat breast cancer - a computational approach. Adv Protein Chem Struct Biol 2020; 123:49-71. [PMID: 33485488 DOI: 10.1016/bs.apcsb.2020.10.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/16/2022]
Abstract
Membrane proteins are among the proteins most commonly implicated in cancer and are relevant to its prognosis. They are a distinguishing characteristic of a cancer cell, and overexpressed membrane proteins are becoming ever more relevant in tumor cell therapy. The phosphatidylinositol 3-kinase (PI3K)/AKT pathway is triggered downstream of different extracellular signals, and activation of this signaling pathway affects a variety of cellular processes such as cell growth and survival. Frequent PI3K/AKT dysregulation in human cancer has made proteins of this pathway desirable as diagnostic markers. Members of the ERBB family, such as ERBB2 and ERBB3, activate intracellular signaling pathways such as PI3K/AKT. Mutations in these proteins cause dysfunction of the downstream proteins. Considering this importance, we have developed a computational pipeline to identify the position with the highest number of mutations and to screen these mutations for pathogenicity, stability, conservation, and structural changes using PredictSNP, iStable, ConSurf, and GROMACS simulation software, respectively. Further, a virtual screening approach was initiated to find the most similar non-toxic lead compound, which could be an alternative to the currently used lapatinib. To conclude, protein-ligand dynamics were undertaken to study the behavior of the native and mutant proteins with lapatinib and the lead compound. From the overall analysis, we identified that position 755, which holds leucine in the native protein, is prone to frequent mutations. The leucine at position 755 most often mutates to serine or tryptophan. Further, from the computational analysis, we identified that the L755S mutation is more significant than the L755W mutation. We found CID140590176 to be a potential lead compound with no toxicity. The lead compound showed more compactness with an increased number of intermolecular hydrogen bonds in ERBB2 with L755S. This lead compound can be taken further for experimental validation, and we believe that it could be a potent ERBB2 inhibitor.
Affiliation(s)
- D Thirumal Kumar
- School of Biosciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- S Udhaya Kumar
- School of Biosciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- R Magesh
- Faculty of Biomedical Sciences, Technology & Research, Department of Biotechnology, Sri Ramachandra University, Chennai, Tamil Nadu, India
- C George Priya Doss
- School of Biosciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
|
39
|
Shin WH, Kumazawa K, Imai K, Hirokawa T, Kihara D. Current Challenges and Opportunities in Designing Protein-Protein Interaction Targeted Drugs. Adv Appl Bioinform Chem 2020; 13:11-25. [PMID: 33209039 PMCID: PMC7669531 DOI: 10.2147/aabc.s235542] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 09/01/2020] [Accepted: 10/22/2020] [Indexed: 12/24/2022] Open
Abstract
It has been noticed that the efficiency of drug development has been decreasing in the past few decades. To overcome the situation, protein-protein interactions (PPIs) have been identified as new drug targets as early as 2000. PPIs are more abundant in human cells than single proteins and play numerous important roles in cellular processes including diseases. However, PPIs have very different physicochemical features from the conventional drug targets, which make targeting PPIs challenging. Therefore, as of now, only a small number of PPI inhibitors have been approved or progressed to a stage of clinical trial. In this article, we first overview previous works that analyzed differences between PPIs with PPI targeting ligands and conventional drugs with their binding pockets. Then, we constructed an up-to-date list of PPI targeting drugs that have been approved or are currently under clinical trial and have bound drug-target structures available. Using the dataset, we analyzed the PPIs and their ligands using several scores of druggability. Druggability scores showed that PPI sites and their drugs targeting PPIs are less druggable than conventional binding pockets and drugs, which also indicates that PPI drugs do not follow the conventional rules for drug design, such as Lipinski's rule of five. Our analyses suggest that developing a new rule would be beneficial for guiding PPI-drug discovery.
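Lipinski's rule of five, which the abstract names as the conventional guideline that PPI drugs tend not to follow, can be written as a small check. The sketch below is only an illustration of that rule, not the druggability scoring used in the article; the `Descriptors` container, the function names, and the one-violation tolerance are assumptions made here, and the example values are invented for demonstration rather than taken from any real compound.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container for this sketch)."""
    mol_weight: float        # molecular weight in daltons
    logp: float              # octanol-water partition coefficient
    h_bond_donors: int       # number of hydrogen-bond donors
    h_bond_acceptors: int    # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return the rule-of-five criteria that the descriptors violate."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if d.logp > 5:
        violations.append("logP > 5")
    if d.h_bond_donors > 5:
        violations.append("more than 5 hydrogen-bond donors")
    if d.h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen-bond acceptors")
    return violations


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Common reading of the rule: at most one violation is tolerated (assumed default)."""
    return len(lipinski_violations(d)) <= max_violations


if __name__ == "__main__":
    # Invented descriptor values, for illustration only.
    small_ligand = Descriptors(mol_weight=350.0, logp=2.5, h_bond_donors=2, h_bond_acceptors=5)
    large_ligand = Descriptors(mol_weight=870.0, logp=6.1, h_bond_donors=4, h_bond_acceptors=12)
    for name, d in [("small_ligand", small_ligand), ("large_ligand", large_ligand)]:
        print(name, passes_rule_of_five(d), lipinski_violations(d))
```

A large, lipophilic ligand of the kind often needed to cover a flat PPI interface would typically fail several of these thresholds at once, which is the pattern the abstract alludes to.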
Affiliation(s)
- Woong-Hee Shin
- Department of Chemical Science Education, Sunchon National University, Suncheon 57922, Republic of Korea
- Keiko Kumazawa
- Pharmaceutical Discovery Research Laboratories, Teijin Pharma Limited, Tokyo 191-8512, Japan
- Kenichiro Imai
- Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Takatsugu Hirokawa
- Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47906, USA
- Department of Computer Science, Purdue University, West Lafayette, IN 47906, USA
- Center for Cancer Research, Purdue University, West Lafayette, IN 47906, USA
- Department of Pediatrics, Cincinnati Children’s Hospital Medical Care, University of Cincinnati, Cincinnati, OH 45229, USA
|
40
|
Comprehensive characterization of amino acid positions in protein structures reveals molecular effect of missense variants. Proc Natl Acad Sci U S A 2020; 117:28201-28211. [PMID: 33106425 PMCID: PMC7668189 DOI: 10.1073/pnas.2002660117] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Indexed: 12/19/2022] Open
Abstract
Recent large-scale sequencing efforts have enabled the detection of millions of missense variants. Elucidating their functional effect is of crucial importance but challenging. We approach this problem by performing a wide-scale characterization of missense variants from 1,330 disease-associated genes using >14,000 protein structures. We identify 3D features associated with pathogenic and benign variants that unveiled the mutations’ effect at the molecular level. We further extend our analysis to account for the different essential structural regions in proteins performing different functions. By analyzing variants from 24 gene groups encoding for different protein functional families, we capture function-specific characteristics of missense variants, which match the experimental readouts. We show that our results derived using structural data will effectively inform variant interpretation. Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations’ positions, we identify features that are generally associated with pathogenic and population missense variants. 
We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants’ pathogenicity in terms of the perturbed molecular mechanisms.
|
41
|
Yazar M, Özbek P. In Silico Tools and Approaches for the Prediction of Functional and Structural Effects of Single-Nucleotide Polymorphisms on Proteins: An Expert Review. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2020; 25:23-37. [PMID: 33058752 DOI: 10.1089/omi.2020.0141] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/18/2022]
Abstract
Single-nucleotide polymorphisms (SNPs) are single-base variants that contribute to human biological variation and pathogenesis of many human diseases. Among all SNP types, nonsynonymous single-nucleotide polymorphisms (nsSNPs) can alter many structural, biochemical, and functional features of a protein such as folding characteristics, charge distribution, stability, dynamics, and interactions with other proteins/nucleotides. These modifications in the protein structure can lead nsSNPs to be closely associated with many multifactorial diseases such as cancer, diabetes, and neurodegenerative diseases. Predicting structural and functional effects of nsSNPs with experimental approaches can be time-consuming and costly; hence, computational prediction tools and algorithms are being widely and increasingly utilized in biology and medical research. This expert review examines the in silico tools and algorithms for the prediction of functional or structural effects of SNP variants, in addition to the description of the phenotypic effects of nsSNPs on protein structure, association between pathogenicity of variants, and functional or structural features of disease-associated variants. Finally, case studies investigating the functional and structural effects of nsSNPs on selected protein structures are highlighted. We conclude that creating a consistent workflow with a combination of in silico approaches or tools should be considered to increase the performance, accuracy, and precision of the biological and clinical predictions made in silico.
Affiliation(s)
- Metin Yazar
- Department of Bioengineering, Marmara University, Göztepe, İstanbul, Turkey
- Department of Genetics and Bioengineering, Istanbul Okan University, Tuzla, Istanbul, Turkey
- Pemra Özbek
- Department of Bioengineering, Marmara University, Göztepe, İstanbul, Turkey
|
42
|
Insights into changes in binding affinity caused by disease mutations in protein-protein complexes. Comput Biol Med 2020; 123:103829. [DOI: 10.1016/j.compbiomed.2020.103829] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/13/2020] [Revised: 05/20/2020] [Accepted: 05/20/2020] [Indexed: 01/11/2023]
|
43
|
Protein-Protein Interactions Mediated by Intrinsically Disordered Protein Regions Are Enriched in Missense Mutations. Biomolecules 2020; 10:biom10081097. [PMID: 32722039 PMCID: PMC7463635 DOI: 10.3390/biom10081097] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 06/13/2020] [Revised: 07/15/2020] [Accepted: 07/20/2020] [Indexed: 12/27/2022] Open
Abstract
Because proteins are fundamental to most biological processes, many genetic diseases can be traced back to single nucleotide variants (SNVs) that cause changes in protein sequences. However, not all SNVs that result in amino acid substitutions cause disease, as each residue is under different structural and functional constraints. Influential studies have shown that protein–protein interaction interfaces are enriched in disease-associated SNVs and depleted in SNVs that are common in the general population. These studies focus primarily on folded (globular) protein domains and overlook the prevalent class of protein interactions mediated by intrinsically disordered regions (IDRs). Therefore, we investigated the enrichment patterns of missense mutation-causing SNVs that are associated with disease and cancer, as well as those present in the healthy population, in structures of IDR-mediated interactions with comparisons to classical globular interactions. When comparing the different categories of interaction interfaces, division of the interface regions into solvent-exposed rim residues and buried core residues reveals distinctive enrichment patterns for the various types of missense mutations. Most notably, we demonstrate a strong enrichment of disease mutations at the interface core of interacting IDRs and a depletion of neutral ones, which supports the view that the disruption of IDR interactions is a mechanism underlying many diseases. Intriguingly, we also found an asymmetry across the IDR interaction interface in the enrichment of certain missense mutation types, which may hint at an increased variant tolerance and urges further investigations of IDR interactions.
|
44
|
Meriño-Cabrera Y, Severiche Castro JG, Rios Diez JD, Rodrigues Macedo ML, de Oliveira Mendes TA, Goreti de Almeida Oliveira M. Rational design of mimetic peptides based on the interaction between Inga laurina inhibitor and trypsins for Spodoptera cosmioides pest control. INSECT BIOCHEMISTRY AND MOLECULAR BIOLOGY 2020; 122:103390. [PMID: 32360954 DOI: 10.1016/j.ibmb.2020.103390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2019] [Revised: 03/23/2020] [Accepted: 04/13/2020] [Indexed: 06/11/2023]
Abstract
The interaction of Inga laurina Kunitz inhibitor with insect trypsins is an example of a protein-protein interaction with potential application in pest control. However, the field application of proteins as inhibitors is limited by high production cost, large molecular size and low environmental stability. The use of mimetic peptides that have molecular features associated with the protein inhibitor can result in a product with lower cost and higher efficiency for agricultural application. Here, we designed mimetic peptides derived from globular domains of ILTI that are predicted to interact with trypsin enzymes of Lepidoptera pests. Two linear peptides were identified from the interaction interface of trypsin-ILTI complexes and synthesized. These peptides were selected for their high energy contribution to the binding affinity between the enzyme and the protein inhibitor. The peptides showed structural stability, a propensity to adopt the bound conformation even outside the context of the protein, inhibitory activity against digestive trypsins and toxic effects on S. cosmioides, indicating that they can be used as potential inhibitors for pest control.
Affiliation(s)
- Yaremis Meriño-Cabrera
- Departamento de Bioquímica e Biologia molecular, Universidade Federal de Viçosa, Minas Gerais, Brazil
- Instituto de Biotecnologia aplicada à Agropecuaria, BIOAGRO-UFV, Viçosa, Minas Gerais, Brazil
- Juan Diego Rios Diez
- Instituto de Biotecnologia aplicada à Agropecuaria, BIOAGRO-UFV, Viçosa, Minas Gerais, Brazil
- Maria Ligia Rodrigues Macedo
- Laboratório de Purificação de Proteínas e suas Funções Biológicas, Unidade de Tecnologia de Alimentos e da Saúde Pública, Universidade Federal de Mato Grosso do Sul, Campo Grande, Brazil
- Tiago Antônio de Oliveira Mendes
- Departamento de Bioquímica e Biologia molecular, Universidade Federal de Viçosa, Minas Gerais, Brazil
- Instituto de Biotecnologia aplicada à Agropecuaria, BIOAGRO-UFV, Viçosa, Minas Gerais, Brazil
- Maria Goreti de Almeida Oliveira
- Departamento de Bioquímica e Biologia molecular, Universidade Federal de Viçosa, Minas Gerais, Brazil
- Instituto de Biotecnologia aplicada à Agropecuaria, BIOAGRO-UFV, Viçosa, Minas Gerais, Brazil
|
45
|
Yang X, Yang S, Qi H, Wang T, Li H, Zhang Z. PlaPPISite: a comprehensive resource for plant protein-protein interaction sites. BMC PLANT BIOLOGY 2020; 20:61. [PMID: 32028878 PMCID: PMC7006421 DOI: 10.1186/s12870-020-2254-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/21/2019] [Accepted: 01/16/2020] [Indexed: 05/02/2023]
Abstract
BACKGROUND: Protein-protein interactions (PPIs) play very important roles in diverse biological processes. Experimentally validated or predicted PPI data have become increasingly available in diverse plant species. To further explore the biological functions of PPIs, understanding the interaction details of plant PPIs (e.g., the 3D structural contexts of interaction sites) is necessary. By integrating bioinformatics algorithms, interaction details can be annotated at different levels and then compiled into user-friendly databases. In our previous study, we developed AraPPISite, which aimed to provide interaction site information for PPIs in the model plant Arabidopsis thaliana. Considering that the application of AraPPISite is limited to one species, it is very natural that AraPPISite should be evolved into a new database that can provide interaction details of PPIs in multiple plants. DESCRIPTION: PlaPPISite (http://zzdlab.com/plappisite/index.php) is a comprehensive, high-coverage and interaction details-oriented database for 13 plant interactomes. In addition to collecting 121 experimentally verified structures of protein complexes, the complex structures of experimental/predicted PPIs in the 13 plants were also constructed, and the corresponding interaction sites were annotated. For the PPIs whose 3D structures could not be modelled, the associated domain-domain interactions (DDIs) and domain-motif interactions (DMIs) were inferred. To facilitate the reliability assessment of predicted PPIs, the source species of interolog templates, GO annotations, subcellular localizations and gene expression similarities are also provided. JavaScript packages were employed to visualize structures of protein complexes, protein interaction sites and protein interaction networks. We also developed an online tool for homology modelling and protein interaction site annotation of protein complexes. All data contained in PlaPPISite are also freely available on the Download page.
CONCLUSION: PlaPPISite provides the plant research community with an easy-to-use and comprehensive data resource for the search and analysis of protein interaction details from the 13 important plant species.
Affiliation(s)
- Xiaodi Yang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Shiping Yang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Huan Qi
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Tianpeng Wang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Hong Li
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Life and Pharmaceutical Sciences, Hainan University, Haikou 570228, China
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
|
46
|
Kim D, Han SK, Lee K, Kim I, Kong J, Kim S. Evolutionary coupling analysis identifies the impact of disease-associated variants at less-conserved sites. Nucleic Acids Res 2019; 47:e94. [PMID: 31199866 PMCID: PMC6895274 DOI: 10.1093/nar/gkz536] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/15/2018] [Revised: 05/03/2019] [Accepted: 06/05/2019] [Indexed: 12/20/2022] Open
Abstract
Genome-wide association studies have discovered a large number of genetic variants in human patients with the disease. Thus, predicting the impact of these variants is important for sorting disease-associated variants (DVs) from neutral variants. Current methods to predict the mutational impacts depend on evolutionary conservation at the mutation site, which is determined using homologous sequences and based on the assumption that variants at well-conserved sites have high impacts. However, many DVs at less-conserved but functionally important sites cannot be predicted by the current methods. Here, we present a method to find DVs at less-conserved sites by predicting the mutational impacts using evolutionary coupling analysis. Functionally important and evolutionarily coupled sites often have compensatory variants on cooperative sites to avoid loss of function. We found that our method identified known intolerant variants in a diverse group of proteins. Furthermore, at less-conserved sites, we identified DVs that were not identified using conservation-based methods. These newly identified DVs were frequently found at protein interaction interfaces, where species-specific mutations often alter interaction specificity. This work presents a means to identify less-conserved DVs and provides insight into the relationship between evolutionarily coupled sites and human DVs.
Affiliation(s)
- Donghyo Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Seong Kyu Han
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Kwanghwan Lee
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Inhae Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- JungHo Kong
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Sanguk Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
|
47
|
The Plasma Factor XIII Heterotetrameric Complex Structure: Unexpected Unequal Pairing within a Symmetric Complex. Biomolecules 2019; 9:biom9120765. [PMID: 31766577 PMCID: PMC6995596 DOI: 10.3390/biom9120765] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/25/2019] [Revised: 11/15/2019] [Accepted: 11/19/2019] [Indexed: 02/07/2023] Open
Abstract
Factor XIII (FXIII) is a predominant determinant of clot stability, strength, and composition. Plasma FXIII circulates as a pro-transglutaminase with two catalytic A subunits and two carrier-protective B subunits in a heterotetramer (FXIII-A2B2). FXIII-A2 and -B2 subunits are synthesized separately and then assembled in plasma. Following proteolytic activation by thrombin and calcium-mediated dissociation of the B subunits, activated FXIII (FXIIIa) covalently cross links fibrin, promoting clot stability. The zymogen and active states of the FXIII-A subunits have been structurally characterized; however, the structure of FXIII-B subunits and the FXIII-A2B2 complex have remained elusive. Using integrative hybrid approaches including atomic force microscopy, cross-linking mass spectrometry, and computational approaches, we have constructed the first all-atom model of the FXIII-A2B2 complex. We also used molecular dynamics simulations in combination with isothermal titration calorimetry to characterize FXIII-A2B2 assembly, activation, and dissociation. Our data reveal unequal pairing of individual subunit monomers in an otherwise symmetric complex, and suggest this unusual structure is critical for both assembly and activation of this complex. Our findings enhance understanding of mechanisms associating FXIII-A2B2 mutations with disease and have important implications for the rational design of molecules to alter FXIII assembly or activity to reduce bleeding and thrombotic complications.
|
48
|
Ozdemir ES, Gursoy A, Keskin O. Analysis of single amino acid variations in singlet hot spots of protein-protein interfaces. Bioinformatics 2019; 34:i795-i801. [PMID: 30423104 DOI: 10.1093/bioinformatics/bty569] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 11/13/2022] Open
Abstract
Motivation: Single amino acid variations (SAVs) in protein-protein interaction (PPI) sites play critical roles in diseases. PPI sites (interfaces) have a small subset of residues called hot spots that contribute significantly to the binding energy, and they may form clusters called hot regions. Singlet hot spots are the single amino acid hot spots outside of the hot regions. The distribution of SAVs on the interface residues may be related to their disease association. Results: We performed statistical and structural analyses of SAVs with literature curated experimental thermodynamics data, and demonstrated that SAVs which destabilize PPIs are more likely to be found in singlet hot spots rather than hot regions and energetically less important interface residues. In contrast, non-hot spot residues are significantly enriched in neutral SAVs, which do not affect PPI stability. Surprisingly, we observed that singlet hot spots tend to be enriched in disease-causing SAVs, while benign SAVs significantly occur in non-hot spot residues. Our work demonstrates that SAVs in singlet hot spot residues have significant effect on protein stability and function. Availability and implementation: The dataset used in this paper is available as Supplementary Material. The data can be found at http://prism.ccbb.ku.edu.tr/data/sav/ as well. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- E Sila Ozdemir
- Department of Chemical and Biological Engineering, Koc University, Istanbul, Turkey
- Attila Gursoy
- Department of Computer Engineering, Koc University, Istanbul, Turkey
- Research Center for Translational Medicine (KUTTAM), Koc University, Istanbul, Turkey
- Ozlem Keskin
- Department of Chemical and Biological Engineering, Koc University, Istanbul, Turkey
- Research Center for Translational Medicine (KUTTAM), Koc University, Istanbul, Turkey
|
49
|
Chong CS, Kunze M, Hochreiter B, Krenn M, Berger J, Maurer-Stroh S. Rare Human Missense Variants can affect the Function of Disease-Relevant Proteins by Loss and Gain of Peroxisomal Targeting Motifs. Int J Mol Sci 2019; 20:E4609. [PMID: 31533369 PMCID: PMC6770196 DOI: 10.3390/ijms20184609] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/23/2019] [Revised: 09/06/2019] [Accepted: 09/14/2019] [Indexed: 12/30/2022] Open
Abstract
Single nucleotide variants (SNVs) resulting in amino acid substitutions (i.e., missense variants) can affect protein localization by changing or creating new targeting signals. Here, we studied the potential of naturally occurring SNVs from the Genome Aggregation Database (gnomAD) to result in the loss of an existing peroxisomal targeting signal 1 (PTS1) or gain of a novel PTS1 leading to mistargeting of cytosolic proteins to peroxisomes. Filtering down from 32,985 SNVs resulting in missense mutations within the C-terminal tripeptide of 23,064 human proteins, based on gene annotation data and computational prediction, we selected six SNVs for experimental testing of loss of function (LoF) of the PTS1 motif and five SNVs in cytosolic proteins for gain in PTS1-mediated peroxisome import (GoF). Experimental verification by immunofluorescence microscopy for subcellular localization and FRET affinity measurements for interaction with the receptor PEX5 demonstrated that five of the six predicted LoF SNVs resulted in loss of the PTS1 motif while three of five predicted GoF SNVs resulted in de novo PTS1 generation. Overall, we showed that a complementary approach incorporating bioinformatics methods and experimental testing was successful in identifying SNVs capable of altering peroxisome protein import, which may have implications in human disease.
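The idea of a missense variant in the C-terminal tripeptide creating or destroying a PTS1 can be sketched as a simple before/after motif check. This is a rough illustration only: it uses the commonly cited simplified tripeptide consensus (S/A/C)-(K/R/H)-(L/M) as a stand-in, whereas the study relied on gene annotation data and a computational predictor, and real PTS1 recognition also depends on residues upstream of the tripeptide. The function names and the toy sequences are assumptions made for this example.

```python
import re

# Simplified PTS1 consensus for the C-terminal tripeptide (an assumption of
# this sketch; it is not the predictor used in the study).
PTS1_CONSENSUS = re.compile(r"[SAC][KRH][LM]$")


def has_pts1(protein_seq: str) -> bool:
    """True if the sequence ends in a tripeptide matching the simplified consensus."""
    return bool(PTS1_CONSENSUS.search(protein_seq.upper()))


def classify_c_terminal_variant(protein_seq: str, position: int, alt_residue: str) -> str:
    """Classify a missense variant in the last three residues.

    position is 1-based. Returns "loss", "gain" or "unchanged" according to
    whether the simplified PTS1 motif is destroyed, created or unaffected.
    """
    n = len(protein_seq)
    if n < 3 or len(alt_residue) != 1:
        raise ValueError("need a sequence of length >= 3 and a single alternate residue")
    if not (n - 2 <= position <= n):
        raise ValueError("variant must fall within the C-terminal tripeptide")
    mutant = protein_seq[:position - 1] + alt_residue + protein_seq[position:]
    before, after = has_pts1(protein_seq), has_pts1(mutant)
    if before and not after:
        return "loss"
    if after and not before:
        return "gain"
    return "unchanged"


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    print(classify_c_terminal_variant("MAAASKL", 7, "P"))  # SKL -> SKP
    print(classify_c_terminal_variant("MAAASKV", 7, "L"))  # SKV -> SKL
    print(classify_c_terminal_variant("MAAASKL", 5, "A"))  # SKL -> AKL
```

A motif match of this kind would only nominate candidates; as the abstract notes, the predicted losses and gains still needed localization and PEX5 binding experiments to confirm.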
Affiliation(s)
- Cheng-Shoong Chong
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore 138671, Singapore
- National University of Singapore Graduate School for Integrative Sciences and Engineering (NGS), National University of Singapore, Singapore 119077, Singapore
- Markus Kunze
- Medical University of Vienna, Center for Brain Research, Department of Pathobiology of the Nervous System, 1090 Vienna, Austria
- Bernhard Hochreiter
- Medical University of Vienna, Center for Physiology and Pharmacology, Institute for Vascular Biology and Thrombosis Research, 1090 Vienna, Austria
- Martin Krenn
- Department of Neurology, Medical University of Vienna, 1090 Vienna, Austria
- Institute of Human Genetics, Technical University Munich, 81675 Munich, Germany
- Johannes Berger
- Medical University of Vienna, Center for Brain Research, Department of Pathobiology of the Nervous System, 1090 Vienna, Austria
- Sebastian Maurer-Stroh
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore 138671, Singapore
- National University of Singapore Graduate School for Integrative Sciences and Engineering (NGS), National University of Singapore, Singapore 119077, Singapore
- Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore
- Innovations in Food and Chemical Safety Programme (IFCS), Agency for Science, Technology and Research (A*STAR), Singapore 138671, Singapore
|
50
|
Dincer C, Kaya T, Keskin O, Gursoy A, Tuncbag N. 3D spatial organization and network-guided comparison of mutation profiles in Glioblastoma reveals similarities across patients. PLoS Comput Biol 2019; 15:e1006789. [PMID: 31527881 PMCID: PMC6782092 DOI: 10.1371/journal.pcbi.1006789] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/11/2019] [Revised: 10/08/2019] [Accepted: 07/31/2019] [Indexed: 02/06/2023] Open
Abstract
Glioblastoma multiforme (GBM) is the most aggressive type of brain tumor. Molecular heterogeneity is a hallmark of GBM tumors that is a barrier in developing treatment strategies. In this study, we used the nonsynonymous mutations of GBM tumors deposited in The Cancer Genome Atlas (TCGA) and applied a systems level approach based on biophysical characteristics of mutations and their organization in patient-specific subnetworks to reduce inter-patient heterogeneity and to gain potential clinically relevant insights. Approximately 10% of the mutations are located in "patches" which are defined as the set of residues spatially in close proximity that are mutated across multiple patients. Grouping mutations as 3D patches reduces the heterogeneity across patients. There are multiple patches that are relatively small in oncogenes, whereas there are a small number of very large patches in tumor suppressors. Additionally, different patches in the same protein are often located at different domains that can mediate different functions. We stratified the patients into five groups based on their potentially affected pathways that are revealed from the patient-specific subnetworks. These subnetworks were constructed by integrating mutation profiles of the patients with the interactome data. Network-guided clustering showed significant association between the groups and patient survival (P-value = 0.0408). Also, each group carries a set of signature 3D mutation patches that affect predominant pathways. We integrated drug sensitivity data of GBM cell lines with the mutation patches and the patient groups to analyze the possible therapeutic outcome of these patches. We found that Pazopanib might be effective in Group 3 by targeting CSF1R. Additionally, inhibiting ATM that is a mediator of PTEN phosphorylation may be ineffective in Group 2. 
We believe that from mutations to networks and eventually to clinical and therapeutic data, this study provides a novel perspective in the network-guided precision medicine.
Affiliation(s)
- Cansu Dincer
- Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
- Tugba Kaya
- Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
- Ozlem Keskin
- Department of Chemical and Biological Engineering, Koc University, Istanbul, Turkey
- Research Center for Translational Medicine (KUTTAM), Koc University, Istanbul, Turkey
- Attila Gursoy
- Research Center for Translational Medicine (KUTTAM), Koc University, Istanbul, Turkey
- Department of Computer Engineering, Koc University, Istanbul, Turkey
- Nurcan Tuncbag
- Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
- Cancer Systems Biology Laboratory (CanSyL-METU), Ankara, Turkey
|