1
Iqbal S, Begum F, Nyamai DW, Jalal N, Shaw P. An Integrated Computational Analysis of High-Risk SNPs in Angiopoietin-like Proteins (ANGPTL3 and ANGPTL8) Reveals Perturbed Protein Dynamics Associated with Cancer. Molecules 2023; 28:4648. [PMID: 37375208 DOI: 10.3390/molecules28124648] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/08/2023] [Revised: 06/01/2023] [Accepted: 06/06/2023] [Indexed: 06/29/2023] Open
Abstract
Angiopoietin-like proteins (ANGPTL) constitute a family of eight proteins (1-8) which play a pivotal role in the regulation of various pathophysiological processes. The current study sought to identify high-risk, "non-synonymous, single-nucleotide polymorphisms" (nsSNPs) in both ANGPTL3 and ANGPTL8 to evaluate the role that these nsSNPs play in various types of cancer. We retrieved a total of 301 nsSNPs from various databases; 79 of these candidates constitute high-risk nsSNPs. Moreover, we identified eleven high-risk nsSNPs that cause various types of cancer: seven candidates for ANGPTL3 (L57H, F295L, L309F, K329M, R332L, S348C, and G409R) and four candidates for ANGPTL8 (P23L, R85W, R138S, and E148D). Protein-protein interaction analysis revealed a strong association of ANGPTL proteins with several tumor-suppressor proteins such as ITGB3, ITGAV, and RASSF5. 'Gene-expression profiling interactive analysis' (GEPIA) showed that expression of ANGPTL3 is significantly downregulated in five cancers: sarcoma (SARC); cholangiocarcinoma (CHOL); kidney chromophobe carcinoma (KICH); kidney renal clear cell carcinoma (KIRC); and kidney renal papillary cell carcinoma (KIRP). GEPIA also showed that expression of ANGPTL8 remains downregulated in three cancers: CHOL; glioblastoma (GBM); and breast invasive carcinoma (BRCA). Survival rate analysis indicated that both upregulation and downregulation of ANGPTL3 and ANGPTL8 lead to low survival rates in various types of cancer. Overall, the current study revealed that both ANGPTL3 and ANGPTL8 constitute potential prognostic biomarkers for cancer; moreover, nsSNPs in these proteins might lead to the progression of cancer. However, further in vivo investigation will be helpful to validate the role of these proteins in the biology of cancer.
Affiliation(s)
- Sajid Iqbal
- Oujiang Laboratory (Zhejiang Laboratory for Regenerative Medicine, Vision and Brain Health), Wenzhou 325000, China
- Farida Begum
- Department of Biochemistry, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan
- Dorothy Wavinya Nyamai
- Oujiang Laboratory (Zhejiang Laboratory for Regenerative Medicine, Vision and Brain Health), Wenzhou 325000, China
- Department of Biochemistry, Jomo Kenyatta University of Agriculture and Technology, Nairobi 00200, Kenya
- Nasir Jalal
- Oujiang Laboratory (Zhejiang Laboratory for Regenerative Medicine, Vision and Brain Health), Wenzhou 325000, China
- Peter Shaw
- Oujiang Laboratory (Zhejiang Laboratory for Regenerative Medicine, Vision and Brain Health), Wenzhou 325000, China
2
Ducich NH, Mears JA, Bedoyan JK. Solvent accessibility of E1α and E1β residues with known missense mutations causing pyruvate dehydrogenase complex (PDC) deficiency: Impact on PDC-E1 structure and function. J Inherit Metab Dis 2022; 45:557-570. [PMID: 35038180 PMCID: PMC9297371 DOI: 10.1002/jimd.12477] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/23/2021] [Revised: 01/11/2022] [Accepted: 01/12/2022] [Indexed: 11/08/2022]
Abstract
Pyruvate dehydrogenase complex deficiency is a major cause of primary lactic acidemia resulting in high morbidity and mortality, with limited therapeutic options. PDHA1 mutations are responsible for >82% of cases. The E1 component of PDC is a symmetric dimer of heterodimers (αβ/α'β') encoded by PDHA1 and PDHB. We measured solvent-accessible surface area (SASA), utilized nearest-neighbor analysis, incorporated sequence changes using the mutagenesis tool in PyMOL, and performed molecular modeling with SWISS-MODEL to investigate the impact of residues with disease-causing missense variants (DMVs) on E1 structure and function. We reviewed 166 and 13 genetically resolved cases due to PDHA1 and PDHB, respectively, from variant databases. We expanded on 102 E1α and 13 E1β nonduplicate DMVs. DMVs of the E1α Arg112-Arg224 stretch (exons 5-7) and of E1α Arg residues constituted 40% and 39% of cases, respectively, with invariant Arg349 accounting for 22% of arginine replacements. SASA analysis showed that 86% and 84% of residues with nonduplicate DMVs of E1α and E1β, respectively, are solvent inaccessible ("buried"). Furthermore, 30% of E1α buried residues with DMVs are deleterious through perturbation of subunit-subunit interface contact (SSIC), with 73% located in the Arg112-Arg224 stretch. E1α Arg349 represented 74% of buried E1α Arg residues involved in SSIC. Structural perturbations resulting from residue replacements in some matched neighboring pairs of amino acids on different subunits, involved in SSIC at an interatomic distance of 2.9-4.0 Å, exhibit similar clinical phenotypes. Collectively, this work provides insight for future target-based advanced molecular modeling studies, with implications for development of novel therapeutics for specific recurrent DMVs of E1α.
Affiliation(s)
- Nicole H. Ducich
- Case Western Reserve University (CWRU) School of Medicine, Cleveland, Ohio, USA
- Jason A. Mears
- Department of Pharmacology, CWRU, Cleveland, Ohio, USA
- Center for Mitochondrial Diseases, CWRU, Cleveland, Ohio, USA
- Jirair K. Bedoyan
- Division of Genetic and Genomic Medicine, UPMC Children’s Hospital of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Department of Pediatrics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
3
Ammar A, Cavill R, Evelo C, Willighagen E. PSnpBind: a database of mutated binding site protein-ligand complexes constructed using a multithreaded virtual screening workflow. J Cheminform 2022; 14:8. [PMID: 35227289 PMCID: PMC8886843 DOI: 10.1186/s13321-021-00573-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2021] [Accepted: 11/18/2021] [Indexed: 11/15/2022] Open
Abstract
A key concept in drug design is how natural variants, especially the ones occurring in the binding site of drug targets, affect the inter-individual drug response and efficacy by altering binding affinity. These effects have been studied on very limited and small datasets while, ideally, a large dataset of binding affinity changes due to binding site single-nucleotide polymorphisms (SNPs) is needed for evaluation. However, to the best of our knowledge, such a dataset does not exist. Thus, a reference dataset of ligand binding affinities to proteins with all their reported binding-site variants was constructed using a molecular docking approach. Having a large database of protein-ligand complexes covering a wide range of binding pocket mutations and a large small-molecule landscape is of great importance for several types of studies. For example, developing machine learning algorithms to predict protein-ligand affinity or the effect of a SNP on it requires an extensive amount of data. In this work, we present PSnpBind, a large database of 0.6 million mutated binding site protein-ligand complexes constructed using a multithreaded virtual screening workflow. It provides a web interface to explore and visualize the protein-ligand complexes and a REST API to programmatically access the different aspects of the database contents. PSnpBind is open source and freely available at https://psnpbind.org.
Affiliation(s)
- Ammar Ammar
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Rachel Cavill
- Department of Data Science and Knowledge Engineering, Maastricht University, Maastricht, The Netherlands
- Chris Evelo
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Egon Willighagen
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
4
Vazquez M, Pons T. Annotating Cancer-Related Variants at Protein-Protein Interface with Structure-PPi. Methods Mol Biol 2022; 2493:315-330. [PMID: 35751824 DOI: 10.1007/978-1-0716-2293-3_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
A comprehensive analysis of germline and somatic variants requires complex computational approaches that combine next-generation sequencing (NGS)-based omics data with curated annotations from public repositories. Here, we describe Structure-PPi, which facilitates the mapping of cancer-related variants onto protein 3D structures, interaction interfaces, and other important functional sites (i.e., catalytic, ligand-binding, posttranslational modification). Our approach relies on features extracted from Interactome3D, UniProtKB, InterPro, APPRIS, dbNSFP, and COSMIC databases and provides complementary information to pathogenicity prediction methods. Thus, Structure-PPi helps in the discrimination of false-positive predictions and adds both mechanistic and biological insights into the role of variants in a given cancer. An online version of the tools is available at https://rbbt.bsc.es/StructurePPI/.
Affiliation(s)
- Miguel Vazquez
- Genome Informatics Unit, Barcelona Supercomputing Center (BSC-CNS), Barcelona, Spain
- Tirso Pons
- Department of Immunology and Oncology, National Center for Biotechnology, Spanish National Research Council (CNB-CSIC), Madrid, Spain
5
Behzadi P, Gajdács M. Worldwide Protein Data Bank (wwPDB): A virtual treasure for research in biotechnology. Eur J Microbiol Immunol (Bp) 2021; 11:77-86. [PMID: 34908533 PMCID: PMC8830413 DOI: 10.1556/1886.2021.00020] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/06/2021] [Accepted: 11/23/2021] [Indexed: 12/25/2022] Open
Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) provides a wide range of digital data regarding biology and biomedicine. This huge internet resource contains a wide range of important biological data, obtained from experiments around the globe by different scientists. The Worldwide Protein Data Bank (wwPDB) represents a brilliant collection of 3D structure data associated with important and vital biomolecules including nucleic acids (RNAs and DNAs) and proteins. Moreover, this database accumulates knowledge regarding function and evolution of biomacromolecules which supports different disciplines such as biotechnology. 3D structure, functional characteristics and phylogenetic properties of biomacromolecules give a deep understanding of the biomolecules' characteristics. An important advantage of the wwPDB database is its weekly update cycle. This updating process helps users to have the newest data and information for their projects. The data and information in wwPDB can greatly support accurate visualization and illustration of biomacromolecules in biotechnology. As demonstrated by the SARS-CoV-2 pandemic, rapidly accessible and reliable biological data for microbiology, immunology, vaccinology, and drug development are critical to address many healthcare-related challenges that are facing humanity. The aim of this paper is to introduce the readers to wwPDB, and to highlight the importance of this database in biotechnology, with the expectation that the number of scientists interested in the utilization of Protein Data Bank's resources will increase substantially in the coming years.
Affiliation(s)
- Payam Behzadi
- Department of Microbiology, College of Basic Sciences, Shahr-e-Qods Branch, Islamic Azad University, Tehran, 37541-374, Iran
- Márió Gajdács
- Department of Oral Biology and Experimental Dental Research, Faculty of Dentistry, University of Szeged, 6720, Szeged, Hungary. Corresponding author. Tel.: +36-62-342-532.
6
Pathogenic missense protein variants affect different functional pathways and proteomic features than healthy population variants. PLoS Biol 2021; 19:e3001207. [PMID: 33909605 PMCID: PMC8110273 DOI: 10.1371/journal.pbio.3001207] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/31/2020] [Revised: 05/10/2021] [Accepted: 03/26/2021] [Indexed: 12/27/2022] Open
Abstract
Missense variants are present amongst the healthy population, but some of them are causative of human diseases. A classification of variants associated with “healthy” or “diseased” states is therefore not always straightforward. A deeper understanding of the nature of missense variants in health and disease, the cellular processes they may affect, and the general molecular principles which underlie these differences is essential to offer mechanistic explanations of the true impact of pathogenic variants. Here, we have formalised a statistical framework which enables robust probabilistic quantification of variant enrichment across full-length proteins, their domains, and 3D structure-defined regions. Using this framework, we validate and extend previously reported trends of variant enrichment in different protein structural regions (surface/core/interface). By examining the association of variant enrichment with available functional pathways and transcriptomic and proteomic (protein half-life, thermal stability, abundance) data, we have mined a rich set of molecular features which distinguish between pathogenic and population variants: Pathogenic variants mainly affect proteins involved in cell proliferation and nucleotide processing and are enriched in more abundant proteins. Additionally, rare population variants display features closer to common than pathogenic variants. We validate the association between these molecular features and variant pathogenicity by comparing against existing in silico variant impact annotations. This study provides molecular details into how different proteins exhibit resilience and/or sensitivity towards missense variants and provides the rationale to prioritise variant-enriched proteins and protein domains for therapeutic targeting and development. The ZoomVar database, which we created for this study, is available at fraternalilab.kcl.ac.uk/ZoomVar. 
It allows users to programmatically annotate missense variants with protein structural information and to calculate variant enrichment in different protein structural regions. How can one improve the classification of genetic variants as harmful or harmless? This study uses a robust statistical analysis to exploit the interplay between protein structure, proteomic measurements and functional pathways to enable better discrimination between missense variants in health and disease.
7
Lee J, Lee D, Lee KH. Literature mining for context-specific molecular relations using multimodal representations (COMMODAR). BMC Bioinformatics 2020; 21:250. [PMID: 33106154 PMCID: PMC7586695 DOI: 10.1186/s12859-020-3396-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/30/2020] [Accepted: 02/06/2020] [Indexed: 01/14/2023] Open
Abstract
Biological contextual information helps in understanding various phenomena occurring in biological systems that consist of complex molecular relations. The construction of context-specific relational resources relies heavily on laborious manual extraction from unstructured literature. In this paper, we propose COMMODAR, a machine learning-based literature mining framework for context-specific molecular relations using multimodal representations. The main idea of COMMODAR is feature augmentation through the cooperation of multimodal representations for relation extraction. We leveraged biomedical domain knowledge as well as canonical linguistic information for more comprehensive representations of textual sources. The models based on multiple modalities outperformed those solely based on the linguistic modality. We applied COMMODAR to 14 million PubMed abstracts and extracted 9214 context-specific molecular relations. All corpora, extracted data, evaluation results, and the implementation code are downloadable at https://github.com/jae-hyun-lee/commodar. CCS CONCEPTS: • Computing methodologies~Information extraction • Computing methodologies~Neural networks • Applied computing~Biological networks.
Affiliation(s)
- Jaehyun Lee
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Doheon Lee
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Bio-Synergy Research Center, Daejeon, South Korea
- Kwang Hyung Lee
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
8
Ozdemir ES, Gursoy A, Keskin O. Analysis of single amino acid variations in singlet hot spots of protein-protein interfaces. Bioinformatics 2019; 34:i795-i801. [PMID: 30423104 DOI: 10.1093/bioinformatics/bty569] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 11/13/2022] Open
Abstract
Motivation: Single amino acid variations (SAVs) in protein-protein interaction (PPI) sites play critical roles in diseases. PPI sites (interfaces) have a small subset of residues called hot spots that contribute significantly to the binding energy, and they may form clusters called hot regions. Singlet hot spots are the single amino acid hot spots outside of the hot regions. The distribution of SAVs on the interface residues may be related to their disease association. Results: We performed statistical and structural analyses of SAVs with literature curated experimental thermodynamics data, and demonstrated that SAVs which destabilize PPIs are more likely to be found in singlet hot spots rather than hot regions and energetically less important interface residues. In contrast, non-hot spot residues are significantly enriched in neutral SAVs, which do not affect PPI stability. Surprisingly, we observed that singlet hot spots tend to be enriched in disease-causing SAVs, while benign SAVs significantly occur in non-hot spot residues. Our work demonstrates that SAVs in singlet hot spot residues have significant effect on protein stability and function. Availability and implementation: The dataset used in this paper is available as Supplementary Material. The data can be found at http://prism.ccbb.ku.edu.tr/data/sav/ as well. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- E Sila Ozdemir
- Department of Chemical and Biological Engineering, Koc University, Istanbul, Turkey
- Attila Gursoy
- Department of Computer Engineering, Koc University, Istanbul, Turkey
- Research Center for Translational Medicine (KUTTAM), Koc University, Istanbul, Turkey
- Ozlem Keskin
- Department of Chemical and Biological Engineering, Koc University, Istanbul, Turkey
- Research Center for Translational Medicine (KUTTAM), Koc University, Istanbul, Turkey
9
Konc J, Skrlj B, Erzen N, Kunej T, Janezic D. GenProBiS: web server for mapping of sequence variants to protein binding sites. Nucleic Acids Res 2019; 45:W253-W259. [PMID: 28498966 PMCID: PMC5570222 DOI: 10.1093/nar/gkx420] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 02/20/2017] [Accepted: 05/02/2017] [Indexed: 02/02/2023] Open
Abstract
Discovery of potentially deleterious sequence variants is important and has wide implications for research and generation of new hypotheses in human and veterinary medicine, and drug discovery. The GenProBiS web server maps sequence variants to protein structures from the Protein Data Bank (PDB), and further to protein–protein, protein–nucleic acid, protein–compound, and protein–metal ion binding sites. The concept of a protein–compound binding site is understood in the broadest sense, which includes glycosylation and other post-translational modification sites. Binding sites were defined by local structural comparisons of whole protein structures using the Protein Binding Sites (ProBiS) algorithm and transposition of ligands from the similar binding sites found to the query protein using the ProBiS-ligands approach with new improvements introduced in GenProBiS. Binding site surfaces were generated as three-dimensional grids encompassing the space occupied by predicted ligands. The server allows intuitive visual exploration of comprehensively mapped variants, such as human somatic missense mutations related to cancer and non-synonymous single nucleotide polymorphisms from 21 species, within the predicted binding site regions for about 80 000 PDB protein structures using fast WebGL graphics. The GenProBiS web server is open and free to all users at http://genprobis.insilab.org.
Affiliation(s)
- Janez Konc
- National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia
- University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, 6000 Koper, Slovenia
- Blaz Skrlj
- National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia
- Nika Erzen
- National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia
- Tanja Kunej
- Biotechnical Faculty, University of Ljubljana, 1000 Ljubljana, Slovenia
- Dusanka Janezic
- University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, 6000 Koper, Slovenia
10
Arora A, Somasundaram K. Targeted Proteomics Comes to the Benchside and the Bedside: Is it Ready for Us? Bioessays 2019; 41:e1800042. [PMID: 30734933 DOI: 10.1002/bies.201800042] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 03/06/2018] [Revised: 11/28/2018] [Indexed: 12/22/2022]
Abstract
While mass spectrometry (MS)-based quantification of small molecules has been successfully used for decades, targeted MS has only recently been used by the proteomics community to investigate clinical questions such as biomarker verification and validation. Targeted MS holds the promise of a paradigm shift in the quantitative determination of proteins. Nevertheless, targeted quantitative proteomics requires improvisation in making sample processing, instruments, and data analysis more accessible. In the backdrop of the genomic era reaching its zenith, certain questions arise: is the proteomic era about to come? If we are at the beginning of a new future for protein quantification, are we prepared to incorporate targeted proteomics at the benchside for basic research and at the bedside for the good of patients? Here, an overview of the knowledge required to perform targeted proteomics as well as its applications is provided. A special emphasis is placed on upcoming areas such as peptidomics, proteoform research, and mass spectrometry imaging, where the utilization of targeted proteomics is expected to bring forth new avenues. The limitations associated with the acceptance of this technique for mainstream usage are also highlighted. Also see the video abstract here https://youtu.be/mieB47B8gZw.
Affiliation(s)
- Anjali Arora
- Department of Microbiology and Cell Biology, Indian Institute of Science, Bangalore, 560012, India
- Kumaravel Somasundaram
- Department of Microbiology and Cell Biology, Indian Institute of Science, Bangalore, 560012, India
11
Dib L, Salamin N, Gfeller D. Polymorphic sites preferentially avoid co-evolving residues in MHC class I proteins. PLoS Comput Biol 2018; 14:e1006188. [PMID: 29782520 PMCID: PMC5983860 DOI: 10.1371/journal.pcbi.1006188] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/13/2017] [Revised: 06/01/2018] [Accepted: 05/09/2018] [Indexed: 01/11/2023] Open
Abstract
Major histocompatibility complex class I (MHC-I) molecules are critical to adaptive immune defence mechanisms in vertebrate species and are encoded by highly polymorphic genes. Polymorphic sites are located close to the ligand-binding groove and entail MHC-I alleles with distinct binding specificities. Some efforts have been made to investigate the relationship between polymorphism and protein stability. However, less is known about the relationship between polymorphism and MHC-I co-evolutionary constraints. Using Direct Coupling Analysis (DCA) we found that co-evolution analysis accurately pinpoints structural contacts, although the protein family is restricted to vertebrates and comprises less than five hundred species, and that the co-evolutionary signal is mainly driven by inter-species changes, and not intra-species polymorphism. Moreover, we show that polymorphic sites in human preferentially avoid co-evolving residues, as well as residues involved in protein stability. These results suggest that sites displaying high polymorphism may have been selected during vertebrates’ evolution to avoid co-evolutionary constraints and thereby maximize their mutability. Amino acid co-evolution represents cases of simultaneous substitution of amino acids at distinct positions in protein sequences. In the MHC-I protein family, such co-evolution could result from either amino acid changes across species or changes within species due to the high polymorphism of MHC-I molecules. Here we show that signals captured by global methods such as Direct Coupling Analysis (DCA) to estimate co-evolution primarily result from changes across species. Moreover, our results indicate that polymorphic sites in MHC-I molecules tend to be decoupled from co-evolving ones. This could suggest that they have been selected to maximize their mutability, which is known to be functionally important to entail MHC-I molecules with a wide repertoire of binding specificities for antigen presentation.
Affiliation(s)
- Linda Dib
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Quartier Sorge, Lausanne, Switzerland
- Nicolas Salamin
- Swiss Institute of Bioinformatics, Quartier Sorge, Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- David Gfeller
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Quartier Sorge, Lausanne, Switzerland
12
Fraternali F. [Protein-protein interacting networks, their structures and disease-related mutations]. Biol Aujourdhui 2018; 211:223-228. [PMID: 29412132 DOI: 10.1051/jbio/2017031] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/20/2017] [Indexed: 11/14/2022]
Abstract
In recent years, the comparison of protein interactomes has identified conserved modules that could represent functional cores with a common ancestry. Within this context, recent analyses of protein-protein interaction networks have led to a debate on the influence of the experimental method on the quality and biological relevance of these data. It is crucial to understand the extent to which divergence between networks of different species reflects sampling biases in the respective experimental methods, as opposed to topological features dictated by biological functionality. This aspect requires novel, precise and practical mathematical tools to quantify and compare high-resolution networks. To this end, we have studied the relationship between pools of random graphs and real biological signaling networks, with emphasis on the number of graph cycles in the networks, which represent complexes in experimental protein interactomes. By combining methods for graph and algorithm dynamics to count the loops, we evaluate the relative importance of the loops in biological networks in comparison with network analyses.
Affiliation(s)
- Franca Fraternali
- Randall Division of Cellular and Molecular Biology, King's College, London, UK
13
Laddach A, Ng JCF, Chung SS, Fraternali F. Genetic variants and protein-protein interactions: a multidimensional network-centric view. Curr Opin Struct Biol 2018; 50:82-90. [PMID: 29306755 DOI: 10.1016/j.sbi.2017.12.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/16/2017] [Revised: 12/19/2017] [Accepted: 12/20/2017] [Indexed: 01/18/2023]
Abstract
We review recent progress in the mapping of genetic variants to proteins, in the context of their interactions, as measured from experiments and/or computational predictions. Such variants can impact on the molecular mechanisms underlying an interaction and its stability. We highlight recent work which relies on the effective use of protein-protein interaction networks (PPINs), integrated with 3D structural information, for evaluating disease-associated variants. Furthermore, we discuss how the integration of multiple layers of biological information, in the context of PPINs, can improve the interpretation of genetic variants and inspire new therapeutic strategies.
Affiliation(s)
- Anna Laddach
- Randall Division of Cell and Molecular Biophysics, King's College London, UK
- Joseph Chi-Fung Ng
- Randall Division of Cell and Molecular Biophysics, King's College London, UK
- Sun Sook Chung
- Randall Division of Cell and Molecular Biophysics, King's College London, UK
- Department of Haematological Medicine, King's College London, UK
- Franca Fraternali
- Randall Division of Cell and Molecular Biophysics, King's College London, UK
14
Brown DK, Tastan Bishop Ö. HUMA: A platform for the analysis of genetic variation in humans. Hum Mutat 2018; 39:40-51. [PMID: 28967693 PMCID: PMC5722678 DOI: 10.1002/humu.23334] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/20/2017] [Revised: 08/01/2017] [Accepted: 08/17/2017] [Indexed: 11/06/2022]
Abstract
The completion of the human genome project at the beginning of the 21st century, along with the rapid advancement of sequencing technologies thereafter, has resulted in exponential growth of biological data. In genetics, this has given rise to numerous variation databases, created to store and annotate the ever-expanding dataset of known mutations. Usually, these databases focus on variation at the sequence level. Few databases focus on the analysis of variation at the 3D level, that is, mapping, visualizing, and determining the effects of variation in protein structures. Additionally, these Web servers seldom incorporate tools to help analyze these data. Here, we present the Human Mutation Analysis (HUMA) Web server and database. HUMA integrates sequence, structure, variation, and disease data into a single, connected database. A user-friendly interface provides click-based data access and visualization, whereas a RESTful Web API provides programmatic access to the data. Tools have been integrated into HUMA to allow initial analyses to be carried out on the server. Furthermore, users can upload their private variation datasets, which are automatically mapped to public data and can be analyzed using the integrated tools. HUMA is freely accessible at https://huma.rubi.ru.ac.za.
Affiliation(s)
- David K Brown
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Grahamstown, South Africa
- Özlem Tastan Bishop
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Grahamstown, South Africa
|
15
|
Spatial distribution of disease-associated variants in three-dimensional structures of protein complexes. Oncogenesis 2017; 6:e380. [PMID: 28945216 PMCID: PMC5623905 DOI: 10.1038/oncsis.2017.79] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 12/20/2016] [Revised: 07/26/2017] [Accepted: 08/06/2017] [Indexed: 01/06/2023] Open
Abstract
Next-generation sequencing enables simultaneous analysis of hundreds of human genomes associated with a particular phenotype, for example, a disease. These genomes naturally contain a lot of sequence variation that ranges from single-nucleotide variants (SNVs) to large-scale structural rearrangements. In order to establish a functional connection between genotype and disease-associated phenotypes, one needs to distinguish disease drivers from neutral passenger variants. Functional annotation based on experimental assays is feasible only for a limited number of candidate mutations. Thus alternative computational tools are needed. A possible approach to annotating mutations functionally is to consider their spatial location relative to functionally relevant sites in three-dimensional (3D) structures of the harboring proteins. This is impeded by the lack of available protein 3D structures. Complementing experimentally resolved structures with reliable computational models is an attractive alternative. We developed a structure-based approach to characterizing comprehensive sets of non-synonymous single-nucleotide variants (nsSNVs): associated with cancer, non-cancer diseases and putatively functionally neutral. We searched experimentally resolved protein 3D structures for potential homology-modeling templates for proteins harboring corresponding mutations. We found such templates for all proteins with disease-associated nsSNVs, and 51 and 66% of proteins carrying common polymorphisms and annotated benign variants. Many mutations caused by nsSNVs can be found in protein–protein, protein–nucleic acid or protein–ligand complexes. Correction for the number of available templates per protein reveals that protein–protein interaction interfaces are not enriched in either cancer nsSNVs, or nsSNVs associated with non-cancer diseases. Whereas cancer-associated mutations are enriched in DNA-binding proteins, they are rarely located directly in DNA-interacting interfaces. In contrast, mutations associated with non-cancer diseases are in general rare in DNA-binding proteins, but enriched in DNA-interacting interfaces in these proteins. All disease-associated nsSNVs are overrepresented in ligand-binding pockets, and nsSNVs associated with non-cancer diseases are additionally enriched in protein core, where they probably affect overall protein stability.
|
16
|
Mutations at protein-protein interfaces: Small changes over big surfaces have large impacts on human health. Prog Biophys Mol Biol 2017; 128:3-13. [PMID: 27913149 DOI: 10.1016/j.pbiomolbio.2016.10.002] [Citation(s) in RCA: 115] [Impact Index Per Article: 14.4] [Received: 04/30/2016] [Revised: 10/15/2016] [Accepted: 10/19/2016] [Indexed: 12/22/2022]
|
17
|
Brown DK, Tastan Bishop Ö. Role of Structural Bioinformatics in Drug Discovery by Computational SNP Analysis: Analyzing Variation at the Protein Level. Glob Heart 2017; 12:151-161. [PMID: 28302551 DOI: 10.1016/j.gheart.2017.01.009] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 01/12/2017] [Accepted: 01/13/2017] [Indexed: 10/20/2022] Open
Abstract
With the completion of the human genome project at the beginning of the 21st century, the biological sciences entered an unprecedented age of data generation, and made its first steps toward an era of personalized medicine. This abundance of sequence data has led to the proliferation of numerous sequence-based techniques for associating variation with disease, such as genome-wide association studies and candidate gene association studies. However, these statistical methods do not provide an understanding of the functional effects of variation. Structure-based drug discovery and design is increasingly incorporating structural bioinformatics techniques to model and analyze protein targets, perform large scale virtual screening to identify hit to lead compounds, and simulate molecular interactions. These techniques are fast, cost-effective, and complement existing experimental techniques such as high throughput sequencing. In this paper, we discuss the contributions of structural bioinformatics to drug discovery, focusing particularly on the analysis of nonsynonymous single nucleotide polymorphisms. We conclude by suggesting a protocol for future analyses of the structural effects of nonsynonymous single nucleotide polymorphisms on proteins and protein complexes.
Affiliation(s)
- David K Brown
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Grahamstown, South Africa
- Özlem Tastan Bishop
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Grahamstown, South Africa.
|
18
|
Awan FM, Obaid A, Ikram A, Janjua HA. Mutation-Structure-Function Relationship Based Integrated Strategy Reveals the Potential Impact of Deleterious Missense Mutations in Autophagy Related Proteins on Hepatocellular Carcinoma (HCC): A Comprehensive Informatics Approach. Int J Mol Sci 2017; 18:139. [PMID: 28085066 PMCID: PMC5297772 DOI: 10.3390/ijms18010139] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 08/27/2016] [Revised: 11/14/2016] [Accepted: 11/16/2016] [Indexed: 12/13/2022] Open
Abstract
Autophagy, an evolutionarily conserved multifaceted lysosome-mediated bulk degradation system, plays a vital role in liver pathologies including hepatocellular carcinoma (HCC). Post-translational modifications (PTMs) and genetic variations in autophagy components have emerged as significant determinants of autophagy related proteins. Identification of a comprehensive spectrum of genetic variations and PTMs of autophagy related proteins and their impact at molecular level will greatly expand our understanding of autophagy based regulation. In this study, we attempted to identify high risk missense mutations that are highly damaging to the structure as well as function of autophagy related proteins including LC3A, LC3B, BECN1 and SCD1. A number of putative structural and functional residues, including several sites that undergo PTMs, were also identified. In total, 16 high-risk SNPs in LC3A, 18 in LC3B, 40 in BECN1 and 43 in SCD1 were prioritized. Out of these, 2 in LC3A (K49A, K51A), 1 in LC3B (S92C), 6 in BECN1 (S113R, R292C, R292H, Y338C, S346Y, Y352H) and 6 in SCD1 (Y41C, Y55D, R131W, R135Q, R135W, Y151C) coincide with potential PTM sites. Our integrated analysis found LC3B Y113C, BECN1 I403T, SCD1 R126S and SCD1 Y218C as highly deleterious HCC-associated mutations. This study is the first extensive in silico mutational analysis of the LC3A, LC3B, BECN1 and SCD1 proteins. We hope that the observed results will be a valuable resource for in-depth mechanistic insight into future investigations of pathological missense SNPs using an integrated computational platform.
Affiliation(s)
- Faryal Mehwish Awan
- Department of Industrial Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), H-12 Islamabad 44000, Pakistan.
- Ayesha Obaid
- Department of Industrial Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), H-12 Islamabad 44000, Pakistan.
- Aqsa Ikram
- Department of Industrial Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), H-12 Islamabad 44000, Pakistan.
- Hussnain Ahmed Janjua
- Department of Industrial Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), H-12 Islamabad 44000, Pakistan.
|