1
Nguyen TN, Ingle C, Thompson S, Reynolds KA. The genetic landscape of a metabolic interaction. Nat Commun 2024; 15:3351. [PMID: 38637543 PMCID: PMC11026382 DOI: 10.1038/s41467-024-47671-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 04/09/2024] [Indexed: 04/20/2024]
Abstract
While much prior work has explored the constraints on protein sequence and evolution induced by physical protein-protein interactions, the sequence-level constraints emerging from non-binding functional interactions in metabolism remain unclear. To quantify how variation in the activity of one enzyme constrains the biochemical parameters and sequence of another, we focus on dihydrofolate reductase (DHFR) and thymidylate synthase (TYMS), a pair of enzymes catalyzing consecutive reactions in folate metabolism. We use deep mutational scanning to quantify the growth rate effect of 2696 DHFR single mutations in 3 TYMS backgrounds under conditions selected to emphasize biochemical epistasis. Our data are well-described by a relatively simple enzyme velocity to growth rate model that quantifies how metabolic context tunes enzyme mutational tolerance. Together our results reveal the structural distribution of epistasis in a metabolic enzyme and establish a foundation for the design of multi-enzyme systems.
Affiliation(s)
- Thuy N Nguyen
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- The Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- The Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Form Bio, Dallas, TX, 75226, USA
| | - Christine Ingle
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- The Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- The Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
| | - Samuel Thompson
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, 94158, USA
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
| | - Kimberly A Reynolds
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
- The Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
- The Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
2
Liu W, Wang Z, You R, Xie C, Wei H, Xiong Y, Yang J, Zhu S. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nat Commun 2024; 15:2775. [PMID: 38555371 PMCID: PMC10981738 DOI: 10.1038/s41467-024-46808-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 03/08/2024] [Indexed: 04/02/2024]
Abstract
Homologous protein search is one of the most commonly used methods for protein annotation and analysis. Compared to structure search, detecting distant evolutionary relationships from sequences alone remains challenging. Here we propose PLMSearch (Protein Language Model), a homologous protein search method with only sequences as input. PLMSearch uses deep representations from a pre-trained protein language model and trains the similarity prediction model on a large number of real structure similarities. This enables PLMSearch to capture the remote homology information concealed behind the sequences. Extensive experimental results show that PLMSearch can search millions of query-target protein pairs in seconds, like MMseqs2, while increasing the sensitivity by more than threefold, and is comparable to state-of-the-art structure search methods. In particular, unlike traditional sequence search methods, PLMSearch can recall most remote homology pairs with dissimilar sequences but similar structures. PLMSearch is freely available at https://dmiip.sjtu.edu.cn/PLMSearch.
Affiliation(s)
- Wei Liu
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China
| | - Ziye Wang
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China
| | - Ronghui You
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China
| | - Chenghan Xie
- School of Mathematical Sciences, Fudan University, 200433, Shanghai, China
| | - Hong Wei
- School of Mathematical Sciences, Nankai University, 300071, Tianjin, China
| | - Yi Xiong
- Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, 200240, Shanghai, China
| | - Jianyi Yang
- Ministry of Education Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Science, Shandong University, 266237, Qingdao, China.
| | - Shanfeng Zhu
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China.
- Shanghai Qi Zhi Institute, Shanghai, China.
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China.
- Shanghai Key Lab of Intelligent Information Processing and Shanghai Institute of Artificial Intelligence Algorithm, Fudan University, Shanghai, China.
- Zhangjiang Fudan International Innovation Center, Shanghai, China.
3
Liu F, Yuan C, Chen H, Yang F. Prediction of linear B-cell epitopes based on protein sequence features and BERT embeddings. Sci Rep 2024; 14:2464. [PMID: 38291341 PMCID: PMC10828400 DOI: 10.1038/s41598-024-53028-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 01/26/2024] [Indexed: 02/01/2024]
Abstract
Linear B-cell epitopes (BCEs) play a key role in the development of peptide vaccines and immunodiagnostic reagents. Therefore, the accurate identification of linear BCEs is of great importance in the prevention of infectious diseases and the diagnosis of related diseases. The experimental methods used to identify BCEs are both expensive and time-consuming, and they do not meet the demand for identification of large-scale protein sequence data. As a result, there is a need to develop an efficient and accurate computational method to rapidly identify linear BCE sequences. In this work, we developed the new linear BCE prediction method LBCE-BERT. This method is based on peptide chain sequence information and embeddings from the natural language model BERT, using an XGBoost classifier. The model was trained on three benchmark datasets for hyperparameter selection and was subsequently evaluated on several test datasets. The results indicate that our proposed method outperforms others in terms of AUROC and accuracy. The LBCE-BERT model is publicly available at: https://github.com/Lfang111/LBCE-BERT.
Affiliation(s)
- Fang Liu
- School of Humanistic Medicine, Anhui Medical University, Hefei, 230032, Anhui, China
| | - ChengCheng Yuan
- School of Biomedical Engineering, Anhui Medical University, Hefei, 230030, Anhui, China
| | - Haoqiang Chen
- School of Humanistic Medicine, Anhui Medical University, Hefei, 230032, Anhui, China
| | - Fei Yang
- School of Biomedical Engineering, Anhui Medical University, Hefei, 230030, Anhui, China.
4
Cao W, Wu LY, Xia XY, Chen X, Wang ZX, Pan XM. A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins. Sci Rep 2023; 13:20304. [PMID: 37985846 PMCID: PMC10662474 DOI: 10.1038/s41598-023-47496-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 11/14/2023] [Indexed: 11/22/2023]
Abstract
Because of the limited effectiveness of prevailing phylogenetic methods when applied to highly divergent protein sequences, the phylogenetic analysis problem remains challenging. Here, we propose a sequence-based evolutionary distance algorithm termed sequence distance (SD), which innovatively incorporates site-to-site correlation within protein sequences into the distance estimation. In protein superfamilies, SD can effectively distinguish evolutionary relationships both within and between protein families, producing phylogenetic trees that closely align with those based on structural information, even with sequence identity less than 20%. SD is highly correlated with the similarity of the protein structure, and can calculate evolutionary distances for thousands of protein pairs within seconds using a single CPU, which is significantly faster than most protein structure prediction methods that demand high computational resources and long run times. The development of SD will significantly advance phylogenetics, providing researchers with a more accurate and reliable tool for exploring evolutionary relationships.
Affiliation(s)
- Wei Cao
- Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
| | - Lu-Yun Wu
- Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
| | - Xia-Yu Xia
- Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
| | - Xiang Chen
- Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
| | - Zhi-Xin Wang
- Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China.
| | - Xian-Ming Pan
- Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China.
5
Tripathi S, Shirnekhi HK, Gorman SD, Chandra B, Baggett DW, Park CG, Somjee R, Lang B, Hosseini SMH, Pioso BJ, Li Y, Iacobucci I, Gao Q, Edmonson MN, Rice SV, Zhou X, Bollinger J, Mitrea DM, White MR, McGrail DJ, Jarosz DF, Yi SS, Babu MM, Mullighan CG, Zhang J, Sahni N, Kriwacki RW. Defining the condensate landscape of fusion oncoproteins. Nat Commun 2023; 14:6008. [PMID: 37770423 PMCID: PMC10539325 DOI: 10.1038/s41467-023-41655-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Accepted: 09/13/2023] [Indexed: 09/30/2023]
Abstract
Fusion oncoproteins (FOs) arise from chromosomal translocations in ~17% of cancers and are often oncogenic drivers. Although some FOs can promote oncogenesis by undergoing liquid-liquid phase separation (LLPS) to form aberrant biomolecular condensates, the generality of this phenomenon is unknown. We explored this question by testing 166 FOs in HeLa cells and found that 58% formed condensates. The condensate-forming FOs displayed physicochemical features distinct from those of condensate-negative FOs and segregated into distinct feature-based groups that aligned with their sub-cellular localization and biological function. Using machine learning, we developed a predictor of FO condensation behavior, and discovered that 67% of ~3000 additional FOs likely form condensates, with 35% of those predicted to function by altering gene expression. 47% of the predicted condensate-negative FOs were associated with cell signaling functions, suggesting a functional dichotomy between condensate-positive and -negative FOs. Our datasets and reagents are rich resources to interrogate FO condensation in the future.
Affiliation(s)
- Swarnendu Tripathi
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Hazheen K Shirnekhi
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Scott D Gorman
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Arrakis Therapeutics, 830 Winter St, Waltham, MA, 02451, USA
| | - Bappaditya Chandra
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - David W Baggett
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Cheon-Gil Park
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Ramiz Somjee
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Rhodes College, Memphis, TN, USA
- Washington University School of Medicine, 660 South Euclid Avenue, St. Louis, MO, 63110, USA
| | - Benjamin Lang
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Center of Excellence for Data-Driven Discovery, Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Seyed Mohammad Hadi Hosseini
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Center of Excellence for Data-Driven Discovery, Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Brittany J Pioso
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Yongsheng Li
- Livestrong Cancer Institutes, Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX, 78712, USA
| | - Ilaria Iacobucci
- Department of Pathology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Qingsong Gao
- Department of Pathology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Michael N Edmonson
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Stephen V Rice
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Xin Zhou
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - John Bollinger
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Diana M Mitrea
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Dewpoint Therapeutics, 451 D Street, Suite 104, Boston, MA, 02210, USA
| | - Michael R White
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- IDEXX Laboratories, Inc., One IDEXX Drive, Westbrook, ME, 04092, USA
| | - Daniel J McGrail
- Center for Immunotherapy and Precision Immuno-Oncology, Cleveland Clinic, Cleveland, OH, USA
- Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
| | - Daniel F Jarosz
- Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA, USA
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA
| | - S Stephen Yi
- Livestrong Cancer Institutes, Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX, 78712, USA
- Department of Biomedical Engineering, and Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, TX, USA
| | - M Madan Babu
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Center of Excellence for Data-Driven Discovery, Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Charles G Mullighan
- Department of Pathology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Jinghui Zhang
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Nidhi Sahni
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX, USA
| | - Richard W Kriwacki
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA.
- Department of Microbiology, Immunology and Biochemistry, University of Tennessee Health Sciences Center, Memphis, TN, USA.
6
Sidorczuk K, Mackiewicz P, Pietluch F, Gagat P. Characterization of signal and transit peptides based on motif composition and taxon-specific patterns. Sci Rep 2023; 13:15751. [PMID: 37735485 PMCID: PMC10514287 DOI: 10.1038/s41598-023-42987-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Accepted: 09/17/2023] [Indexed: 09/23/2023]
Abstract
Targeting peptides or presequences are N-terminal extensions of proteins that encode information about their cellular localization. They include signal peptides (SP), which target proteins to the endoplasmic reticulum, and transit peptides (TP) directing proteins to the organelles of endosymbiotic origin: chloroplasts and mitochondria. TPs were hypothesized to have evolved from antimicrobial peptides (AMPs), which are responsible for the host defence against microorganisms, including bacteria, fungi and viruses. In this study, we performed comprehensive bioinformatic analyses of amino acid motifs of targeting peptides and AMPs using a curated set of experimentally verified proteins. We identified motifs frequently occurring in each type of presequence showing specific patterns associated with their amino acid composition, and investigated their position within the presequence. We also compared motif patterns among different taxonomic groups and identified taxon-specific features, providing some evolutionary insights. Considering the functional relevance and many practical applications of targeting peptides and AMPs, we believe that our analyses will prove useful for their design, and better understanding of protein import mechanism and presequence evolution.
Affiliation(s)
- Katarzyna Sidorczuk
- Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland
| | - Paweł Mackiewicz
- Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland
| | - Filip Pietluch
- Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland
| | - Przemysław Gagat
- Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland.
7
da Silva Dambroz CM, Aono AH, de Andrade Silva EM, Pereira WA. Genome-wide analysis and characterization of the LRR-RLK gene family provides insights into anthracnose resistance in common bean. Sci Rep 2023; 13:13455. [PMID: 37596307 PMCID: PMC10439169 DOI: 10.1038/s41598-023-40054-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2022] [Accepted: 08/03/2023] [Indexed: 08/20/2023]
Abstract
Anthracnose, caused by the hemibiotrophic fungus Colletotrichum lindemuthianum, is a damaging disease of common beans that can drastically reduce crop yield. The most effective strategy to manage anthracnose is the use of resistant cultivars. There are many resistance loci that have been identified, mapped and associated with markers in common bean chromosomes. The Leucine-rich repeat kinase receptor protein (LRR-RLK) family is a diverse group of transmembrane receptors, which potentially recognizes pathogen-associated molecular patterns and activates an immune response. In this study, we performed in silico analyses to identify, classify, and characterize common bean LRR-RLKs, also evaluating their expression profile in response to the infection by C. lindemuthianum. By analyzing the entire genome of Phaseolus vulgaris, we could identify and classify 230 LRR-RLKs into 15 different subfamilies. The analyses of gene structures, conserved domains and motifs suggest that LRR-RLKs from the same subfamily are consistent in their exon/intron organization and composition. LRR-RLK genes were found along the 11 chromosomes of the species, including regions of proximity with anthracnose resistance markers. By investigating the duplication events within the LRR-RLK family, we associated the importance of such a family with an expansion resulting from a strong stabilizing selection. Promoter analysis was also performed, highlighting cis-elements associated with the plant response to biotic stress. With regard to the expression pattern of LRR-RLKs in response to the infection by C. lindemuthianum, we could point out several differentially expressed genes in this subfamily, which were associated with specific molecular patterns of LRR-RLKs. Our work provides a broad analysis of the LRR-RLK family in P. vulgaris, allowing an in-depth structural and functional characterization of genes and proteins of this family. From specific expression patterns related to anthracnose response, we could infer a direct participation of LRR-RLK genes in the mechanisms of resistance to anthracnose, highlighting important subfamilies for further investigations.
Affiliation(s)
| | - Alexandre Hild Aono
- Molecular Biology and Genetic Engineering Center (CBMEG), University of Campinas (UNICAMP), Campinas, SP, Brazil
8
Sargsyan K, Mazmanian K, Lim C. A strategy for evaluating potential antiviral resistance to small molecule drugs and application to SARS-CoV-2. Sci Rep 2023; 13:502. [PMID: 36627366 PMCID: PMC9831016 DOI: 10.1038/s41598-023-27649-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Accepted: 01/05/2023] [Indexed: 01/11/2023]
Abstract
Alterations in viral fitness cannot be inferred from only mutagenesis studies of an isolated viral protein. To date, no systematic analysis has been performed to identify mutations that improve virus fitness and reduce drug efficacy. We present a generic strategy to evaluate which viral mutations might diminish drug efficacy and apply it to assess how SARS-CoV-2 evolution may affect the efficacy of currently approved/candidate small-molecule antivirals for Mpro, PLpro, and RdRp. For each drug target, we determined the drug-interacting virus residues from available structures and the selection pressure of the virus residues from the SARS-CoV-2 genomes. This enabled the identification of promising drug target regions and of small-molecule antivirals to which the virus can develop resistance. Our strategy of utilizing sequence and structural information from genomic sequence and protein structure databanks can rapidly assess the fitness of any emerging virus variants and can aid antiviral drug design for future pathogens.
Affiliation(s)
- Karen Sargsyan
- Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan.
| | - Karine Mazmanian
- Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan.
| | - Carmay Lim
- Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan.
9
Krysińska M, Baranowski B, Deszcz B, Pawłowski K, Gradowski M. Pan-kinome of Legionella expanded by a bioinformatics survey. Sci Rep 2022; 12:21782. [PMID: 36526881 PMCID: PMC9758233 DOI: 10.1038/s41598-022-26109-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 12/09/2022] [Indexed: 12/23/2022]
Abstract
The pathogenic Legionella bacteria are notorious for delivering numerous effector proteins into the host cell with the aim of disturbing and hijacking cellular processes for their benefit. Despite intensive studies, many effectors remain uncharacterized. Motivated by the richness of Legionella effector repertoires and their oftentimes atypical biochemistry, also by several known atypical Legionella effector kinases and pseudokinases discovered recently, we undertook an in silico survey and exploration of the pan-kinome of the Legionella genus, i.e., the union of the kinomes of individual species. In this study, we discovered 13 novel (pseudo)kinase families (all are potential effectors) with the use of non-standard bioinformatic approaches. Together with 16 known families, we present a catalog of effector and non-effector protein kinase-like families within Legionella, available at http://bioinfo.sggw.edu.pl/kintaro/ . We analyze and discuss the likely functional roles of the novel predicted kinases. Notably, some of the kinase families are also present in other bacterial taxa, including other pathogens, often phylogenetically very distant from Legionella. This work highlights Nature's ingeniousness in the pathogen-host arms race and offers a useful resource for the study of infection mechanisms.
Affiliation(s)
- Marianna Krysińska
- Department of Biochemistry and Microbiology, Warsaw University of Life Sciences - SGGW, Warsaw, Poland
| | - Bartosz Baranowski
- Laboratory of Plant Pathogenesis, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
| | - Bartłomiej Deszcz
- Department of Biochemistry and Microbiology, Warsaw University of Life Sciences - SGGW, Warsaw, Poland
| | - Krzysztof Pawłowski
- Department of Biochemistry and Microbiology, Warsaw University of Life Sciences - SGGW, Warsaw, Poland
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Department of Translational Medicine, Lund University, Lund, Sweden
- Howard Hughes Medical Institute, Dallas, TX, USA
| | - Marcin Gradowski
- Department of Biochemistry and Microbiology, Warsaw University of Life Sciences - SGGW, Warsaw, Poland
10
Lupo U, Sgarbossa D, Bitbol AF. Protein language models trained on multiple sequence alignments learn phylogenetic relationships. Nat Commun 2022; 13:6298. [PMID: 36273003 PMCID: PMC9588007 DOI: 10.1038/s41467-022-34032-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2022] [Accepted: 10/07/2022] [Indexed: 12/25/2022]
Abstract
Self-supervised neural language models with attention have recently been applied to biological sequence data, advancing structure, function and mutational effect prediction. Some protein language models, including MSA Transformer and AlphaFold's EvoFormer, take multiple sequence alignments (MSAs) of evolutionarily related proteins as inputs. Simple combinations of MSA Transformer's row attentions have led to state-of-the-art unsupervised structural contact prediction. We demonstrate that similarly simple, and universal, combinations of MSA Transformer's column attentions strongly correlate with Hamming distances between sequences in MSAs. Therefore, MSA-based language models encode detailed phylogenetic relationships. We further show that these models can separate coevolutionary signals encoding functional and structural constraints from phylogenetic correlations reflecting historical contingency. To assess this, we generate synthetic MSAs, either without or with phylogeny, from Potts models trained on natural MSAs. We find that unsupervised contact prediction is substantially more resilient to phylogenetic noise when using MSA Transformer versus inferred Potts models.
Affiliation(s)
- Umberto Lupo
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
| | - Damiano Sgarbossa
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
| | - Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
11
Odrzywolek K, Karwowska Z, Majta J, Byrski A, Milanowska-Zabel K, Kosciolek T. Deep embeddings to comprehend and visualize microbiome protein space. Sci Rep 2022; 12:10332. [PMID: 35725732 PMCID: PMC9209496 DOI: 10.1038/s41598-022-14055-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/15/2022] [Accepted: 05/31/2022] [Indexed: 12/13/2022]
Abstract
Understanding the function of microbial proteins is essential to reveal the clinical potential of the microbiome. The application of high-throughput sequencing technologies allows for fast and increasingly cheaper acquisition of data from microbial communities. However, many of the inferred protein sequences are novel and not catalogued, hence the possibility of predicting their function through conventional homology-based approaches is limited, which indicates the need for further research on alignment-free methods. Here, we leverage a deep-learning-based representation of proteins to assess its utility in alignment-free analysis of microbial proteins. We trained a language model on the Unified Human Gastrointestinal Protein catalogue and validated the resulting protein representation on the bacterial part of the SwissProt database. Finally, we present a use case on proteins involved in SCFA metabolism. Results indicate that the deep learning model manages to accurately represent features related to protein structure and function, allowing for alignment-free protein analyses. Technologies that contextualize metagenomic data are a promising direction to deeply understand the microbiome.
Affiliation(s)
- Krzysztof Odrzywolek
- Ardigen, Podole 76, 30-394, Krakow, Poland
- Institute of Computer Science, Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science and Technology, Mickiewicza 30, 30-059, Krakow, Poland
- Zuzanna Karwowska
- Malopolska Centre of Biotechnology, Jagiellonian University, Gronostajowa 7A, 30-387, Krakow, Poland
- Jan Majta
- Ardigen, Podole 76, 30-394, Krakow, Poland
- Department of Computational Biophysics and Bioinformatics, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Gronostajowa 7, 30-387, Krakow, Poland
- Aleksander Byrski
- Institute of Computer Science, Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science and Technology, Mickiewicza 30, 30-059, Krakow, Poland
- Tomasz Kosciolek
- Malopolska Centre of Biotechnology, Jagiellonian University, Gronostajowa 7A, 30-387, Krakow, Poland.
12
Littmann M, Heinzinger M, Dallago C, Weissenow K, Rost B. Protein embeddings and deep learning predict binding residues for various ligand classes. Sci Rep 2021; 11:23916. [PMID: 34903827 PMCID: PMC8668950 DOI: 10.1038/s41598-021-03431-4] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 09/06/2021] [Accepted: 12/02/2021] [Indexed: 01/27/2023] Open
Abstract
One important aspect of protein function is the binding of proteins to ligands, including small molecules, metal ions, and macromolecules such as DNA or RNA. Despite decades of experimental progress, many binding sites remain obscure. Here, we propose bindEmbed21, a method predicting whether a protein residue binds to metal ions, nucleic acids, or small molecules. The Artificial Intelligence (AI)-based method exclusively uses embeddings from the Transformer-based protein Language Model (pLM) ProtT5 as input. Using only single sequences without creating multiple sequence alignments (MSAs), bindEmbed21DL outperformed MSA-based predictions. Combination with homology-based inference increased performance to F1 = 48 ± 3% (95% CI) and MCC = 0.46 ± 0.04 when merging all three ligand classes into one. All results were confirmed by three independent data sets. Focusing on very reliably predicted residues could complement experimental evidence: for the 25% most strongly predicted binding residues, at least 73% were correctly predicted even when ignoring the problem of missing experimental annotations. The new method bindEmbed21 is fast, simple, and broadly applicable, using neither structure nor MSAs. It thereby found binding residues in over 42% of all human proteins not otherwise implied in binding and predicted about 6% of all residues as binding to metal ions, nucleic acids, or small molecules.
Affiliation(s)
- Maria Littmann
- Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.
- Michael Heinzinger
- Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Christian Dallago
- Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Konstantin Weissenow
- Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Burkhard Rost
- Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, Garching, 85748, Munich, Germany
- TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
- Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West, 168th Street, New York, NY, 10032, USA
13
Ahmed S, Rahman A, Hasan MAM, Ahmad S, Shovan SM. Computational identification of multiple lysine PTM sites by analyzing the instance hardness and feature importance. Sci Rep 2021; 11:18882. [PMID: 34556767 PMCID: PMC8460736 DOI: 10.1038/s41598-021-98458-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/29/2021] [Accepted: 09/08/2021] [Indexed: 02/08/2023] Open
Abstract
Identification of post-translational modifications (PTMs) is significant in the study of computational proteomics, cell biology, pathogenesis, and drug development due to their role in many bio-molecular mechanisms. Though there are several computational tools to identify individual PTMs, only three predictors have been established to predict multiple PTMs at the same lysine residue. Furthermore, detailed analysis and assessment of dataset balancing and of the significance of different feature encoding techniques for a suitable multi-PTM prediction model are still lacking. This study introduces a computational method named 'iMul-kSite' for predicting acetylation, crotonylation, methylation, succinylation, and glutarylation from an unrecognized peptide sample with one, multiple, or no modifications. After successfully eliminating the redundant data samples from the majority class by analyzing the hardness of the sequence-coupling information, feature representation has been optimized by adopting the combination of ANOVA F-Test and incremental feature selection approach. The proposed predictor predicts multi-label PTM sites with 92.83% accuracy using the top 100 features. It has also achieved a 93.36% aiming rate and 96.23% coverage rate, which are much better than the existing state-of-the-art predictors on the validation test. This performance indicates that 'iMul-kSite' can be used as a supportive tool for further K-PTM study. For the convenience of the experimental scientists, 'iMul-kSite' has been deployed as a user-friendly web-server at http://103.99.176.239/iMul-kSite.
Affiliation(s)
- Sabit Ahmed
- Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, 6204 Bangladesh
- Afrida Rahman
- Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, 6204 Bangladesh
- Md. Al Mehedi Hasan
- Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, 6204 Bangladesh
- Shamim Ahmad
- Computer Science and Engineering, University of Rajshahi, Rajshahi, 6205 Bangladesh
- S. M. Shovan
- Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, 6204 Bangladesh
14
Hu G, Katuwawala A, Wang K, Wu Z, Ghadermarzi S, Gao J, Kurgan L. flDPnn: Accurate intrinsic disorder prediction with putative propensities of disorder functions. Nat Commun 2021; 12:4438. [PMID: 34290238 PMCID: PMC8295265 DOI: 10.1038/s41467-021-24773-7] [Citation(s) in RCA: 113] [Impact Index Per Article: 37.7] [Received: 04/07/2021] [Accepted: 07/06/2021] [Indexed: 01/05/2023] Open
Abstract
Identification of intrinsic disorder in proteins relies in large part on computational predictors, which demands that their accuracy should be high. Since intrinsic disorder carries out a broad range of cellular functions, it is desirable to couple the disorder and disorder function predictions. We report a computational tool, flDPnn, that provides accurate, fast and comprehensive disorder and disorder function predictions from protein sequences. The recent Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment and results on other test datasets demonstrate that flDPnn offers accurate predictions of disorder, fully disordered proteins and four common disorder functions. These predictions are substantially better than the results of the existing disorder predictors and methods that predict functions of disorder. Ablation tests reveal that the high predictive performance stems from innovative ways used in flDPnn to derive sequence profiles and encode inputs. flDPnn's webserver is available at http://biomine.cs.vcu.edu/servers/flDPnn/.
Affiliation(s)
- Gang Hu
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Kui Wang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Zhonghua Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
15
Sharma NR, Gadhave K, Kumar P, Saif M, Khan MM, Sarkar DP, Uversky VN, Giri R. Analysis of the dark proteome of Chandipura virus reveals maximum propensity for intrinsic disorder in phosphoprotein. Sci Rep 2021; 11:13253. [PMID: 34168211 PMCID: PMC8225862 DOI: 10.1038/s41598-021-92581-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/21/2021] [Accepted: 06/07/2021] [Indexed: 02/05/2023] Open
Abstract
Chandipura virus (CHPV, a member of the Rhabdoviridae family) is an emerging pathogen that causes rapidly progressing influenza-like illness and acute encephalitis, often leading to coma and death of the human host. Given several CHPV outbreaks in the Indian subcontinent, recurring sporadic cases, neurological manifestation, and the high mortality rate of this infection, CHPV is gaining global attention. The 'dark proteome' includes the whole proteome with special emphasis on intrinsically disordered proteins (IDPs) and IDP regions (IDPRs), which are proteins or protein regions that lack unique (or ordered) three-dimensional structures within the cellular milieu. These proteins/regions, however, play a number of vital roles in various biological processes, such as cell cycle regulation and control of signaling pathways, and are therefore implicated in many human diseases. IDPs and IDPRs are also abundantly found in many viral proteins, enabling their multifunctional roles in the viral life cycles and their capability to hijack various host systems. The unknown abundance of IDPs and IDPRs in CHPV therefore prompted us to analyze the dark proteome of this virus. Our analysis revealed a varying degree of disorder in all five CHPV proteins, with the maximum level of intrinsic disorder propensity being found in Phosphoprotein (P). We have also shown the flexibility of P protein using extensive molecular dynamics simulations up to 500 ns. Furthermore, our analysis showed the abundant presence of disorder-based binding regions (also known as molecular recognition features, MoRFs) in CHPV proteins. The identification of IDPs/IDPRs in CHPV proteins suggests that their disordered regions may function as potential interacting domains and may also serve as novel targets for disorder-based drug designs.
Affiliation(s)
- Nishi R Sharma
- School of Interdisciplinary Studies, Jamia Hamdard-Institute of Molecular Medicine (JH-IMM), Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India.
- Kundlik Gadhave
- School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Kamand, Himachal Pradesh, 175005, India
- Prateek Kumar
- School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Kamand, Himachal Pradesh, 175005, India
- Mohammad Saif
- School of Interdisciplinary Studies, Jamia Hamdard-Institute of Molecular Medicine (JH-IMM), Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Md M Khan
- School of Interdisciplinary Studies, Jamia Hamdard-Institute of Molecular Medicine (JH-IMM), Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Debi P Sarkar
- Department of Biochemistry, University of Delhi South Campus, New Delhi, 110021, India
- Vladimir N Uversky
- Department of Molecular Medicine and Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, 33620, USA.
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", Pushchino, 142290, Moscow, Russia.
- Rajanish Giri
- School of Basic Sciences, Indian Institute of Technology Mandi, VPO Kamand, Kamand, Himachal Pradesh, 175005, India.
16
Pokhrel S, Kraemer BR, Burkholz S, Mochly-Rosen D. Natural variants in SARS-CoV-2 Spike protein pinpoint structural and functional hotspots with implications for prophylaxis and therapeutic strategies. Sci Rep 2021; 11:13120. [PMID: 34162970 PMCID: PMC8222349 DOI: 10.1038/s41598-021-92641-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/05/2021] [Accepted: 04/30/2021] [Indexed: 12/17/2022] Open
Abstract
In December 2019, a novel coronavirus, termed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was identified as the cause of pneumonia with severe respiratory distress and outbreaks in Wuhan, China. The rapid and global spread of SARS-CoV-2 resulted in the coronavirus disease 2019 (COVID-19) pandemic. Early in the pandemic, there were limited genetic viral variations. As millions of people became infected, multiple single amino acid substitutions emerged. Many of these substitutions have no consequences. However, some of the new variants show a greater infection rate, more severe disease, and reduced sensitivity to current prophylaxes and treatments. Of particular importance in SARS-CoV-2 transmission are mutations that occur in the Spike (S) protein, the protein on the viral outer envelope that binds to the human angiotensin-converting enzyme 2 receptor (hACE2). Here, we conducted a comprehensive analysis of 441,168 individual virus sequences isolated from humans throughout the world. From the individual sequences, we identified 3540 unique amino acid substitutions in the S protein. Analysis of these different variants in the S protein pinpointed important functional and structural sites in the protein. This information may guide the development of effective vaccines and therapeutics to help arrest the spread of the COVID-19 pandemic.
Affiliation(s)
- Suman Pokhrel
- Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA, USA
- Benjamin R Kraemer
- Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA, USA
- Daria Mochly-Rosen
- Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA, USA.
17
Stervbo U, Rahmann S, Roch T, Westhoff TH, Babel N. Epitope similarity cannot explain the pre-formed T cell immunity towards structural SARS-CoV-2 proteins. Sci Rep 2020; 10:18995. [PMID: 33149224 PMCID: PMC7642385 DOI: 10.1038/s41598-020-75972-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/06/2020] [Accepted: 10/18/2020] [Indexed: 01/08/2023] Open
Abstract
The current pandemic is caused by the SARS-CoV-2 virus, and large progress in understanding the pathology of the virus has been made since its emergence in late 2019. Several reports indicate short-lasting immunity against endemic coronaviruses, which contrasts with studies showing that biobanked venous blood contains T cells reactive to SARS-CoV-2 S-protein even before the outbreak in Wuhan. This suggests a preformed T cell memory towards structural proteins in individuals not exposed to SARS-CoV-2. Given the similarity of SARS-CoV-2 to other members of the Coronaviridae family, the endemic coronaviruses appear likely candidates to generate this T cell memory. However, given the apparent poor immunological memory created by the endemic coronaviruses, immunity against other common pathogens might offer an alternative explanation. Here, we utilize a combination of epitope prediction and similarity to common human pathogens to identify potential sources of the SARS-CoV-2 T cell memory. Although beta-coronaviruses are the most likely candidates to explain the pre-existing SARS-CoV-2 reactive T cells in uninfected individuals, the SARS-CoV-2 epitopes with the highest similarity to those from beta-coronaviruses are confined to replication-associated proteins, not the host-interacting S-protein. Thus, our study suggests that the observed SARS-CoV-2 pre-formed immunity to structural proteins is not driven by near-identical epitopes.
Affiliation(s)
- Ulrik Stervbo
- Center for Translational Medicine, University Hospital Marien Hospital Herne, Ruhr-University, Bochum, Germany.
- Berlin-Brandenburg Center for Regenerative Therapies, and Institute of Medical Immunology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität Zu Berlin, Berlin Institute of Health, Berlin, Germany.
- Sven Rahmann
- Genome Informatics, Institute of Human Genetics, University of Duisburg-Essen, Duisburg, Germany.
- Toralf Roch
- Center for Translational Medicine, University Hospital Marien Hospital Herne, Ruhr-University, Bochum, Germany
- Berlin-Brandenburg Center for Regenerative Therapies, and Institute of Medical Immunology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität Zu Berlin, Berlin Institute of Health, Berlin, Germany
- Timm H Westhoff
- Center for Translational Medicine, University Hospital Marien Hospital Herne, Ruhr-University, Bochum, Germany
- Nina Babel
- Center for Translational Medicine, University Hospital Marien Hospital Herne, Ruhr-University, Bochum, Germany
- Berlin-Brandenburg Center for Regenerative Therapies, and Institute of Medical Immunology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität Zu Berlin, Berlin Institute of Health, Berlin, Germany
18
Kalman ZE, Mészáros B, Gáspári Z, Dobson L. Distribution of disease-causing germline mutations in coiled-coils implies an important role of their N-terminal region. Sci Rep 2020; 10:17333. [PMID: 33060664 PMCID: PMC7562717 DOI: 10.1038/s41598-020-74354-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/08/2020] [Accepted: 09/21/2020] [Indexed: 11/08/2022] Open
Abstract
Next-generation sequencing resulted in the identification of a huge number of naturally occurring variations in human proteins. The correct interpretation of the functional effects of these variations necessitates the understanding of how they modulate protein structure. Coiled-coils are α-helical structures responsible for a diverse range of functions, but most importantly, they facilitate the structural organization of macromolecular scaffolds via oligomerization. In this study, we analyzed a comprehensive set of disease-associated germline mutations in coiled-coil structures. Our results suggest an important role of residues near the N-terminal part of coiled-coil regions, possibly critical for superhelix assembly and folding in some cases. We also show that coiled-coils of different oligomerization states exhibit characteristically distinct patterns of disease-causing mutations. Our study provides structural and functional explanations on how disease emerges through the mutation of these structural motifs.
Affiliation(s)
- Zsofia E Kalman
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Práter u. 50/A, 1083, Budapest, Hungary
- 3in-PPCU Research Group, 2500, Esztergom, Hungary
- Bálint Mészáros
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstraße 1, 69117, Heidelberg, Germany
- Zoltán Gáspári
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Práter u. 50/A, 1083, Budapest, Hungary.
- Laszlo Dobson
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Práter u. 50/A, 1083, Budapest, Hungary.
- Research Centre for Natural Sciences, Magyar Tudósok Körútja 2, 1117, Budapest, Hungary.
19
Sergeeva AP, Katsamba PS, Cosmanescu F, Brewer JJ, Ahlsen G, Mannepalli S, Shapiro L, Honig B. DIP/Dpr interactions and the evolutionary design of specificity in protein families. Nat Commun 2020; 11:2125. [PMID: 32358559 PMCID: PMC7195491 DOI: 10.1038/s41467-020-15981-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/07/2019] [Accepted: 04/06/2020] [Indexed: 01/10/2023] Open
Abstract
Differential binding affinities among closely related protein family members underlie many biological phenomena, including cell-cell recognition. Drosophila DIP and Dpr proteins mediate neuronal targeting in the fly through highly specific protein-protein interactions. We show here that DIPs/Dprs segregate into seven specificity subgroups defined by binding preferences between their DIP and Dpr members. We then describe a sequence-, structure- and energy-based computational approach, combined with experimental binding affinity measurements, to reveal how specificity is coded on the canonical DIP/Dpr interface. We show that binding specificity of DIP/Dpr subgroups is controlled by "negative constraints", which interfere with binding. To achieve specificity, each subgroup utilizes a different combination of negative constraints, which are broadly distributed and cover the majority of the protein-protein interface. We discuss the structural origins of negative constraints, and potential general implications for the evolutionary origins of binding specificity in multi-protein families.
Affiliation(s)
- Alina P Sergeeva
- Department of Systems Biology, Columbia University, New York, NY, USA
- Phinikoula S Katsamba
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA
- Filip Cosmanescu
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA
- Joshua J Brewer
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Goran Ahlsen
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA
- Seetha Mannepalli
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA
- Lawrence Shapiro
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA.
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA.
- Barry Honig
- Department of Systems Biology, Columbia University, New York, NY, USA.
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA.
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA.
- Department of Medicine, Columbia University, New York, NY, USA.
20
Zuo X, Li B, Zhu C, Yan ZW, Li M, Wang X, Zhang YJ. Stoichiogenomics reveal oxygen usage bias, key proteins and pathways associated with stomach cancer. Sci Rep 2019; 9:11344. [PMID: 31383879 DOI: 10.1038/s41598-019-47533-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/18/2019] [Accepted: 07/08/2019] [Indexed: 01/07/2023] Open
Abstract
Stomach cancer involves hypoxia-specific microenvironments. Stoichiogenomics explores environmental resource limitation on biological macromolecules in terms of element usage. However, the patterns of oxygen usage by proteins and the ways that proteins adapt to a hypoxic cancer microenvironment are still unknown. Here we compared the oxygen ([O]) and carbon ([C]) contents between proteomes of stomach cancer (hypoxia) and two stomach glandular cells (normal). Key proteins, genome locations, pathways, and functional dissection associated with stomach cancer were also studied. An association between oxygen content and protein expression level was revealed in stomach cancer and stomach glandular cells. For differentially expressed proteins (DEPs), oxygen content in the upregulated proteins was 3.2% higher than that in the downregulated proteins in stomach cancer. A total of 1,062 DEPs were identified; interestingly, none of these proteins was encoded on the Y chromosome. The upregulated proteins were significantly enriched in pathways including regulation of actin cytoskeleton, cardiac muscle contraction, and progesterone-mediated oocyte maturation. Functional dissection of the upregulated proteins with high oxygen content showed that most of them were cytoskeletal proteins, cytoskeleton-associated proteins, cyclins, and signaling proteins in cell cycle progression. An element signature of resource limitation could not be detected in stomach cancer for oxygen, as was also the case in plants and microbes. Unsparing use of oxygen by the highly expressed proteins was adapted to the rapid growth and fast division of the stomach cancer cells. In addition, the oxygen usage bias, key proteins, and pathways identified in this paper lay a foundation for the application of stoichiogenomics in precision medicine.