1. Lupo U, Sgarbossa D, Milighetti M, Bitbol AF. DiffPaSS-high-performance differentiable pairing of protein sequences using soft scores. Bioinformatics 2024; 41:btae738. [PMID: 39672677 PMCID: PMC11676329 DOI: 10.1093/bioinformatics/btae738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Revised: 12/05/2024] [Accepted: 12/11/2024] [Indexed: 12/15/2024] Open
Abstract
Motivation: Identifying interacting partners from two sets of protein sequences has important applications in computational biology. Interacting partners share similarities across species due to their common evolutionary history, and feature correlations in amino acid usage due to the need to maintain complementary interaction interfaces. Thus, the problem of finding interacting pairs can be formulated as searching for a pairing of sequences that maximizes a sequence similarity or a coevolution score. Several methods have been developed to address this problem, applying different approximate optimization methods to different scores.
Results: We introduce Differentiable Pairing using Soft Scores (DiffPaSS), a differentiable framework for flexible, fast, and hyperparameter-free optimization for pairing interacting biological sequences, which can be applied to a wide variety of scores. We apply it to a benchmark prokaryotic dataset, using mutual information and neighbor graph alignment scores. DiffPaSS outperforms existing algorithms for optimizing the same scores. We demonstrate the usefulness of our paired alignments for the prediction of protein complex structure. DiffPaSS does not require sequences to be aligned, and we also apply it to nonaligned sequences from T-cell receptors.
Availability and implementation: A PyTorch implementation and installable Python package are available at https://github.com/Bitbol-Lab/DiffPaSS.
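The formulation in this abstract, searching for a pairing that maximizes a score, can be illustrated with a toy example. The sketch below is not the DiffPaSS algorithm: it assumes a purely additive score given as a hypothetical matrix `toy_score` (rows are paralogs of one family, columns are paralogs of the other, within a single species) and finds the best permutation by exhaustive search, which is only feasible for a handful of sequences. The function name `best_pairing` is likewise an illustrative assumption.

```python
from itertools import permutations


def best_pairing(score):
    """Exhaustively find the permutation maximizing sum(score[i][perm[i]]).

    score[i][j] is a hypothetical similarity or coevolution score for
    pairing sequence i of family A with sequence j of family B.
    """
    n = len(score)
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(score[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total


# Toy scores for three paralogs per family in one species (made-up values).
toy_score = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.7, 0.3],
]
perm, total = best_pairing(toy_score)
print(perm, round(total, 3))
```

The number of permutations grows factorially with the number of paralogs, which is why approximate or differentiable optimization, as described in the abstract, is needed for realistic datasets.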
Affiliation(s)
- Umberto Lupo
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
- Damiano Sgarbossa
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
- Martina Milighetti
  - Division of Infection and Immunity, University College London, London WC1E 6BT, United Kingdom
  - Cancer Institute, University College London, London WC1E 6DD, United Kingdom
- Anne-Florence Bitbol
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
2. Lupo U, Sgarbossa D, Bitbol AF. Pairing interacting protein sequences using masked language modeling. Proc Natl Acad Sci U S A 2024; 121:e2311887121. [PMID: 38913900 PMCID: PMC11228504 DOI: 10.1073/pnas.2311887121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Accepted: 12/18/2023] [Indexed: 06/26/2024] Open
Abstract
Predicting which proteins interact together from amino acid sequences is an important task. We develop a method to pair interacting protein sequences which leverages the power of protein language models trained on multiple sequence alignments (MSAs), such as MSA Transformer and the EvoFormer module of AlphaFold. We formulate the problem of pairing interacting partners among the paralogs of two protein families in a differentiable way. We introduce a method called Differentiable Pairing using Alignment-based Language Models (DiffPALM) that solves it by exploiting the ability of MSA Transformer to fill in masked amino acids in multiple sequence alignments using the surrounding context. MSA Transformer encodes coevolution between functionally or structurally coupled amino acids within protein chains. It also captures inter-chain coevolution, despite being trained on single-chain data. Relying on MSA Transformer without fine-tuning, DiffPALM outperforms existing coevolution-based pairing methods on difficult benchmarks of shallow multiple sequence alignments extracted from ubiquitous prokaryotic protein datasets. It also outperforms an alternative method based on a state-of-the-art protein language model trained on single sequences. Paired alignments of interacting protein sequences are a crucial ingredient of supervised deep learning methods to predict the three-dimensional structure of protein complexes. Starting from sequences paired by DiffPALM substantially improves the structure prediction of some eukaryotic protein complexes by AlphaFold-Multimer. It also achieves performance competitive with that obtained using orthology-based pairing.
Affiliation(s)
- Umberto Lupo
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
- Damiano Sgarbossa
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
- Anne-Florence Bitbol
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
3. Lee YHG, Cerf NT, Shalaby N, Montes MR, Clarke RJ. Bioinformatic Study of Possible Acute Regulation of Acid Secretion in the Stomach. J Membr Biol 2024; 257:79-89. [PMID: 38436710 PMCID: PMC11006737 DOI: 10.1007/s00232-024-00310-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Accepted: 02/21/2024] [Indexed: 03/05/2024]
Abstract
The gastric H+,K+-ATPase is an integral membrane protein which derives energy from the hydrolysis of ATP to transport H+ ions from the parietal cells of the gastric mucosa into the stomach in exchange for K+ ions. It is responsible for the acidic environment of the stomach, which is essential for digestion. Acid secretion is regulated by the recruitment of the H+,K+-ATPase from intracellular stores into the plasma membrane on the ingestion of food. The similarity of the amino acid sequences of the lysine-rich N-termini of the α-subunits of the H+,K+- and Na+,K+-ATPases suggests similar acute regulation mechanisms, specifically an electrostatic switch mechanism involving an interaction of the N-terminal tail with the surface of the surrounding membrane and a modulation of the interaction via regulatory phosphorylation by protein kinases. From a consideration of sequence alignment of the H+,K+-ATPase and an analysis of its coevolution with protein kinase C and kinases of the Src family, the evidence points towards a phosphorylation of tyrosine-7 of the N-terminus by either Lck or Yes in all vertebrates except cartilaginous fish. The results obtained will guide and focus future experimental research.
Affiliation(s)
- Yan Hay Grace Lee
  - School of Chemistry, University of Sydney, Sydney, NSW, 2006, Australia
- Nicole T Cerf
  - Instituto de Química y Fisicoquímica Biológica (IQUIFIB), CONICET, Universidad de Buenos Aires, Buenos Aires, Argentina
- Nicholas Shalaby
  - School of Chemistry, University of Sydney, Sydney, NSW, 2006, Australia
- Mónica R Montes
  - Instituto de Química y Fisicoquímica Biológica (IQUIFIB), CONICET, Universidad de Buenos Aires, Buenos Aires, Argentina
- Ronald J Clarke
  - School of Chemistry, University of Sydney, Sydney, NSW, 2006, Australia
  - The University of Sydney Nano Institute, Sydney, NSW, 2006, Australia
4. Bernett J, Blumenthal DB, List M. Cracking the black box of deep sequence-based protein-protein interaction prediction. Brief Bioinform 2024; 25:bbae076. [PMID: 38446741 PMCID: PMC10939362 DOI: 10.1093/bib/bbae076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 01/09/2024] [Indexed: 03/08/2024] Open
Abstract
Identifying protein-protein interactions (PPIs) is crucial for deciphering biological pathways. Numerous prediction methods have been developed as cheap alternatives to biological experiments, reporting surprisingly high accuracy estimates. We systematically investigated how much reproducible deep learning models depend on data leakage, sequence similarities and node degree information, and compared them with basic machine learning models. We found that overlaps between training and test sets resulting from random splitting lead to strongly overestimated performances. In this setting, models learn solely from sequence similarities and node degrees. When data leakage is avoided by minimizing sequence similarities between training and test set, performances become random. Moreover, baseline models directly leveraging sequence similarity and network topology show good performances at a fraction of the computational cost. Thus, we advocate that any improvements should be reported relative to baseline methods in the future. Our findings suggest that predicting PPIs remains an unsolved task for proteins showing little sequence similarity to previously studied proteins, highlighting that further experimental research into the 'dark' protein interactome and better computational methods are needed.
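The leakage issue described in this abstract can be made concrete with a small sketch. The code below is an illustrative assumption, not the authors' procedure: it builds a protein-disjoint split, so that no protein identifier appears in both training and test pairs. The abstract's stricter criterion, minimizing sequence similarity between the two sets, would additionally require clustering proteins by similarity, which is not shown here. The function name `protein_disjoint_split` and the toy identifiers are hypothetical.

```python
import random


def protein_disjoint_split(pairs, test_fraction=0.3, seed=0):
    """Split interaction pairs so that train and test share no protein.

    Proteins are assigned to a test set at random; pairs lying entirely
    inside (outside) that set become test (train) pairs. Pairs that
    straddle the two sets are dropped to avoid overlap.
    """
    proteins = sorted({p for pair in pairs for p in pair})
    rng = random.Random(seed)
    rng.shuffle(proteins)
    n_test = max(1, int(len(proteins) * test_fraction))
    test_proteins = set(proteins[:n_test])
    train = [(a, b) for a, b in pairs
             if a not in test_proteins and b not in test_proteins]
    test = [(a, b) for a, b in pairs
            if a in test_proteins and b in test_proteins]
    return train, test


# Toy interaction list with made-up protein identifiers.
toy_pairs = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"),
             ("P4", "P5"), ("P5", "P6"), ("P1", "P6")]
train_pairs, test_pairs = protein_disjoint_split(toy_pairs, test_fraction=0.5)
print(len(train_pairs), len(test_pairs))
```

Dropping the straddling pairs costs data, but it is the simplest way to guarantee that a model cannot score well merely by recognizing proteins it has already seen.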
Affiliation(s)
- Judith Bernett
  - Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof Forum 3, 85354, Freising, Germany
- David B Blumenthal
  - Biomedical Network Science Lab, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Werner-von-Siemens-Str. 61, 91052, Erlangen, Germany
- Markus List
  - Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof Forum 3, 85354, Freising, Germany
5. Mishra SK, Priya P, Rai GP, Haque R, Shanker A. Coevolution based immunoinformatics approach considering variability of epitopes to combat different strains: A case study using spike protein of SARS-CoV-2. Comput Biol Med 2023; 163:107233. [PMID: 37422941 DOI: 10.1016/j.compbiomed.2023.107233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 06/03/2023] [Accepted: 07/01/2023] [Indexed: 07/11/2023]
Abstract
In the recent past, several vaccines were developed to combat the COVID-19 disease. Unfortunately, the protective efficacy of the current vaccines has been reduced due to the high mutation rate in SARS-CoV-2. Here, we successfully implemented a coevolution based immunoinformatics approach to design an epitope-based peptide vaccine considering variability in the spike protein of SARS-CoV-2. The spike glycoprotein was investigated for B- and T-cell epitope prediction. Identified T-cell epitopes were mapped onto previously reported coevolving amino acids in the spike protein to introduce mutations. The non-mutated and mutated vaccine components were constructed by selecting epitopes that overlapped with the predicted B-cell epitopes and had the highest antigenicity. Selected epitopes were joined with a linker to construct a single vaccine component. Non-mutated and mutated vaccine component sequences were modelled and validated. The in-silico expression level of the vaccine constructs (non-mutated and mutated) in E. coli K12 shows promising results. The molecular docking analysis of vaccine components with toll-like receptor 5 (TLR5) demonstrated strong binding affinity. The time series calculations, including root mean square deviation (RMSD), radius of gyration (RGYR), and energy of the system over a 100 ns trajectory obtained from all-atom molecular dynamics simulation, showed stability of the system. The combined coevolutionary and immunoinformatics approach used in this study will certainly help to design an effective peptide vaccine that may work against different strains of SARS-CoV-2. Moreover, the strategy used in this study can be applied to other pathogens.
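The two structural stability measures named in this abstract, RMSD and radius of gyration, have simple definitions that a short sketch can make explicit. The code below is a minimal illustration under stated assumptions: the two coordinate sets are already superimposed (no optimal alignment is performed), and all atoms are given equal mass in the radius of gyration. The function names and toy coordinates are hypothetical and do not come from the study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized coordinate sets.

    Assumes the structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def radius_of_gyration(coords):
    """Unweighted radius of gyration (all atoms treated as equal mass)."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    squared = sum(
        sum((c[k] - center[k]) ** 2 for k in range(3)) for c in coords
    )
    return math.sqrt(squared / n)


# Toy frames: the second is the first shifted by 1 unit along x.
frame_0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame_1 = [(x + 1.0, y, z) for x, y, z in frame_0]
print(rmsd(frame_0, frame_1), radius_of_gyration(frame_0))
```

In a trajectory analysis these quantities would be computed per frame and plotted against time; a plateau is usually read as a sign of a stable system.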
Affiliation(s)
- Saurav Kumar Mishra
  - Department of Bioinformatics, Central University of South Bihar, Gaya, Bihar, India
- Prerna Priya
  - Department of Botany, Purnea Mahila College, Purnia, Bihar, India
- Gyan Prakash Rai
  - Department of Bioinformatics, Central University of South Bihar, Gaya, Bihar, India
- Rizwanul Haque
  - Department of Biotechnology, Central University of South Bihar, Gaya, Bihar, India
- Asheesh Shanker
  - Department of Bioinformatics, Central University of South Bihar, Gaya, Bihar, India
6. Gandarilla-Pérez CA, Pinilla S, Bitbol AF, Weigt M. Combining phylogeny and coevolution improves the inference of interaction partners among paralogous proteins. PLoS Comput Biol 2023; 19:e1011010. [PMID: 36996234 PMCID: PMC10089317 DOI: 10.1371/journal.pcbi.1011010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 04/11/2023] [Accepted: 03/08/2023] [Indexed: 04/01/2023] Open
Abstract
Predicting protein-protein interactions from sequences is an important goal of computational biology. Various sources of information can be used to this end. Starting from the sequences of two interacting protein families, one can use phylogeny or residue coevolution to infer which paralogs are specific interaction partners within each species. We show that these two signals can be combined to improve the performance of the inference of interaction partners among paralogs. For this, we first align the sequence-similarity graphs of the two families through simulated annealing, yielding a robust partial pairing. We next use this partial pairing to seed a coevolution-based iterative pairing algorithm. This combined method improves performance over either separate method. The improvement obtained is striking in the difficult cases where the average number of paralogs per species is large or where the total number of sequences is modest.
Affiliation(s)
- Carlos A Gandarilla-Pérez
  - Facultad de Física, Universidad de la Habana, San Lázaro y L, Vedado, Habana, Cuba
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), Paris, France
- Sergio Pinilla
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), Paris, France
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire Jean Perrin (UMR 8237), Paris, France
- Anne-Florence Bitbol
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Martin Weigt
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), Paris, France
7. Wangwiwatsin A, Kulwong S, Phetcharaburanin J, Namwat N, Klanrit P, Loilome W, Maleewong W, Reid AJ. Toward novel treatment against filariasis: Insight into genome-wide co-evolutionary analysis of filarial nematodes and Wolbachia. Front Microbiol 2023; 14:1052352. [PMID: 37032902 PMCID: PMC10073474 DOI: 10.3389/fmicb.2023.1052352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 02/16/2023] [Indexed: 04/11/2023] Open
Abstract
Infectious diseases caused by filarial nematodes are major health problems for humans and animals globally. Current treatment using anti-helminthic drugs requires a long treatment period and is only effective against the microfilarial stage. Most species of filarial nematodes harbor a specific strain of Wolbachia bacteria, which are essential for the survival, development, and reproduction of the nematodes. This parasite-bacteria obligate symbiosis offers a new angle for the cure of filariasis. In this study, we utilized publicly available genome data and putative protein sequences from seven filarial nematode species and their symbiotic Wolbachia to screen for protein-protein interactions that could be a novel target against multiple filarial nematode species. Genome-wide in silico screening was performed to predict molecular interactions based on co-evolutionary signals. We identified over 8,000 pairs of gene families that show evidence of co-evolution based on high correlation score and low false discovery rate (FDR) between gene families and obtained a candidate list that may be keys in filarial nematode-Wolbachia interactions. Functional analysis was conducted on these top-scoring pairs, revealing biological processes related to various signaling processes, adult lifespan, developmental control, lipid and nucleotide metabolism, and RNA modification. Furthermore, network analysis of the top-scoring genes with multiple co-evolving pairs suggests candidate genes in both Wolbachia and the nematode that may play crucial roles at the center of multi-gene networks. A number of the top-scoring genes matched well to known drug targets, suggesting a promising drug-repurposing strategy that could be applicable against multiple filarial nematode species.
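The abstract filters candidate pairs by a high correlation score and a low false discovery rate. One common way to control the FDR across many tests is the Benjamini-Hochberg procedure; whether the study used this particular procedure is not stated in the abstract, so the sketch below is an assumption chosen for illustration. The function name and the toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank among the sorted p-values),
    then a running minimum from the largest rank downward enforces
    monotonicity of the adjusted values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values for four hypothetical gene-family pairs.
toy_p = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(toy_p)
significant = [i for i, q in enumerate(q_values) if q <= 0.05]
print(q_values, significant)
```

Pairs would then be retained only if they pass both the FDR threshold and a correlation cutoff, mirroring the two criteria mentioned in the abstract.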
Affiliation(s)
- Arporn Wangwiwatsin
  - Department of Biochemistry, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
  - Cholangiocarcinoma Research Institute, Khon Kaen University, Khon Kaen, Thailand
  - Khon Kaen University Phenome Centre, Khon Kaen University, Khon Kaen, Thailand
- Siriyakorn Kulwong
  - Department of Biochemistry, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
  - Cholangiocarcinoma Research Institute, Khon Kaen University, Khon Kaen, Thailand
  - Khon Kaen University Phenome Centre, Khon Kaen University, Khon Kaen, Thailand
- Jutarop Phetcharaburanin
  - Department of Biochemistry, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
  - Cholangiocarcinoma Research Institute, Khon Kaen University, Khon Kaen, Thailand
  - Khon Kaen University Phenome Centre, Khon Kaen University, Khon Kaen, Thailand
- Nisana Namwat
  - Department of Biochemistry, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
  - Cholangiocarcinoma Research Institute, Khon Kaen University, Khon Kaen, Thailand
  - Khon Kaen University Phenome Centre, Khon Kaen University, Khon Kaen, Thailand
- Poramate Klanrit
  - Department of Biochemistry, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
  - Cholangiocarcinoma Research Institute, Khon Kaen University, Khon Kaen, Thailand
  - Khon Kaen University Phenome Centre, Khon Kaen University, Khon Kaen, Thailand
- Watcharin Loilome
  - Department of Biochemistry, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
  - Cholangiocarcinoma Research Institute, Khon Kaen University, Khon Kaen, Thailand
  - Khon Kaen University Phenome Centre, Khon Kaen University, Khon Kaen, Thailand
- Wanchai Maleewong
  - Department of Parasitology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
- Adam J Reid
  - Parasite Genomics Group, Wellcome Sanger Institute, Hinxton, United Kingdom
  - The Gurdon Institute, University of Cambridge, Cambridge, United Kingdom
8. Wei Q, Liu J, Guo F, Wang Z, Zhang X, Yuan L, Ali K, Qiang F, Wen Y, Li W, Zheng B, Bai Q, Li G, Ren H, Wu G. Kinase regulators evolved into two families by gain and loss of ability to bind plant steroid receptors. Plant Physiol 2023; 191:1167-1185. [PMID: 36494097 PMCID: PMC9922406 DOI: 10.1093/plphys/kiac568] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/16/2022] [Accepted: 11/29/2022] [Indexed: 06/17/2023]
Abstract
All biological functions evolve by fixing beneficial mutations and removing deleterious ones. Therefore, continuously fixing and removing the same essential function to separately diverge monophyletic gene families sounds improbable. Yet, here we report that brassinosteroid insensitive1 kinase inhibitor1 (BKI1)/membrane-associated kinase regulators (MAKRs), which regulate diverse functions, evolved into BKI1 and MAKR families from a common ancestor by respectively enhancing and losing the ability to bind the brassinosteroid receptor brassinosteroid insensitive1 (BRI1). The BKI1 family includes BKI1, MAKR1/BKI1-like (BKL) 1, and BKL2, while the MAKR family contains MAKR2-6. Seedless plants contain only BKL2. In seed plants, MAKR1/BKL1 and MAKR3, duplicates of BKL2, gained and lost the ability to bind BRI1, respectively. In angiosperms, BKL2 lost the ability to bind BRI1 to generate MAKR2, while BKI1 and MAKR6 were duplicates of MAKR1/BKL1 and MAKR3, respectively. In dicots, MAKR4 and MAKR5 were duplicates of MAKR3 and MAKR2, respectively. Importantly, BKI1 localized in the plasma membrane, but BKL2 localized to the nuclei while MAKR1/BKL1 localized throughout the whole cell. Moreover, BKI1 strongly and MAKR1/BKL1 weakly inhibited plant growth, but BKL2 and the MAKR family did not inhibit plant growth. Functional study of the chimeras of their N- and C-termini showed that only the BKI1 family was partially reconstructable, supporting stepwise evolution by a seesaw mechanism between their C- and N-termini to alternately gain an ability to bind and inhibit BRI1, respectively. Nevertheless, the C-terminal BRI1-interacting motif best defines the divergence of BKI1/MAKRs. Therefore, BKI1 and MAKR families evolved by gradually gaining and losing the same function, respectively, extremizing divergent evolution and adding insights into gene (BKI1/MAKR) duplication and divergence.
9. Bioinformatic Analysis of Na+, K+-ATPase Regulation through Phosphorylation of the Alpha-Subunit N-Terminus. Int J Mol Sci 2022; 24:ijms24010067. [PMID: 36613508 PMCID: PMC9820343 DOI: 10.3390/ijms24010067] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/13/2022] [Revised: 12/01/2022] [Accepted: 12/17/2022] [Indexed: 12/24/2022] Open
Abstract
The Na+, K+-ATPase is an integral membrane protein which uses the energy of ATP hydrolysis to pump Na+ and K+ ions across the plasma membrane of all animal cells. It plays crucial roles in numerous physiological processes, such as cell volume regulation, nutrient reabsorption in the kidneys, nerve impulse transmission, and muscle contraction. Recent data suggest that it is regulated via an electrostatic switch mechanism involving the interaction of its lysine-rich N-terminus with the cytoplasmic surface of its surrounding lipid membrane, which can be modulated through the regulatory phosphorylation of the conserved serine and tyrosine residues on the protein's N-terminal tail. Prior data indicate that the kinases responsible for phosphorylation belong to the protein kinase C (PKC) and Src kinase families. To provide indications of which particular enzyme of these families might be responsible, we analysed them for evidence of coevolution via the mirror tree method, utilising coevolution as a marker for a functional interaction. The results obtained showed that the most likely kinase isoforms to interact with the Na+, K+-ATPase were the θ and η isoforms of PKC and the Src kinase itself. These theoretical results will guide the direction of future experimental studies.
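The mirror tree method mentioned in this abstract scores coevolution by comparing the evolutionary distance matrices of two protein families over the same set of species. A minimal sketch of the core computation, assuming the distance matrices are already available and ordered by the same species, is the Pearson correlation between their upper triangles. The function names and toy matrices below are hypothetical, and published implementations typically add corrections (for example, for the shared species phylogeny) that are not shown.

```python
import math


def upper_triangle(matrix):
    """Flatten the strictly upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mirror_tree_score(dist_a, dist_b):
    """Correlate pairwise distances of two families across shared species."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Toy distance matrices for three species (made-up values).
dist_pump = [[0.0, 1.0, 4.0], [1.0, 0.0, 3.0], [4.0, 3.0, 0.0]]
dist_kinase = [[0.0, 2.1, 7.9], [2.1, 0.0, 6.2], [7.9, 6.2, 0.0]]
print(round(mirror_tree_score(dist_pump, dist_kinase), 3))
```

A score close to 1 indicates that the two families have diverged in parallel across species, which the abstract uses as a marker for a possible functional interaction.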
10. Xie P, Liu J, Lu R, Zhang Y, Sun X. Molecular evolution of the Pi-d2 gene conferring resistance to rice blast in Oryza. Front Genet 2022; 13:991900. [PMID: 36147495 PMCID: PMC9486079 DOI: 10.3389/fgene.2022.991900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Accepted: 08/10/2022] [Indexed: 11/15/2022] Open
Abstract
The exploitation of plant disease resistance (R) genes in breeding programs is an effective strategy for coping with pathogens. An understanding of R gene variation is the basis for this strategy. Rice blast disease, caused by the Magnaporthe oryzae fungus, is a destructive disease of rice. The rice blast resistance gene Pi-d2 represents a new class of plant R gene because of its novel extracellular domain. We investigated the nucleotide polymorphism, phylogenetic topology and evolution patterns of the Pi-d2 gene among 67 cultivated and wild rice relatives. The Pi-d2 gene originated early in the basal Poales and has remained as a single gene without expansion. The striking finding is that susceptible Pi-d2 alleles might be derived from a single nucleotide substitution of the resistant alleles after the split of Oryza subspecies. Functional pleiotropy and linkage effects are proposed for the evolution and retention of the disease-susceptible alleles in rice populations. One set of DNA primers was developed from the polymorphic position to detect the functional nucleotide polymorphism for disease resistance of the Pi-d2 gene based on conventional Polymerase Chain Reaction. The nucleotide diversity level varied between different domains of the Pi-d2 gene, which might be related to distinct functions of each domain in the disease defense response. Directional (or purifying) selection appears dominant in the molecular evolution of the Pi-d2 gene and has shaped its conserved variation pattern.
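The nucleotide diversity level discussed in this abstract is commonly quantified as π, the average number of pairwise differences per site among aligned sequences. The sketch below illustrates that definition on toy data; it assumes equal-length, gap-free aligned sequences, and the function name and example alleles are hypothetical rather than taken from the study.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site (pi) for aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(seq_1, seq_2)) for seq_1, seq_2 in pairs
    )
    return total_differences / (len(pairs) * length)


# Toy alignment: three made-up alleles of four sites each.
toy_alleles = ["ACGT", "ACGA", "ACGT"]
print(round(nucleotide_diversity(toy_alleles), 4))
```

Computing π separately on the columns of each gene domain would give the per-domain comparison that the abstract describes.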
Affiliation(s)
- Xiaoqin Sun
  - *Correspondence: Yanmei Zhang; Xiaoqin Sun
11. Gerardos A, Dietler N, Bitbol AF. Correlations from structure and phylogeny combine constructively in the inference of protein partners from sequences. PLoS Comput Biol 2022; 18:e1010147. [PMID: 35576238 PMCID: PMC9135348 DOI: 10.1371/journal.pcbi.1010147] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/22/2021] [Revised: 05/26/2022] [Accepted: 04/27/2022] [Indexed: 11/19/2022] Open
Abstract
Inferring protein-protein interactions from sequences is an important task in computational biology. Recent methods based on Direct Coupling Analysis (DCA) or Mutual Information (MI) make it possible to find interaction partners among paralogs of two protein families. Does successful inference mainly rely on correlations from structural contacts or from phylogeny, or both? Do these two types of signal combine constructively or hinder each other? To address these questions, we generate and analyze synthetic data produced using a minimal model that allows us to control the amounts of structural constraints and phylogeny. We show that correlations from these two sources combine constructively to increase the performance of partner inference by DCA or MI. Furthermore, signal from phylogeny can rescue partner inference when signal from contacts becomes less informative, including in the realistic case where inter-protein contacts are restricted to a small subset of sites. We also demonstrate that DCA-inferred couplings between non-contact pairs of sites improve partner inference in the presence of strong phylogeny, while deteriorating it otherwise. Moreover, restricting to non-contact pairs of sites preserves inference performance in the presence of strong phylogeny. In a natural data set, as well as in realistic synthetic data based on it, we find that non-contact pairs of sites contribute positively to partner inference performance, and that restricting to them preserves performance, evidencing an important role of phylogeny.
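The mutual information (MI) signal referred to in this abstract can be illustrated for a single pair of alignment columns, one from each protein family, taken over the same set of paired sequences. The sketch below uses plain empirical frequencies and natural logarithms; practical MI-based pairing methods usually add pseudocounts and sequence reweighting, which are omitted here. The function name and toy columns are hypothetical.

```python
import math
from collections import Counter


def mutual_information(column_a, column_b):
    """Empirical mutual information (in nats) between two alignment columns.

    column_a[k] and column_b[k] are the residues observed at the two sites
    in the k-th paired sequence.
    """
    if len(column_a) != len(column_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(column_a)
    freq_a = Counter(column_a)
    freq_b = Counter(column_b)
    freq_ab = Counter(zip(column_a, column_b))
    mi = 0.0
    for (res_a, res_b), count in freq_ab.items():
        p_ab = count / n
        p_a = freq_a[res_a] / n
        p_b = freq_b[res_b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


# Perfectly covarying toy columns versus independent toy columns.
print(mutual_information("AAKK", "DDEE"))
print(mutual_information("AAKK", "DEDE"))
```

Summing such terms over all inter-protein column pairs gives a pairing-dependent score: shuffling which sequence of one family is matched to which sequence of the other changes the joint frequencies, and therefore the MI.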
Affiliation(s)
- Andonis Gerardos
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Nicola Dietler
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Anne-Florence Bitbol
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
12. Gai Z, Wang Y, Tian L, Gong G, Zhao J. Whole Genome Level Analysis of the Wnt and DIX Gene Families in Mice and Their Coordination Relationship in Regulating Cardiac Hypertrophy. Front Genet 2021; 12:608936. [PMID: 34168671 PMCID: PMC8217762 DOI: 10.3389/fgene.2021.608936] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/22/2020] [Accepted: 05/17/2021] [Indexed: 12/27/2022] Open
Abstract
The Wnt signaling pathway is an evolutionarily conserved signaling pathway that plays essential roles in embryonic development, organogenesis, and many other biological activities. Both Wnt proteins and DIX proteins are important components of Wnt signaling. Systematic studies of Wnt and DIX families at the genome-wide level may provide a comprehensive landscape to elucidate their functions and demonstrate their relationships, but they are currently lacking. In this report, we describe the correlations between mouse Wnt and DIX genes in family expansion, molecular evolution, and expression levels in cardiac hypertrophy at the genome-wide scale. We observed that both the Wnt and DIX families underwent more expansion than the overall average in the evolutionarily early stage. In addition, mirrortree analyses suggested that Wnt and DIX were co-evolved protein families. Collectively, these results help to elucidate the evolutionary characteristics of the Wnt and DIX families and demonstrate their correlations in mediating cardiac hypertrophy.
Affiliation(s)
- Zhongchao Gai
  - School of Food and Biological Engineering, Shaanxi University of Science and Technology, Xi'an, China
- Yujiao Wang
  - School of Food and Biological Engineering, Shaanxi University of Science and Technology, Xi'an, China
- Lu Tian
  - School of Food and Biological Engineering, Shaanxi University of Science and Technology, Xi'an, China
- Guoli Gong
  - School of Food and Biological Engineering, Shaanxi University of Science and Technology, Xi'an, China
- Jieqiong Zhao
  - Department of Cardiology, The Second Affiliated Hospital of Air Force Medical University, Xi'an, China
13. Xu C, Chang X, Hou Z, Zhang Z, Zhu Z, Zhong B. The origin of SPA reveals the divergence and convergence of light signaling in Archaeplastida. Mol Phylogenet Evol 2021; 161:107175. [PMID: 33862251 DOI: 10.1016/j.ympev.2021.107175] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/10/2020] [Revised: 03/28/2021] [Accepted: 04/06/2021] [Indexed: 01/02/2023]
Abstract
Plants have evolved various photoreceptors to adapt to changing light environments, and photoreceptors can inactivate the large CONSTITUTIVE PHOTOMORPHOGENIC/DE-ETIOLATED/FUSCA (COP/DET/FUS) protein complex to release its repression of photoresponsive transcription factors. Here, we tracked the origin and evolution of COP/DET/FUS in Archaeplastida and found that most components of COP/DET/FUS were highly conserved. Intriguingly, the COP1-SUPPRESSOR OF PHYA-105 (SPA) protein originated in Chlorophyta but subsequently underwent a distinct evolutionary history in Viridiplantae. SPA experienced duplication events in the ancestors of specific clades after the colonization of land by plants and was divided into two clades (clades A and B) within euphyllophytes (ferns and seed plants). Our phylogenetic and experimental evidence supports a new evolutionary model to clarify the divergence and convergence of light signaling during plant evolution.
Affiliation(s)
- Chenjie Xu
- College of Life Sciences, Nanjing Normal University, 210046 Nanjing, China
| | - Xin Chang
- College of Life Sciences, Nanjing Normal University, 210046 Nanjing, China
| | - Zheng Hou
- College of Life Sciences, Nanjing Normal University, 210046 Nanjing, China
| | - Zhenhua Zhang
- College of Life Sciences, Nanjing Normal University, 210046 Nanjing, China
| | - Ziqiang Zhu
- College of Life Sciences, Nanjing Normal University, 210046 Nanjing, China
| | - Bojian Zhong
- College of Life Sciences, Nanjing Normal University, 210046 Nanjing, China.
|
14
|
Structural Insights into Carboxylic Polyester-Degrading Enzymes and Their Functional Depolymerizing Neighbors. Int J Mol Sci 2021; 22:ijms22052332. [PMID: 33652738 PMCID: PMC7956259 DOI: 10.3390/ijms22052332] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/09/2021] [Revised: 02/22/2021] [Accepted: 02/23/2021] [Indexed: 11/28/2022] Open
Abstract
Esters are organic compounds widely represented in cellular structures and metabolism, originated by the condensation of organic acids and alcohols. Esterification reactions are also used by chemical industries for the production of synthetic plastic polymers. Polyester plastics are an increasing source of environmental pollution due to their intrinsic stability and limited recycling efforts. Bioremediation of polyesters based on the use of specific microbial enzymes is an interesting alternative to the current methods for the valorization of used plastics. Microbial esterases are promising catalysts for the biodegradation of polyesters that can be engineered to improve their biochemical properties. In this work, we analyzed the structure-activity relationships in microbial esterases, with special focus on the recently described plastic-degrading enzymes isolated from marine microorganisms and their structural homologs. Our analysis, based on structure-alignment, molecular docking, coevolution of amino acids and surface electrostatics determined the specific characteristics of some polyester hydrolases that could be related with their efficiency in the degradation of aromatic polyesters, such as phthalates.
|
15
|
Bloch I, Sherill-Rofe D, Stupp D, Unterman I, Beer H, Sharon E, Tabach Y. Optimization of co-evolution analysis through phylogenetic profiling reveals pathway-specific signals. Bioinformatics 2021; 36:4116-4125. [PMID: 32353123 DOI: 10.1093/bioinformatics/btaa281] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/04/2019] [Revised: 04/17/2020] [Accepted: 04/23/2020] [Indexed: 12/11/2022] Open
Abstract
SUMMARY The exponential growth in available genomic data is expected to reach full sequencing of a million genomes in the coming decade. Improving and developing methods to analyze these genomes and to reveal their utility is of major interest in a wide variety of fields, such as comparative and functional genomics, evolution and bioinformatics. Phylogenetic profiling is an established method for predicting functional interactions between proteins based on similarities in their evolutionary patterns across species. Proteins that function together (i.e. generate complexes, interact in the same pathways or improve adaptation to environmental niches) tend to show coordinated evolution across the tree of life. The normalized phylogenetic profiling (NPP) method takes into account minute changes in proteins across species to identify protein co-evolution. Despite the success of this method, it is still not clear what set of parameters is required for optimal use of co-evolution in predicting functional interactions. Moreover, it is not clear if pathway evolution or function should direct parameter choice. Here, we create a reliable and usable NPP construction pipeline. We explore the effect of parameter selection on functional interaction prediction using NPP from 1028 genomes, both separately and in various value combinations. We identify several parameter sets that optimize performance for pathways with certain biological annotation. This work reveals the importance of choosing the right parameters for optimized function prediction based on a biological context. AVAILABILITY AND IMPLEMENTATION Source code and documentation are available on GitHub: https://github.com/iditam/CompareNPPs. CONTACT yuvaltab@ekmd.huji.ac.il. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Idit Bloch
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
| | - Dana Sherill-Rofe
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
| | - Doron Stupp
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
| | - Irene Unterman
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
| | - Hodaya Beer
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
| | - Elad Sharon
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
| | - Yuval Tabach
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
|
16
|
Salmanian S, Pezeshk H, Sadeghi M. Inter-protein residue covariation information unravels physically interacting protein dimers. BMC Bioinformatics 2020; 21:584. [PMID: 33334319 PMCID: PMC7745481 DOI: 10.1186/s12859-020-03930-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/12/2020] [Accepted: 12/09/2020] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Predicting physical interactions between proteins is one of the greatest challenges in computational biology. There is a considerable variety of protein interactions and a huge number of protein sequences and synthetic peptides with unknown interacting counterparts. Most co-evolutionary methods discover a combination of physical interplays and functional associations. However, only a handful of approaches specifically infer physical interactions. Hybrid co-evolutionary methods exploit inter-protein residue coevolution to unravel specific physically interacting proteins. In this study, we introduce a hybrid co-evolutionary approach to predict physical interplays between pairs of protein families, starting from protein sequences only. RESULTS In the present analysis, pairs of multiple sequence alignments are constructed for each dimer, and the covariation between residues in those pairs is calculated by CCMpred (Contacts from Correlated Mutations predicted) and three mutual information based approaches for ten accessible surface area threshold groups. Then, all residue couplings between the proteins of each dimer are unified into a single Frobenius norm value. Norms of residue contact matrices of all dimers at different accessible surface area thresholds are fed into a support vector machine as single or multiple feature models. Training the classifiers on single features shows no apparent difference in accuracy among the methods across accessible surface area thresholds. Nevertheless, the mutual information product and context likelihood of relatedness procedures may have roughly higher and lower overall performance, respectively, than the other two methods across accessible surface area cut-offs. The results also demonstrate that training a support vector machine with multiple norm features for several accessible surface area thresholds leads to a considerable improvement in prediction performance. In this context, CCMpred achieves roughly better overall performance than the mutual information based approaches. The best accuracy, sensitivity, specificity, precision and negative predictive value for that method are 0.98, 1, 0.962, 0.96, and 0.962, respectively. CONCLUSIONS In this paper, by feeding norm values of protein dimers into support vector machines at different accessible surface area thresholds, we demonstrate that even a small number of proteins in pairs of multiple alignments could allow one to accurately discriminate between positive and negative dimers.
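The step in which residue couplings are unified into a single Frobenius norm per accessible surface area threshold can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy coupling matrix, and the `surface_masks` structure (lists of residue indices assumed to pass each threshold) are hypothetical, and the support vector machine step is omitted.

```python
import math


def frobenius_norm(couplings):
    """Return the square root of the sum of squared entries of a 2D matrix."""
    return math.sqrt(sum(value * value for row in couplings for value in row))


def norm_features(couplings, surface_masks):
    """Return one Frobenius norm per accessible surface area threshold.

    couplings: inter-protein score matrix (rows: residues of protein A,
        columns: residues of protein B), e.g. covariation scores.
    surface_masks: hypothetical list of (rows, cols) index lists, one pair
        per threshold, selecting the residues that pass that threshold.
    """
    features = []
    for rows, cols in surface_masks:
        # Keep only the couplings between residues selected at this threshold.
        sub = [[couplings[i][j] for j in cols] for i in rows]
        features.append(frobenius_norm(sub))
    return features


# Toy data (assumed, for illustration only): a 2 x 2 coupling matrix and
# two thresholds, the second keeping only the first residue of each protein.
toy = [[3.0, 0.0], [0.0, 4.0]]
masks = [([0, 1], [0, 1]), ([0], [0])]
features = norm_features(toy, masks)
```

The resulting `features` list is the kind of multi-threshold feature vector that could then be passed to a classifier.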
Affiliation(s)
- Sara Salmanian
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| | - Hamid Pezeshk
- School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
- Present Address: Department of Mathematics and Statistics, Concordia University, Montreal, Canada
- School of Biological Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran
| | - Mehdi Sadeghi
- National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
|
17
|
Phylogenetic correlations can suffice to infer protein partners from sequences. PLoS Comput Biol 2019; 15:e1007179. [PMID: 31609984 PMCID: PMC6812855 DOI: 10.1371/journal.pcbi.1007179] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 06/10/2019] [Revised: 10/24/2019] [Accepted: 09/25/2019] [Indexed: 12/30/2022] Open
Abstract
Determining which proteins interact together is crucial to a systems-level understanding of the cell. Recently, algorithms based on Direct Coupling Analysis (DCA) pairwise maximum-entropy models have allowed to identify interaction partners among paralogous proteins from sequence data. This success of DCA at predicting protein-protein interactions could be mainly based on its known ability to identify pairs of residues that are in contact in the three-dimensional structure of protein complexes and that coevolve to remain physicochemically complementary. However, interacting proteins possess similar evolutionary histories. What is the role of purely phylogenetic correlations in the performance of DCA-based methods to infer interaction partners? To address this question, we employ controlled synthetic data that only involve phylogeny and no interactions or contacts. We find that DCA accurately identifies the pairs of synthetic sequences that share evolutionary history. While phylogenetic correlations confound the identification of contacting residues by DCA, they are thus useful to predict interacting partners among paralogs. We find that DCA performs as well as phylogenetic methods to this end, and slightly better than them with large and accurate training sets. Employing DCA or phylogenetic methods within an Iterative Pairing Algorithm (IPA) allows to predict pairs of evolutionary partners without a training set. We further demonstrate the ability of these various methods to correctly predict pairings among real paralogous proteins with genome proximity but no known direct physical interaction, illustrating the importance of phylogenetic correlations in natural data. However, for physically interacting and strongly coevolving proteins, DCA and mutual information outperform phylogenetic methods. We finally discuss how to distinguish physically interacting proteins from proteins that only share a common evolutionary history. 
Many biologically important protein-protein interactions are conserved over evolutionary time scales. This leads to two different signals that can be used to computationally predict interactions between protein families and to identify specific interaction partners. First, the shared evolutionary history leads to highly similar phylogenetic relationships between interacting proteins of the two families. Second, the need to keep the interaction surfaces of partner proteins biophysically compatible causes a correlated amino-acid usage of interface residues. Employing simulated data, we show that the shared history alone can be used to detect partner proteins. Similar accuracies are achieved by algorithms comparing phylogenetic relationships and by methods based on Direct Coupling Analysis (DCA), which are primarily known for their ability to detect the second type of signal. Using natural sequence data, we show that in cases with shared evolutionary history but without known physical interactions, both methods work with similar accuracy, while for some physically interacting systems, DCA and mutual information outperform phylogenetic methods. We propose methods allowing both to predict interactions between protein families and to find interacting partners among paralogs.
|
18
|
Hillier C, Pardo M, Yu L, Bushell E, Sanderson T, Metcalf T, Herd C, Anar B, Rayner JC, Billker O, Choudhary JS. Landscape of the Plasmodium Interactome Reveals Both Conserved and Species-Specific Functionality. Cell Rep 2019; 28:1635-1647.e5. [PMID: 31390575 PMCID: PMC6693557 DOI: 10.1016/j.celrep.2019.07.019] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 12/13/2018] [Revised: 05/28/2019] [Accepted: 07/08/2019] [Indexed: 11/16/2022] Open
Abstract
Malaria represents a major global health issue, and the identification of new intervention targets remains an urgent priority. This search is hampered by more than one-third of the genes of malaria-causing Plasmodium parasites being uncharacterized. We report a large-scale protein interaction network in Plasmodium schizonts, generated by combining blue native-polyacrylamide electrophoresis with quantitative mass spectrometry and machine learning. This integrative approach, spanning 3 species, identifies >20,000 putative protein interactions, organized into 600 protein clusters. We validate selected interactions, assigning functions in chromatin regulation to previously unannotated proteins and suggesting a role for an EELM2 domain-containing protein and a putative microrchidia protein as mechanistic links between AP2-domain transcription factors and epigenetic regulation. Our interactome represents a high-confidence map of the native organization of core cellular processes in Plasmodium parasites. The network reveals putative functions for uncharacterized proteins, provides mechanistic and structural insight, and uncovers potential alternative therapeutic targets.
Affiliation(s)
- Charles Hillier
- Developmental Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Mercedes Pardo
- Functional Proteomics, The Institute of Cancer Research, London SW7 3RP, UK.
| | - Lu Yu
- Functional Proteomics, The Institute of Cancer Research, London SW7 3RP, UK
| | - Ellen Bushell
- Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden, Umeå University, 901 87 Umeå, Sweden
| | - Theo Sanderson
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
| | - Tom Metcalf
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
| | - Colin Herd
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
| | - Burcu Anar
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
| | - Julian C Rayner
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
| | - Oliver Billker
- Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden, Umeå University, 901 87 Umeå, Sweden.
| | - Jyoti S Choudhary
- Functional Proteomics, The Institute of Cancer Research, London SW7 3RP, UK.
|
19
|
Pascual-García A, Arenas M, Bastolla U. The Molecular Clock in the Evolution of Protein Structures. Syst Biol 2019; 68:987-1002. [DOI: 10.1093/sysbio/syz022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/08/2018] [Revised: 03/20/2019] [Accepted: 04/09/2019] [Indexed: 12/11/2022] Open
Abstract
The molecular clock hypothesis, which states that substitutions accumulate in protein sequences at a constant rate, plays a fundamental role in molecular evolution, but it is violated when selective or mutational processes vary with time. Such violations of the molecular clock have been widely investigated for protein sequences, but not yet for protein structures. Here, we introduce a novel statistical test (Significant Clock Violations) and perform a large-scale assessment of the molecular clock in the evolution of both protein sequences and structures in three large superfamilies. After validating our method with computer simulations, we find that clock violations are generally consistent in sequence and structure evolution, but they tend to be larger and more significant in structure evolution. Moreover, changes of function assessed through Gene Ontology and InterPro terms are associated with large and significant clock violations in structure evolution. We found that almost one third of significant clock violations are significant in structure evolution but not in sequence evolution, highlighting the advantage of using structure information for assessing accelerated evolution and gathering hints of positive selection. Clock violations between closely related pairs are frequently significant in sequence evolution, consistent with the observed time dependence of the substitution rate attributed to segregation of neutral and slightly deleterious polymorphisms, but not in structure evolution, suggesting that these substitutions do not affect protein structure although they may affect stability. These results are consistent with the view that natural selection, both negative and positive, constrains protein structures more strongly than protein sequences. Our code for computing clock violations is freely available at https://github.com/ugobas/Molecular_clock.
Affiliation(s)
- Alberto Pascual-García
- Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain
- Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot, UK
- Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
| | - Miguel Arenas
- Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Spain
| | - Ugo Bastolla
- Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain
|
20
|
Garrido-Martín D, Pazos F. Effect of the sequence data deluge on the performance of methods for detecting protein functional residues. BMC Bioinformatics 2018; 19:67. [PMID: 29482506 PMCID: PMC5827975 DOI: 10.1186/s12859-018-2084-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/07/2017] [Accepted: 02/21/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The exponential accumulation of new sequences in public databases is expected to improve the performance of all the approaches for predicting protein structural and functional features. Nevertheless, this was never assessed or quantified for some widely used methodologies, such as those aimed at detecting functional sites and functional subfamilies in protein multiple sequence alignments. Using raw protein sequences as only input, these approaches can detect fully conserved positions, as well as those with a family-dependent conservation pattern. Both types of residues are routinely used as predictors of functional sites and, consequently, understanding how the sequence content of the databases affects them is relevant and timely. RESULTS In this work we evaluate how the growth and change with time in the content of sequence databases affect five sequence-based approaches for detecting functional sites and subfamilies. We do that by recreating historical versions of the multiple sequence alignments that would have been obtained in the past based on the database contents at different time points, covering a period of 20 years. Applying the methods to these historical alignments allows quantifying the temporal variation in their performance. Our results show that the number of families to which these methods can be applied sharply increases with time, while their ability to detect potentially functional residues remains almost constant. CONCLUSIONS These results are informative for the methods' developers and final users, and may have implications in the design of new sequencing initiatives.
Affiliation(s)
- Diego Garrido-Martín
- Present address: Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, c/ Dr. Aiguader, 88, 08003, Barcelona, Spain
- Present address: Universitat Pompeu Fabra (UPF), Plaça de la Mercè, 10-12, 08002, Barcelona, Spain
| | - Florencio Pazos
- Computational Systems Biology Group, Systems Biology Program, National Centre for Biotechnology (CNB-CSIC), c/ Darwin, 3, 28049, Madrid, Spain.
|
21
|
Rice DW, Sheehan KB, Newton ILG. Large-Scale Identification of Wolbachia pipientis Effectors. Genome Biol Evol 2017; 9:1925-1937. [PMID: 28854601 PMCID: PMC5544941 DOI: 10.1093/gbe/evx139] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Accepted: 07/18/2017] [Indexed: 12/13/2022] Open
Abstract
Wolbachia pipientis is an intracellular symbiont of arthropods well known for the reproductive manipulations induced in the host and, more recently, for the ability of Wolbachia to block virus replication in insect vectors. Since Wolbachia cannot yet be genetically manipulated, and due to the constraints imposed when working with an intracellular symbiont, little is known about mechanisms used by Wolbachia for host interaction. Here we employed a bioinformatics pipeline and identified 163 candidate effectors, potentially secreted by Wolbachia into the host cell. A total of 84 of these candidates were then subjected to a screen of growth defects induced in yeast upon heterologous expression which identified 14 top candidates likely secreted by Wolbachia. These predicted secreted effectors may function in concert as we find that their native expression is correlated and is highly upregulated at specific time points during Drosophila development. In addition, the evolutionary histories of some of these predicted effectors are also correlated, suggesting they may function together, or in the same pathway, during host infection. Similarly, most of these predicted effectors are limited to one or two Wolbachia strains—perhaps reflecting shared evolutionary history and strain specific functions in host manipulation. Identification of these Wolbachia candidate effectors is the first step in dissecting the mechanisms of symbiont–host interaction in this important system.
Affiliation(s)
- Danny W Rice
- Department of Biology, Indiana University, Bloomington
|
22
|
Chapa TJ, Du Y, Sun R, Yu D, French AR. Proteomic and phylogenetic coevolution analyses of pM79 and pM92 identify interactions with RNA polymerase II and delineate the murine cytomegalovirus late transcription complex. J Gen Virol 2017; 98:242-250. [PMID: 27926822 DOI: 10.1099/jgv.0.000676] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022] Open
Abstract
The regulation of the late viral gene expression in betaherpesviruses is largely undefined. We have previously shown that the murine cytomegalovirus proteins pM79 and pM92 are required for late gene transcription. Here, we provide insight into the mechanism of pM79 and pM92 activity by determining their interaction partners during infection. Co-immunoprecipitation-coupled MS studies demonstrate that pM79 and pM92 interact with an array of cellular and viral proteins involved in transcription. Specifically, we identify RNA polymerase II as a cellular target for both pM79 and pM92. We use inter-protein coevolution analysis to show how pM79 and pM92 likely assemble into a late transcription complex composed of late transcription regulators pM49, pM87 and pM95. Combining proteomic methods with coevolution computational analysis provides novel insights into the relationship between pM79, pM92 and RNA polymerase II and allows the generation of a model of the multi-component viral complex that regulates late gene transcription.
Affiliation(s)
- Travis J Chapa
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Division of Pediatric Rheumatology, Department of Pediatrics, Washington University School of Medicine, Saint Louis, MO 63110, USA
- Department of Molecular Microbiology, Washington University School of Medicine, Saint Louis, MO 63110, USA
| | - Yushen Du
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Ren Sun
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Dong Yu
- Department of Molecular Microbiology, Washington University School of Medicine, Saint Louis, MO 63110, USA
| | - Anthony R French
- Division of Pediatric Rheumatology, Department of Pediatrics, Washington University School of Medicine, Saint Louis, MO 63110, USA
|
23
|
Jiménez-Sánchez A. Coevolution of RAC Small GTPases and their Regulators GEF Proteins. Evol Bioinform Online 2016; 12:121-31. [PMID: 27226705 PMCID: PMC4872645 DOI: 10.4137/ebo.s38031] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/26/2015] [Revised: 03/31/2016] [Accepted: 04/03/2016] [Indexed: 01/16/2023] Open
Abstract
RAC proteins are small GTPases involved in important cellular processes in eukaryotes, and their deregulation may contribute to cancer. Activation of RAC proteins is regulated by DOCK and DBL protein families of guanine nucleotide exchange factors (GEFs). Although DOCK and DBL proteins act as GEFs on RAC proteins, DOCK and DBL family members are evolutionarily unrelated. To understand how DBL and DOCK families perform the same function on RAC proteins despite their unrelated primary structure, phylogenetic analyses of the RAC, DBL, and DOCK families were implemented, and interaction patterns that may suggest a coevolutionary process were searched. Interestingly, while RAC and DOCK proteins are very well conserved in humans and among eukaryotes, DBL proteins are highly divergent. Moreover, correlation analyses of the phylogenetic distances of RAC and GEF proteins and covariation analyses between residues in the interacting domains showed significant coevolution rates for both RAC–DOCK and RAC–DBL interactions.
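A correlation analysis of phylogenetic distances, as used in mirror tree style approaches, can be sketched as follows. This is an illustrative Python example under stated assumptions, not the procedure of this study: the function names and toy distance matrices are hypothetical, both matrices are assumed to list the same species in the same order, and significance testing is left out.

```python
import math
from itertools import combinations


def upper_triangle(dist):
    """Return the pairwise distances above the diagonal of a square matrix."""
    n = len(dist)
    return [dist[i][j] for i, j in combinations(range(n), 2)]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / math.sqrt(var_x * var_y)


def mirrortree_score(dist_a, dist_b):
    """Correlate the inter-species distances of two protein families."""
    if len(dist_a) != len(dist_b):
        raise ValueError("both matrices must cover the same species")
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Toy matrices (assumed): family B distances are family A distances doubled,
# so the two trees mirror each other and the score is close to 1.
dist_rac = [[0.0, 0.1, 0.4], [0.1, 0.0, 0.5], [0.4, 0.5, 0.0]]
dist_gef = [[0.0, 0.2, 0.8], [0.2, 0.0, 1.0], [0.8, 1.0, 0.0]]
score = mirrortree_score(dist_rac, dist_gef)
```

A score near 1 indicates similar evolutionary distances across species in the two families. Because shared phylogeny alone can produce high correlations, such a score is suggestive rather than conclusive evidence of coevolution.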
Affiliation(s)
- Alejandro Jiménez-Sánchez
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, UK
- Previously at Department of Biology, University of York, York, UK
|
24
|
Stimulation of Na(+),K(+)-ATPase Activity as a Possible Driving Force in Cholesterol Evolution. J Membr Biol 2015; 249:251-9. [PMID: 26715509 DOI: 10.1007/s00232-015-9864-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 10/25/2015] [Accepted: 12/09/2015] [Indexed: 12/19/2022]
Abstract
Cholesterol is exclusively produced by animals and is present in the plasma membrane of all animal cells. In contrast, the membranes of fungi and plants contain other sterols. To explain the exclusive preference of animal cells for cholesterol, we propose that cholesterol may have evolved to optimize the activity of a crucial protein found in the plasma membrane of all multicellular animals, namely the Na(+),K(+)-ATPase. To test this hypothesis, mirror tree and phylogenetic distribution analyses have been conducted of the Na(+),K(+)-ATPase and 3β-hydroxysterol Δ(24)-reductase (DHCR24), the last enzyme in the Bloch cholesterol biosynthetic pathway. The results obtained support the hypothesis of a co-evolution of the Na(+),K(+)-ATPase and DHCR24. The evolutionary correlation between DHCR24 and the Na(+),K(+)-ATPase was found to be stronger than between DHCR24 and any other membrane protein investigated. The results obtained, thus, also support the hypothesis that cholesterol evolved together with the Na(+),K(+)-ATPase in multicellular animals to support Na(+),K(+)-ATPase activity.
|
25
|
Grayson P. Izumo1 and Juno: the evolutionary origins and coevolution of essential sperm-egg binding partners. Royal Society Open Science 2015; 2:150296. [PMID: 27019721 PMCID: PMC4807442 DOI: 10.1098/rsos.150296] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Received: 06/26/2015] [Accepted: 11/17/2015] [Indexed: 05/29/2023]
Abstract
Reproductive proteins are among the most rapidly evolving classes of proteins. For a subset of these, rapid evolution is driven by positive Darwinian selection despite vital, well-conserved, reproductive functions. Izumo1 is the only essential sperm-egg fusion protein currently known on mammalian sperm, and its egg receptor (Juno; formerly Folr4) was recently discovered. Male knockout mice for Izumo1 and female knockout mice for Juno are both healthy but sterile. Here, both sperm-egg binding proteins are shown to be evolving under positive selection. Within mammals, coevolution of Izumo1 and Juno is also uncovered, suggesting that similar forces have shaped the evolutionary histories of these binding partners within Mammalia. Additionally, genomic analyses reveal an ancient origin for the Izumo gene family, initially reported as conserved exclusively in mammals. Newly identified Izumo1 orthologues could serve reproductive functions in birds, fish and reptiles. Surprisingly, these same analyses support Juno's presence in mammals alone, suggesting a recent mammalian-specific duplication and neofunctionalization of the ancestral folate receptor. Despite the indispensability of their reproductive interaction, and their apparent coevolution within Mammalia, this binding pair arose through strikingly different evolutionary forces.
|