1
|
Large-scale chemoproteomics expedites ligand discovery and predicts ligand behavior in cells. Science 2024; 384:eadk5864. [PMID: 38662832 DOI: 10.1126/science.adk5864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 03/22/2024] [Indexed: 05/04/2024]
Abstract
Chemical modulation of proteins enables a mechanistic understanding of biology and represents the foundation of most therapeutics. However, despite decades of research, 80% of the human proteome lacks functional ligands. Chemical proteomics has advanced fragment-based ligand discovery toward cellular systems, but throughput limitations have stymied the scalable identification of fragment-protein interactions. We report proteome-wide maps of protein-binding propensity for 407 structurally diverse small-molecule fragments. We verified that identified interactions can be advanced to active chemical probes of E3 ubiquitin ligases, transporters, and kinases. Integrating machine learning binary classifiers further enabled interpretable predictions of fragment behavior in cells. The resulting resource of fragment-protein interactions and predictive models will help to elucidate principles of molecular recognition and expedite ligand discovery efforts for hitherto undrugged proteins.
|
2
|
ProteoMutaMetrics: machine learning approaches for solute carrier family 6 mutation pathogenicity prediction. RSC Adv 2024; 14:13083-13094. [PMID: 38655474 PMCID: PMC11034476 DOI: 10.1039/d4ra00748d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 03/25/2024] [Indexed: 04/26/2024] Open
Abstract
The solute carrier transporter family 6 (SLC6) is of key interest for its critical role in the transport of small amino acids or amino acid-like molecules. Dysfunction of its members is strongly associated with human diseases including schizophrenia, depression, and Parkinson's disease. Linking single point mutations to disease may support insights into the structure-function relationship of these transporters. This work aimed to develop a computational model for predicting the potential pathogenic effect of single point mutations in the SLC6 family. Missense mutation data were retrieved from UniProt, LitVar, and ClinVar, covering multiple protein-coding transcripts. As an encoding approach, amino acid descriptors were used to calculate the average sequence properties for both original and mutated sequences. In addition to the full-sequence calculation, the sequences were cut into twelve domains. The domains are defined according to the transmembrane domains of the SLC6 transporters to analyse the regions' contributions to the pathogenicity prediction. Subsequently, several classification models were built, namely Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), with hyperparameters optimized through grid search. Model performance was estimated with repeated stratified k-fold cross-validation. The accuracy values of the generated models are in the range of 0.72 to 0.80. Analysis of feature importance indicates that mutations in distinct regions of SLC6 transporters are associated with an increased risk for pathogenicity. When the model was applied to an independent validation set, accuracy dropped to about 0.6 on average, with high precision but low sensitivity scores.
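The encoding step described in this abstract (average descriptor values over the full sequence and over each domain, for both the original and the mutated sequence) can be illustrated with a minimal sketch. The abstract does not name the descriptors or the domain boundaries, so the Kyte-Doolittle hydropathy scale, the toy sequence, and the helper names `apply_mutation`, `mean_descriptor`, and `encode` are all assumptions made for illustration only.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in descriptor
# (assumption: the abstract does not say which descriptors were used).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def apply_mutation(seq, position, new_residue):
    """Return seq with the residue at the 1-based position replaced."""
    if not 1 <= position <= len(seq):
        raise ValueError("position outside sequence")
    return seq[:position - 1] + new_residue + seq[position:]


def mean_descriptor(seq, start=1, end=None):
    """Average descriptor value over residues start..end (1-based, inclusive)."""
    end = len(seq) if end is None else end
    region = seq[start - 1:end]
    return sum(HYDROPATHY[aa] for aa in region) / len(region)


def encode(seq, position, new_residue, domains):
    """Feature vector: full-sequence mean followed by per-domain means,
    first for the original sequence, then for the mutant."""
    mutant = apply_mutation(seq, position, new_residue)
    regions = [(1, len(seq))] + list(domains)
    features = []
    for s in (seq, mutant):
        features.extend(mean_descriptor(s, a, b) for a, b in regions)
    return features


if __name__ == "__main__":
    toy = "MALIVGGKDE"                 # toy sequence, not a real SLC6 transporter
    toy_domains = [(1, 5), (6, 10)]    # hypothetical domain boundaries
    print(encode(toy, 3, "D", toy_domains))
```

A vector like this could then be passed to any of the classifiers named in the abstract. Because only the domain containing the mutation changes its mean, per-domain features make it possible to ask which regions contribute to the prediction.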
|
3
|
Using an embryo specific promoter to modify iron distribution pattern in Arabidopsis. PLANT SCIENCE : AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2024; 339:111931. [PMID: 38030036 DOI: 10.1016/j.plantsci.2023.111931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 11/20/2023] [Accepted: 11/21/2023] [Indexed: 12/01/2023]
Abstract
Iron is an essential micronutrient for life. During seed development, iron accumulates during embryo maturation. In Arabidopsis thaliana, iron mainly accumulates in the vacuoles of only one cell type, the cell layer that surrounds the provasculature in the hypocotyl and cotyledons. The iron accumulation pattern in Arabidopsis is an exception in plant phylogeny: most dicot embryos accumulate iron in several cell layers including the cortex and, in some cases, even the protodermis. It remains unknown how iron reaches the internal cell layers of the embryo and, in particular, which molecular mechanisms are responsible for this process. Here, we use transgenic approaches to modify the iron accumulation pattern in an Arabidopsis model. Using the SDH2-3 embryo-specific promoter, we were able to express VIT1 ectopically in both a wild type background and a mutant vit1 background lacking expression of this vacuolar iron transporter. These manipulations modify the iron distribution pattern in Arabidopsis from one cell layer to several cell layers, including protodermis, cortex cells, and the endodermis. Interestingly, total seed iron content was not modified compared with the wild type, suggesting that iron distribution in embryos is not involved in the control of the total iron amount accumulated in seeds. This experimental model can be used to study the processes involved in iron distribution patterning during embryo maturation and its evolution in dicot plants.
|
4
|
Experimental and Computational Analysis of Newly Identified Pathogenic Mutations in the Creatine Transporter SLC6A8. J Mol Biol 2024; 436:168383. [PMID: 38070861 DOI: 10.1016/j.jmb.2023.168383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Revised: 11/26/2023] [Accepted: 12/01/2023] [Indexed: 12/24/2023]
Abstract
Creatine is an essential metabolite for the storage and rapid supply of energy in muscle and nerve cells. In humans, impaired metabolism, transport, and distribution of creatine throughout tissues can cause varying forms of mental disability, also known as creatine deficiency syndrome (CDS). So far, 80 mutations in the creatine transporter (SLC6A8) have been associated with CDS. To better understand the effect of human genetic variants on the physiology of SLC6A8 and their possible impact on CDS, we studied 30 missense variants including 15 variants of unknown significance, two of which are reported here for the first time. We expressed these variants in HEK293 cells and explored their subcellular localization and transport activity. We also applied computational methods to predict variant effect and estimate site-specific changes in thermodynamic stability. To explore variants that might have a differential effect on the transporter's conformers along the transport cycle, we constructed homology models of the inward-facing and outward-facing conformations. In addition, we used mass spectrometry to study proteins that interact with wild type SLC6A8 and five selected variants in HEK293 cells. In silico models of the protein complexes revealed how two variants impact the interaction interface of SLC6A8 with other proteins and how pathogenic variants lead to an enrichment of ER protein partners. Overall, our integrated analysis disambiguates the pathogenicity of 15 variants of unknown significance, revealing diverse mechanisms of pathogenicity, including two previously unreported variants obtained from patients suffering from creatine deficiency syndrome.
|
5
|
Discovery of Molecular Glue Degraders via Isogenic Morphological Profiling. ACS Chem Biol 2023; 18:2464-2473. [PMID: 38098458 PMCID: PMC10764104 DOI: 10.1021/acschembio.3c00598] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/26/2023] [Revised: 11/11/2023] [Accepted: 11/15/2023] [Indexed: 12/18/2023]
Abstract
Molecular glue degraders (MGDs) are small molecules that degrade proteins of interest via the ubiquitin-proteasome system. While MGDs were historically discovered serendipitously, approaches for MGD discovery now include cell-viability-based drug screens or data mining of public transcriptomics and drug response datasets. These approaches, however, have target spaces restricted to the essential proteins. Here we develop a high-throughput workflow for MGD discovery that also reaches the nonessential proteome. This workflow begins with the rapid synthesis of a compound library by sulfur(VI) fluoride exchange chemistry coupled to a morphological profiling assay in isogenic cell lines that vary in levels of the E3 ligase CRBN. By comparing the morphological changes induced by compound treatment across the isogenic cell lines, we were able to identify FL2-14 as a CRBN-dependent MGD targeting the nonessential protein GSPT2. We envision that this workflow would contribute to the discovery and characterization of MGDs that target a wider range of proteins.
|
6
|
Scientific utopias: the Eclosion Event. Nature 2023:10.1038/d41586-023-01854-9. [PMID: 37277473 DOI: 10.1038/d41586-023-01854-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
|
7
|
Functional characterization of SLC39 family members ZIP5 and ZIP10 in overexpressing HEK293 cells reveals selective copper transport activity. Biometals 2023; 36:227-237. [PMID: 36454509 DOI: 10.1007/s10534-022-00474-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/12/2022] [Accepted: 11/20/2022] [Indexed: 12/04/2022]
Abstract
Zinc is the second most prevalent metal element present in living organisms, and control of its concentration is pivotal to physiology. The amount of zinc available to the cell cytoplasm is regulated by the activity of members of the SLC39 family, the ZIP proteins. Selectivity of ZIP transporters has been the focus of earlier studies, which provided a biochemical and structural basis for the selectivity for zinc over other metals such as copper, iron, and manganese. However, several previous studies have shown that certain ZIP proteins exhibit higher selectivity for metal elements other than zinc. Sequence similarities suggest an evolutionary basis for the elemental selectivity within the ZIP family. Here, by engineering HEK293 cells to overexpress ZIP proteins, we have studied the selectivity of two phylogenetic clades of ZIP proteins, that is, ZIP8/ZIP14 (previously known to be iron and manganese transporters) and ZIP5/ZIP10. By incubating ZIP-overexpressing cells in the presence of several divalent metals, we found that ZIP5 and ZIP10 are high affinity copper transporters with greater selectivity over other elements, revealing a novel substrate signature for the ZIP5/ZIP10 clade.
|
8
|
Gene Families, Epistasis and the Amino Acid Preferences of Protein Homologs. Evol Bioinform Online 2019; 15:1176934319870485. [PMID: 31452598 PMCID: PMC6698995 DOI: 10.1177/1176934319870485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2019] [Accepted: 07/27/2019] [Indexed: 11/16/2022] Open
Abstract
In order to preserve structure and function, proteins tend to preferentially conserve amino acids at particular sites along the sequence. Because mutations can affect structure and function, the question arises whether the preference of a protein site for a particular amino acid varies between protein homologs, and to what extent that variation depends on sequence divergence. Answering these questions can help in the development of models of sequence evolution, as well as provide insights on the dependence of the fitness effects of mutations on the genetic background of sequences, a phenomenon known as epistasis. Here, I comment on recent computational work providing a systematic analysis of the extent to which the amino acid preferences of proteins depend on the background mutations of protein homologs.
|
9
|
Computational Characterization of the mtORF of Pocilloporid Corals: Insights into Protein Structure and Function in Stylophora Lineages from Contrasting Environments. Genes (Basel) 2019; 10:E324. [PMID: 31035578 PMCID: PMC6562464 DOI: 10.3390/genes10050324] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/21/2019] [Revised: 04/22/2019] [Accepted: 04/23/2019] [Indexed: 01/15/2023] Open
Abstract
More than a decade ago, a new mitochondrial Open Reading Frame (mtORF) was discovered in corals of the family Pocilloporidae and has been used since then as an effective barcode for these corals. Recently, mtORF sequencing revealed the existence of two differentiated Stylophora lineages occurring in sympatry along the environmental gradient of the Red Sea (18.5°C to 33.9°C). In the endemic Red Sea lineage RS_LinB, the mtORF and the heat shock protein gene hsp70 uncovered similar phylogeographic patterns strongly correlated with environmental variations. This suggests that the mtORF too might be involved in thermal adaptation. Here, we used computational analyses to explore the features and putative function of this mtORF. In particular, we tested the likelihood that this gene encodes a functional protein and whether it may play a role in adaptation. Analyses of full mitogenomes showed that the mtORF originated in the common ancestor of Madracis and other pocilloporids, and that it encodes a transmembrane protein differing in length and domain architecture among genera. Homology-based annotation and the relative conservation of metal-binding sites revealed traces of an ancient hydrolase catalytic activity. Furthermore, signals of pervasive purifying selection, lack of stop codons in 1830 sequences analyzed, and a codon-usage bias similar to that of other mitochondrial genes indicate that the protein is functional, i.e., not a pseudogene. Other features, such as intrinsically disordered regions, tandem repeats, and signals of positive selection particularly in Stylophora RS_LinB populations, are consistent with a role of the mtORF in adaptive responses to environmental changes.
|
10
|
The Site-Specific Amino Acid Preferences of Homologous Proteins Depend on Sequence Divergence. Genome Biol Evol 2019; 11:121-135. [PMID: 30496400 PMCID: PMC6326188 DOI: 10.1093/gbe/evy261] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Accepted: 11/26/2018] [Indexed: 12/20/2022] Open
Abstract
The propensity of protein sites to be occupied by any of the 20 amino acids is known as site-specific amino acid preferences (SSAP). Under the assumption that SSAP are conserved among homologs, they can be used to parameterize evolutionary models for the reconstruction of accurate phylogenetic trees. However, simulations and experimental studies have not been able to fully assess the relative conservation of SSAP as a function of sequence divergence between protein homologs. Here, we implement a computational procedure to predict the SSAP of proteins based on the effect of changes in thermodynamic stability upon mutation. An advantage of this computational approach is that it allows us to interrogate a large and unbiased sample of homologous proteins, over the entire spectrum of sequence divergence, and under selection for the same molecular trait. We show that computational predictions have reproducibilities that resemble those obtained in experimental replicates, and can largely recapitulate the SSAP observed in a large-scale mutagenesis experiment. Our results support recent experimental reports on the conservation of SSAP of related homologs, with a slowly increasing fraction of up to 15% of different sites at sequence distances lower than 40%. However, even under the sole contribution of thermodynamic stability, our conservative approach identifies up to 30% of significantly different sites between divergent homologs. We show that this relation holds for homologs of diverse sizes and structural classes. Analyses of residue contact networks suggest that an important determinant of these differences is the increasing accumulation of structural deviations that results from sequence divergence.
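One common way to turn stability changes into site preferences is to weight each amino acid by a Boltzmann factor of its predicted stability change and normalize. The abstract does not state the mapping the authors used, so the sketch below is an assumption, as are the function names `preferences_from_ddg` and `js_distance`, the energy scale, and the choice of Jensen-Shannon distance to compare the same site in two homologs.

```python
import math

KT = 0.593  # kcal/mol, roughly R*T near 298 K (assumed energy scale)


def preferences_from_ddg(ddg_by_residue, kt=KT):
    """Map per-residue stability changes (kcal/mol, positive = destabilizing)
    to a normalized preference distribution using Boltzmann weights."""
    weights = {aa: math.exp(-ddg / kt) for aa, ddg in ddg_by_residue.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


def js_distance(p, q):
    """Jensen-Shannon distance (log base 2, range 0..1) between two
    preference distributions given as dicts."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture, skipping zeros
        return sum(a[k] * math.log2(a[k] / mix[k])
                   for k in keys if a.get(k, 0.0) > 0.0)

    divergence = 0.5 * kl(p) + 0.5 * kl(q)
    return math.sqrt(max(divergence, 0.0))


if __name__ == "__main__":
    # Hypothetical stability changes for the same site in two homologs
    site_a = preferences_from_ddg({"A": 0.0, "L": 0.5, "D": 3.0})
    site_b = preferences_from_ddg({"A": 2.5, "L": 0.0, "D": 0.4})
    print(site_a, site_b, js_distance(site_a, site_b))
```

Under this sketch, a site would count as "different" between homologs when the distance exceeds some threshold. How that threshold is chosen (for example, from the spread between replicate predictions) is left open here.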
|
11
|
Early emergence of negative regulation of the tyrosine kinase Src by the C-terminal Src kinase. J Biol Chem 2017; 292:18518-18529. [PMID: 28939764 DOI: 10.1074/jbc.m117.811174] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/08/2017] [Revised: 09/19/2017] [Indexed: 02/05/2023] Open
Abstract
Stringent regulation of tyrosine kinase activity is essential for normal cellular function. In humans, the tyrosine kinase Src is inhibited via phosphorylation of its C-terminal tail by another kinase, C-terminal Src kinase (Csk). Although Src and Csk orthologs are present across holozoan organisms, including animals and protists, the Csk-Src negative regulatory mechanism appears to have evolved gradually. For example, in choanoflagellates, Src and Csk are both active, but the negative regulatory mechanism is reportedly absent. In filastereans, a protist clade closely related to choanoflagellates, Src is active, but Csk is apparently inactive. In this study, we use a combination of bioinformatics, in vitro kinase assays, and yeast-based growth assays to characterize holozoan Src and Csk orthologs. We show that, despite appreciable differences in domain architecture, Csk from Corallochytrium limacisporum, a highly diverged holozoan marine protist, is active and can inhibit Src. However, in comparison with other Csk orthologs, Corallochytrium Csk displays broad substrate specificity and inhibits Src in an activity-independent manner. Furthermore, in contrast to previous studies, we show that Csk from the filasterean Capsaspora owczarzaki is active and that the Csk-Src negative regulatory mechanism is present in Csk and Src proteins from C. owczarzaki and the choanoflagellate Monosiga brevicollis. Our results suggest that negative regulation of Src by Csk is more ancient than previously thought and that it might be conserved across all holozoan species.
|
12
|
The amino acid alphabet and the architecture of the protein sequence-structure map. I. Binary alphabets. PLoS Comput Biol 2014; 10:e1003946. [PMID: 25473967 PMCID: PMC4256021 DOI: 10.1371/journal.pcbi.1003946] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/07/2014] [Accepted: 09/26/2014] [Indexed: 11/19/2022] Open
Abstract
The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce, as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated with specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet.
|
13
|
First Report of Diaporthe novem Causing Postharvest Rot of Kiwifruit During Controlled Atmosphere Storage in Chile. PLANT DISEASE 2014; 98:1274. [PMID: 30699641 DOI: 10.1094/pdis-02-14-0183-pdn] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
Chile is considered the third major exporter of kiwifruits (Actinidia deliciosa (A. Chev.) C. F. Liang & A. R. Ferguson) worldwide after Italy and New Zealand (1). The genus Diaporthe Nitschke (anamorph: genus Phomopsis) has been reported as causing postharvest rot in kiwifruit (4). During the current study, examination of 1,400 fruits arbitrarily collected from seven controlled atmosphere (CA) rooms after 90 days of storage (2% O2, 5% CO2) showed that 21.5% of the fruit were affected by decay and 0.86% developed symptoms different from those caused by Botrytis cinerea, the main postharvest pathogen associated with kiwifruit. Symptoms were soft rot with brown skin that started at the stem-end and in severe cases affected the entire fruit. Internally, affected fruit showed browning and watery tissues. Twelve affected fruits were surface disinfested (75% ethanol) and small pieces of internal rotten tissues were placed on acidified potato dextrose agar (APDA) for 7 days at 20°C. Twelve isolates were obtained, and four of them were identified morphologically and molecularly as Diaporthe ambigua, a species that has been previously described causing rot in stored kiwifruits in Chile (2). However, eight other flat, white to grayish colonies with sparse dirty-white aerial mycelium at the edge of the dish were obtained (3). Black pycnidia contained unicellular, hyaline, biguttulate, oval to cylindrical alpha conidia, with obtuse ends of (7.9) 6.7 (5.3) × (2.9) 2.5 (2.1) μm (n = 30). These isolates were tentatively identified as a Diaporthe sp. The species identification was determined by sequence comparison of the internal transcribed spacer (ITS1-5.8S-ITS2) region of the rDNA (GenBank Accession Nos. KJ210020 to 24, KJ210027, and KJ210033) and a portion of beta-tubulin (BT) (KJ210034 to 38, KJ210041, and KJ210047) using primers ITS4-ITS5 and Bt2a-Bt2b, respectively. BLAST analyses showed 99 to 100% identity with the D. novem J.M. Santos, Vrandecic & A.J.L. Phillips reference ex-type (KC343156 and KC344124 for ITS and BT, respectively) (3). Eighteen mature kiwifruits cv. Hayward were inoculated using a sterile cork borer on the surface of the fruit and placing 5-mm agar plugs with mycelium of D. novem (DN-1-KF). An equal number of fruits treated with sterile agar plugs were used as negative controls. After 30 days at 0°C under CA, all inoculated fruit showed rot symptoms with lesions 7.8 to 16.4 mm in diameter. The same D. novem isolate was inoculated with 30 μl of a conidial suspension (10^6 conidia/ml) on the surface of 18 ripe kiwifruits that were previously wounded and non-wounded as described above. An equal number of wounded and non-wounded fruits, treated with 30 μl sterile water, were used as negative controls. All inoculated wounded fruits developed rot symptoms with necrotic lesions of 14.1 to 20.2 mm in diameter after 14 days at 25°C. Inoculated non-wounded and negative control fruits remained symptomless. Koch's postulates were fulfilled by re-isolating D. novem only from the symptomatic fruits. To our knowledge, this is the first report of rot caused by D. novem on kiwifruit during cold storage in Chile and worldwide. Therefore, both Diaporthe species appear to be associated with Diaporthe rot of kiwifruit in Chile. References: (1) Belrose, Inc. World Kiwifruit Review. Belrose, Inc. Publishers, Pullman, WA, 2012. (2) J. Auger et al. Plant Dis. 97:843, 2013. (3) R. Gomes et al. Persoonia 31:1, 2013. (4) L. Luongo et al. J. Plant Pathol. 93:205, 2011.
|
14
|
Growth temperature and genome size in bacteria are negatively correlated, suggesting genomic streamlining during thermal adaptation. Genome Biol Evol 2013; 5:966-77. [PMID: 23563968 PMCID: PMC3673621 DOI: 10.1093/gbe/evt050] [Citation(s) in RCA: 90] [Impact Index Per Article: 8.2] [Indexed: 12/13/2022] Open
Abstract
Prokaryotic genomes are small and compact. This feature is caused either by neutral evolution or by natural selection favoring small genomes (genome streamlining). Three separate prior lines of evidence argue against streamlining for most prokaryotes. We find that the same three lines of evidence argue for streamlining in the genomes of thermophile bacteria. Specifically, with increasing habitat temperature and decreasing genome size, the proportion of genomic DNA in intergenic regions decreases. Furthermore, with increasing habitat temperature, generation time decreases. Genome-wide selective constraints do not decrease as in the reduced genomes of host-associated species. Reduced habitat variability is not a likely explanation for the smaller genomes of thermophiles. Genome size may be an indirect target of selection due to its association with cell volume. We use metabolic modeling to demonstrate that known changes in cell structure and physiology at high temperature can provide a selective advantage to reduce cell volume at high temperatures.
|
15
|
A comparison of genotype-phenotype maps for RNA and proteins. Biophys J 2012; 102:1916-25. [PMID: 22768948 DOI: 10.1016/j.bpj.2012.01.047] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 09/07/2011] [Revised: 01/19/2012] [Accepted: 01/27/2012] [Indexed: 02/04/2023] Open
Abstract
The relationship between the genotype (sequence) and the phenotype (structure) of macromolecules affects their ability to evolve new structures and functions. We here compare the genotype space organization of proteins and RNA molecules to identify differences that may affect this ability. To this end, we computationally study the genotype-phenotype relationship for short RNA and lattice proteins of a reduced monomer alphabet size, to make exhaustive analysis and direct comparison of their genotype spaces feasible. We find that many fewer protein molecules than RNA molecules fold, but they fold into many more structures than RNA. In consequence, protein phenotypes have smaller genotype networks whose member genotypes tend to be more similar than for RNA phenotypes. Neighborhoods in sequence space of a given radius around an RNA molecule contain more novel structures than for protein molecules. We compare this property to evidence from natural RNA and protein molecules, and conclude that RNA genotype space may be more conducive to the evolution of new structure phenotypes.
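The kind of exhaustive analysis this abstract describes for lattice proteins with a reduced alphabet can be illustrated with a toy model. The sketch below assumes a two-letter HP alphabet on a 2D square lattice, an energy of -1 per non-bonded H-H contact, and a contact map as the definition of a structure. None of these choices is taken from the paper, and the function names are hypothetical. A sequence is counted as folding when exactly one contact map attains its lowest, strictly negative, energy.

```python
from itertools import product

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def contact_maps(n):
    """Enumerate self-avoiding walks of n monomers on the square lattice and
    return the set of distinct contact maps (non-bonded neighbour pairs)."""
    maps = set()

    def extend(path):
        if len(path) == n:
            index = {pos: i for i, pos in enumerate(path)}
            contacts = set()
            for (x, y), i in index.items():
                for dx, dy in MOVES:
                    j = index.get((x + dx, y + dy))
                    if j is not None and j > i + 1:
                        contacts.add((i, j))
            maps.add(frozenset(contacts))
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in path:
                extend(path + [nxt])

    extend([(0, 0), (1, 0)])  # fix the first bond to remove rotations
    return maps


def energy(seq, contacts):
    """HP energy: -1 for every contact between two H monomers."""
    return -sum(1 for i, j in contacts if seq[i] == "H" and seq[j] == "H")


def folding_sequences(n):
    """Map each folding sequence to its unique lowest-energy contact map."""
    maps = list(contact_maps(n))
    folds = {}
    for letters in product("HP", repeat=n):
        seq = "".join(letters)
        energies = [energy(seq, c) for c in maps]
        best = min(energies)
        if best < 0 and energies.count(best) == 1:
            folds[seq] = maps[energies.index(best)]
    return folds


if __name__ == "__main__":
    n = 8
    folds = folding_sequences(n)
    structures = set(folds.values())
    print(f"{len(folds)} of {2 ** n} sequences fold into "
          f"{len(structures)} distinct structures")
```

Grouping the folding sequences by structure and linking those that differ by one point mutation gives the genotype networks the abstract refers to. The enumeration grows exponentially with chain length, so this sketch is only practical for short chains.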
|
16
|
Evolutionary innovations and the organization of protein functions in genotype space. PLoS One 2010; 5:e14172. [PMID: 21152394 PMCID: PMC2994758 DOI: 10.1371/journal.pone.0014172] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 07/21/2010] [Accepted: 10/28/2010] [Indexed: 11/18/2022] Open
Abstract
The organization of protein structures in protein genotype space is well studied. The same does not hold for protein functions, whose organization is important to understand how novel protein functions can arise through blind evolutionary searches of sequence space. In systems other than proteins, two organizational features of genotype space facilitate phenotypic innovation. The first is that genotypes with the same phenotype form vast and connected genotype networks. The second is that different neighborhoods in this space contain different novel phenotypes. We here characterize the organization of enzymatic functions in protein genotype space, using a data set of more than 30,000 proteins with known structure and function. We show that different neighborhoods of genotype space contain proteins with very different functions. This property both facilitates evolutionary innovation through exploration of a genotype network, and it constrains the evolution of novel phenotypes. The phenotypic diversity of different neighborhoods is caused by the fact that some functions can be carried out by multiple structures. We show that the space of protein functions is not homogeneous, and different genotype neighborhoods tend to contain a different spectrum of functions, whose diversity increases with increasing distance of these neighborhoods in sequence space. Whether a protein with a given function can evolve specific new functions is thus determined by the protein's location in sequence space.
|
17
|
Abstract
Empirical or knowledge-based potentials have many applications in structural biology such as the prediction of protein structure, protein-protein, and protein-ligand interactions and in the evaluation of stability for mutant proteins, the assessment of errors in experimentally solved structures, and the design of new proteins. Here, we describe a simple procedure to derive and use pairwise distance-dependent potentials that rely on the definition of effective atomic interactions, which attempt to capture interactions that are more likely to be physically relevant. Based on a difficult benchmark test composed of proteins with different secondary structure composition and representing many different folds, we show that the use of effective atomic interactions significantly improves the performance of potentials at discriminating between native and near-native conformations. We also found that, in agreement with previous reports, the potentials derived from the observed effective atomic interactions in native protein structures contain a larger amount of mutual information. A detailed analysis of the effective energy functions shows that atom connectivity effects, which mostly arise when deriving the potential by the incorporation of those indirect atomic interactions occurring beyond the first atomic shell, are clearly filtered out. The shape of the energy functions for direct atomic interactions representing hydrogen bonding and disulfide and salt bridges formation is almost unaffected when effective interactions are taken into account. On the contrary, the shape of the energy functions for indirect atom interactions (i.e., those describing the interaction between two atoms bound to a direct interacting pair) is clearly different when effective interactions are considered. Effective energy functions for indirect interacting atom pairs are not influenced by the shape or the energy minimum observed for the corresponding direct interacting atom pair. 
Our results suggest that the dependency between the signals in different energy functions is a key aspect that needs to be addressed when empirical energy functions are derived and used, and also highlight the importance of additivity assumptions in the use of potential energy functions.
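As an illustration of what a pairwise distance-dependent knowledge-based potential looks like in practice, the following is a minimal sketch of the generic inverse Boltzmann construction. It is not the authors' procedure: the reference state (all atom pairs pooled), the bin width, the pseudocount, the `KT` value, and the function names `distance_potential` and `score` are all assumptions made for the example, and the filtering of effective atomic interactions described in the abstract is not reproduced.

```python
import numpy as np

KT = 0.593  # assumed thermal energy in kcal/mol (about 298 K)


def distance_potential(pair_distances, all_distances,
                       r_max=15.0, bin_width=1.0, pseudocount=1.0):
    """Inverse Boltzmann pseudo-energy for one atom-pair type.

    pair_distances: distances observed for the atom-pair type of interest.
    all_distances:  distances pooled over all pair types (assumed reference state).
    Returns the bin edges and one pseudo-energy value per distance bin.
    """
    edges = np.arange(0.0, r_max + bin_width, bin_width)
    obs, _ = np.histogram(pair_distances, bins=edges)
    ref, _ = np.histogram(all_distances, bins=edges)
    # Pseudocounts keep empty bins from producing log(0).
    p_obs = (obs + pseudocount) / (obs.sum() + pseudocount * len(obs))
    p_ref = (ref + pseudocount) / (ref.sum() + pseudocount * len(ref))
    energy = -KT * np.log(p_obs / p_ref)
    return edges, energy


def score(distances, edges, energy):
    """Sum the pseudo-energies of a set of distances (additivity assumption)."""
    distances = np.asarray(distances, dtype=float)
    idx = np.digitize(distances, edges) - 1
    inside = (idx >= 0) & (idx < len(energy))  # ignore distances outside the binned range
    return float(energy[idx[inside]].sum())
```

Summing per-pair terms in `score` is exactly the additivity assumption the abstract cautions about, so the sketch is best read as a baseline rather than as the effective potential itself.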
|
18
|
Protein robustness promotes evolutionary innovations on large evolutionary time-scales. Proc Biol Sci 2008; 275:1595-602. [PMID: 18430649 DOI: 10.1098/rspb.2007.1617] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.5] [Indexed: 11/12/2022] Open
Abstract
Recent laboratory experiments suggest that a molecule's ability to evolve neutrally is important for its ability to generate evolutionary innovations. In contrast to laboratory experiments, life unfolds on time-scales of billions of years. Here, we ask whether a molecule's ability to evolve neutrally (a measure of its robustness) facilitates evolutionary innovation also on these large time-scales. To this end, we use protein designability, the number of sequences that can adopt a given protein structure, as an estimate of the structure's ability to evolve neutrally. Based on two complementary measures of functional diversity (catalytic diversity and molecular functional diversity in gene ontology), we show that more robust proteins have a greater capacity to produce functional innovations. Significant associations among structural designability, folding rate and intrinsic disorder also exist, underlining the complex relationship of the structural factors that affect protein evolution.
|
19
|
StAR: a simple tool for the statistical comparison of ROC curves. BMC Bioinformatics 2008; 9:265. [PMID: 18534022 PMCID: PMC2435548 DOI: 10.1186/1471-2105-9-265] [Citation(s) in RCA: 140] [Impact Index Per Article: 8.8] [Received: 10/01/2007] [Accepted: 06/05/2008] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND: As in many different areas of science and technology, most important problems in bioinformatics rely on the proper development and assessment of binary classifiers. A generalized assessment of the performance of binary classifiers is typically carried out through the analysis of their receiver operating characteristic (ROC) curves. The area under the ROC curve (AUC) constitutes a popular indicator of the performance of a binary classifier. However, the assessment of the statistical significance of the difference between any two classifiers based on this measure is not a straightforward task, since not many freely available tools exist. Most existing software is either not free, difficult to use or not easy to automate when a comparative assessment of the performance of many binary classifiers is intended. This constitutes the typical scenario for the optimization of parameters when developing new classifiers and also for their performance validation through the comparison to previous art. RESULTS: In this work we describe and release new software to assess the statistical significance of the observed difference between the AUCs of any two classifiers for a common task estimated from paired data or unpaired balanced data. The software is able to perform a pairwise comparison of many classifiers in a single run, without requiring any expert or advanced knowledge to use it. The software relies on a non-parametric test for the difference of the AUCs that accounts for the correlation of the ROC curves. The results are displayed graphically and can be easily customized by the user. A human-readable report is generated and the complete data resulting from the analysis are also available for download, which can be used for further analysis with other software. The software is released as a web server that can be used in any client platform and also as a standalone application for the Linux operating system.
CONCLUSION: New software for the statistical comparison of ROC curves is released here as a web server and also as standalone software for the Linux operating system.
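To make the idea of a non-parametric test on correlated AUCs concrete, here is a minimal sketch of one well-known test of this kind, the DeLong test for two classifiers scored on the same (paired) samples. That the tool uses exactly this formulation is an assumption, as are the function names `_placements` and `delong_paired_test`; the sketch needs at least two positive and two negative samples.

```python
import math
import numpy as np


def _placements(pos, neg):
    """Return the AUC plus its structural components.

    psi[i, j] is 1 if positive i outscores negative j, 0.5 on a tie, else 0.
    Row means (v10) are per-positive components, column means (v01) per-negative.
    """
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0).astype(float) + 0.5 * (diff == 0)
    return psi.mean(), psi.mean(axis=1), psi.mean(axis=0)


def delong_paired_test(labels, scores_a, scores_b):
    """Two-sided DeLong test for the AUC difference of two paired classifiers.

    Returns (auc_a, auc_b, z, p). z and p are NaN when the estimated
    variance of the difference is zero (for example, identical scores).
    """
    labels = np.asarray(labels).astype(bool)
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    m, n = labels.sum(), (~labels).sum()

    auc_a, v10_a, v01_a = _placements(scores_a[labels], scores_a[~labels])
    auc_b, v10_b, v01_b = _placements(scores_b[labels], scores_b[~labels])

    # 2x2 covariance matrices; the off-diagonal terms capture the
    # correlation between the two ROC curves.
    s10 = np.cov(np.vstack([v10_a, v10_b]))
    s01 = np.cov(np.vstack([v01_a, v01_b]))
    cov = s10 / m + s01 / n

    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    if var_diff <= 0:
        return auc_a, auc_b, float("nan"), float("nan")

    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return auc_a, auc_b, z, p
```

Comparing many classifiers in a single run, as the abstract describes, would amount to calling such a function on every pair of score vectors; any multiple-testing correction is left out of this sketch.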
|
20
|
A knowledge-based potential with an accurate description of local interactions improves discrimination between native and near-native protein conformations. Cell Biochem Biophys 2007; 49:111-24. [PMID: 17906366 DOI: 10.1007/s12013-007-0050-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 03/07/2007] [Accepted: 07/16/2007] [Indexed: 10/22/2022]
Abstract
The correct discrimination between native and near-native protein conformations is essential for achieving accurate computer-based protein structure prediction. However, this has proven to be a difficult task, since currently available physical energy functions, empirical potentials and statistical scoring functions are still limited in achieving this goal consistently. In this work, we assess and compare the ability of different full atom knowledge-based potentials to discriminate between native protein structures and near-native protein conformations generated by comparative modeling. Using a benchmark of 152 near-native protein models and their corresponding native structures that encompass several different folds, we demonstrate that the incorporation of close non-bonded pairwise atom terms improves the discriminating power of the empirical potentials. Since the direct and unbiased derivation of close non-bonded terms from current experimental data is not possible, we obtained and used those terms from the corresponding pseudo-energy functions of a non-local knowledge-based potential. It is shown that this methodology significantly improves the discrimination between native and near-native protein conformations, suggesting that a proper description of close non-bonded terms is important to achieve a more complete and accurate description of native protein conformations. Some external knowledge-based energy functions that are widely used in model assessment performed poorly, indicating that the benchmark of models and the specific discrimination task tested in this work constitute a difficult challenge.
|
21
|
Nonbonded terms extrapolated from nonlocal knowledge-based energy functions improve error detection in near-native protein structure models. Protein Sci 2007; 16:1410-21. [PMID: 17586774 PMCID: PMC2206707 DOI: 10.1110/ps.062735907] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 10/23/2022]
Abstract
The accurate assessment of structural errors plays a key role in protein structure prediction, constitutes the first step of protein structure refinement, and has a major impact on subsequent functional inference from structural data. In this study, we assess and compare the ability of different full atom knowledge-based potentials to detect small and localized errors in comparative protein structure models of known accuracy. We have evaluated the effect of incorporating close nonbonded pairwise atom terms on the task of classifying residue modeling accuracy. Since the direct and unbiased derivation of close nonbonded terms from current experimental data is not possible, we extrapolated those terms from the corresponding pseudo-energy functions of a nonlocal knowledge-based potential. It is shown that this methodology clearly improves the detection of errors in protein models, suggesting that a proper description of close nonbonded terms is important to achieve a more complete and accurate description of native protein conformations. The use of close nonbonded terms directly derived from experimental data exhibited a poor performance, demonstrating that these terms cannot be accurately obtained by using the current data and methodology. Some external knowledge-based energy functions that are widely used in model assessment also performed poorly, which suggests that the benchmark of models and the specific error detection task tested in this study constituted a difficult challenge. The methodology presented here could be useful to detect localized structural errors not only in high-quality protein models, but also in experimental protein structures.
|
22
|
Stoichiometry and conditional stability constants of Cu(II) or Zn(II) clioquinol complexes; implications for Alzheimer's and Huntington's disease therapy. Neurotoxicology 2007; 28:445-9. [PMID: 17382398 DOI: 10.1016/j.neuro.2007.02.004] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.6] [Received: 11/11/2006] [Revised: 02/04/2007] [Accepted: 02/05/2007] [Indexed: 11/21/2022]
Abstract
Successful trials with 5-chloro-7-iodo-8-hydroxyquinoline (clioquinol, CQ) for Alzheimer's disease treatment prompted renewed interest in assessing whether its therapeutic action is related to the coordination of neurotoxic trace metals, such as Cu(II) and Zn(II). We now report conditional stability constants (K_C′) for CQ Cu(II) and Zn(II) complexes measured in a biological buffer containing Ca(II) and Mg(II) ions. UV-vis spectroscopy and polarography evidenced a 1:2 stoichiometry of Cu(II) and Zn(II) CQ complexes; the calculated K_C′ values were 1.2 × 10¹⁰ M⁻² for Cu(CQ)₂ and 7.0 × 10⁸ M⁻² for Zn(CQ)₂; the CQ affinity for Cu(II) is at least an order of magnitude higher than for Zn(II). To test the possible functional relevance of the Cu(II) CQ complexes in the brain, we bioassayed free Cu(II) concentration by the metal-induced inhibition of ATP-gated currents of the P2X₄ receptor, a predominant brain P2X receptor. CQ reduced the Cu(II) inhibition of the ATP-gated currents in a concentration-dependent manner. Given that the stability constant of CQ for Zn(II) is similar to that of Aβ-amyloid for Zn(II), and that CQ may form complexes with Cu(II) even in the presence of competing ions, the present results highlight that the formation of Cu(II) CQ complexes in the brain may act by diminishing free Cu(II) concentrations, thereby modifying brain excitability, or by favoring the degradation of beta-amyloid plaques or huntingtin, rather than through a specific effect of CQ itself.
|