1
Kikuchi AKV, Tayo LL. Principal Component and Structural Element Analysis Provide Insights into the Evolutionary Divergence of Conotoxins. Biology 2022; 12:20. [PMID: 36671713 PMCID: PMC9855797 DOI: 10.3390/biology12010020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 12/08/2022] [Accepted: 12/08/2022] [Indexed: 12/24/2022]
Abstract
Predatory cone snails (Conus) developed a sophisticated neuropharmacological mechanism to capture prey, escape against other predators, and deter competitors. Their venom's remarkable specificity for various ion channels and receptors is an evolutionary feat attributable to the venom's variety of peptide components (conotoxins). However, what caused conotoxin divergence remains unclear and may be related to the role of prey shift. Principal component analysis revealed clustering events within diet subgroups indicating peptide sequence similarity patterns based on the prey they subdue. Molecular analyses using multiple sequence alignment and structural element analysis were conducted to observe the events at the molecular level that caused the subgrouping. Three distinct subgroups were identified. Results showed homologous regions and conserved residues within diet subgroups but divergent between other groups. We specified that these structural elements caused subgrouping in alpha conotoxins that may play a role in function specificity. In each diet subgroup, amino acid character, length of intervening amino acids between cysteine residues, and polypeptide length influenced subgrouping. This study provides molecular insights into the role of prey shift, specifically diet preference, in conotoxin divergence.
Affiliation(s)
- Akira Kio V. Kikuchi
- School of Chemical, Biological, and Materials Engineering and Sciences, Mapúa University, Manila 1002, Philippines
- Lemmuel L. Tayo
- School of Chemical, Biological, and Materials Engineering and Sciences, Mapúa University, Manila 1002, Philippines
- School of Health Sciences, Mapúa University, Makati City 1200, Philippines
2
Analysis of slump and surge phenomenon in Chinese stock market based on sequence alignment method. Soft Comput 2020. [DOI: 10.1007/s00500-020-05076-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
3
Laforet M, McMurrough TA, Vu M, Brown CM, Zhang K, Junop MS, Gloor GB, Edgell DR. Modifying a covarying protein-DNA interaction changes substrate preference of a site-specific endonuclease. Nucleic Acids Res 2019; 47:10830-10841. [PMID: 31602462 PMCID: PMC6847045 DOI: 10.1093/nar/gkz866] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/21/2019] [Revised: 09/17/2019] [Accepted: 10/09/2019] [Indexed: 12/23/2022] Open
Abstract
Identifying and validating intermolecular covariation between proteins and their DNA-binding sites can provide insights into mechanisms that regulate selectivity and starting points for engineering new specificity. LAGLIDADG homing endonucleases (meganucleases) can be engineered to bind non-native target sites for gene-editing applications, but not all redesigns successfully reprogram specificity. To gain a global overview of residues that influence meganuclease specificity, we used information theory to identify protein-DNA covariation. Directed evolution experiments of one predicted pair, 227/+3, revealed variants with surprising shifts in I-OnuI substrate preference at the central 4 bases where cleavage occurs. Structural studies showed significant remodeling distant from the covarying position, including restructuring of an inter-hairpin loop, DNA distortions near the scissile phosphates, and new base-specific contacts. Our findings are consistent with a model whereby the functional impacts of covariation can be indirectly propagated to neighboring residues outside of direct contact range, allowing meganucleases to adapt to target site variation and indirectly expand the sequence space accessible for cleavage. We suggest that some engineered meganucleases may have unexpected cleavage profiles that were not rationally incorporated during the design process.
Affiliation(s)
- Marc Laforet
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Thomas A McMurrough
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Michael Vu
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Christopher M Brown
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Kun Zhang
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Murray S Junop
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Gregory B Gloor
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- David R Edgell
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
4
Gil N, Fiser A. Identifying functionally informative evolutionary sequence profiles. Bioinformatics 2018; 34:1278-1286. [PMID: 29211823 PMCID: PMC5905606 DOI: 10.1093/bioinformatics/btx779] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/28/2017] [Accepted: 11/29/2017] [Indexed: 01/06/2023] Open
Abstract
Motivation Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. Results We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. Contact andras.fiser@einstein.yu.edu. Supplementary information Supplementary data are available at Bioinformatics online.
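The selection idea in this abstract (prefer the candidate MSA whose columns share the most mutual information) can be illustrated with a minimal Python sketch. This is not the SAMMI implementation: the scoring by plain mean pairwise column MI, the function names `column_mi`, `mean_pairwise_mi`, and `select_msa`, and the toy alignments are all assumptions made for illustration, and the published method may normalize or weight the score differently.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column_mi(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
        for (a, b), c in pab.items()
    )


def mean_pairwise_mi(msa):
    """Average MI over all column pairs; msa is a list of equal-length strings."""
    columns = list(zip(*msa))
    pairs = list(combinations(columns, 2))
    return sum(column_mi(a, b) for a, b in pairs) / len(pairs)


def select_msa(candidates):
    """Return the candidate MSA with the highest mean pairwise column MI."""
    return max(candidates, key=mean_pairwise_mi)


# Hypothetical toy candidates, two columns each.
covarying = ["AC", "AC", "GT", "GT"]    # the two columns change together
independent = ["AC", "AT", "GC", "GT"]  # the two columns change independently
best = select_msa([independent, covarying])
```

In this toy case the covarying alignment is selected because its two columns are perfectly coupled, while the independent alignment scores zero.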
Affiliation(s)
- Nelson Gil
- Department of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Andras Fiser
- Department of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, NY 10461, USA
5
Lakhani B, Thayer KM, Hingorani MM, Beveridge DL. Evolutionary Covariance Combined with Molecular Dynamics Predicts a Framework for Allostery in the MutS DNA Mismatch Repair Protein. J Phys Chem B 2017; 121:2049-2061. [PMID: 28135092 PMCID: PMC5346969 DOI: 10.1021/acs.jpcb.6b11976] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 12/29/2022]
Abstract
Mismatch repair (MMR) is an essential, evolutionarily conserved pathway that maintains genome stability by correcting base-pairing errors in DNA. Here we examine the sequence and structure of MutS MMR protein to decipher the amino acid framework underlying its two key activities: recognizing mismatches in DNA and using ATP to initiate repair. Statistical coupling analysis (SCA) identified a network (sector) of coevolved amino acids in the MutS protein family. The potential functional significance of this SCA sector was assessed by performing molecular dynamics (MD) simulations for alanine mutants of the top 5% of 160 residues in the distribution, and control nonsector residues. The effects on three independent metrics were monitored: (i) MutS domain conformational dynamics, (ii) hydrogen bonding between MutS and DNA/ATP, and (iii) relative ATP binding free energy. Each measure revealed that sector residues contribute more substantively to MutS structure–function than nonsector residues. Notably, sector mutations disrupted MutS contacts with DNA and/or ATP from a distance via contiguous pathways and correlated motions, supporting the idea that SCA can identify amino acid networks underlying allosteric communication. The combined SCA/MD approach yielded novel, experimentally testable hypotheses for unknown roles of many residues distributed across MutS, including some implicated in Lynch cancer syndrome.
Affiliation(s)
- Bharat Lakhani
- Molecular Biology and Biochemistry Department, Molecular Biophysics Program, Chemistry Department, and Computer Science Department, Wesleyan University, Middletown, Connecticut 06459, United States
- Kelly M Thayer
- Molecular Biology and Biochemistry Department, Molecular Biophysics Program, Chemistry Department, and Computer Science Department, Wesleyan University, Middletown, Connecticut 06459, United States
- Manju M Hingorani
- Molecular Biology and Biochemistry Department, Molecular Biophysics Program, Chemistry Department, and Computer Science Department, Wesleyan University, Middletown, Connecticut 06459, United States
- David L Beveridge
- Molecular Biology and Biochemistry Department, Molecular Biophysics Program, Chemistry Department, and Computer Science Department, Wesleyan University, Middletown, Connecticut 06459, United States
6
Gao H, Yu X, Dou Y, Wang J. New Measurement for Correlation of Co-evolution Relationship of Subsequences in Protein. Interdiscip Sci 2015; 7:364-72. [PMID: 26396121 DOI: 10.1007/s12539-015-0024-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2014] [Revised: 04/08/2014] [Accepted: 04/16/2014] [Indexed: 11/26/2022]
Abstract
Many computational tools have been developed to measure the co-evolution of protein residues. Most of them focus only on co-evolution between pairs of residues in a protein sequence. However, more than two residues may participate in co-evolution, and some co-evolved residues are clustered in several distinct regions of the primary structure. Therefore, the co-evolution among adjacent residues and the correlation between the distinct regions offer insights into the function and evolution of the protein and its residues. A subsequence is used to represent the adjacent multiple residues in one distinct region. In this paper, the co-evolution relationship in each subsequence is represented by a mutual information matrix (MIM). Pearson's correlation coefficient (R value) is then used to measure the similarity of two MIMs. MSAs from the Catalytic Site Atlas (CSA) are used for testing. The R value characterizes a specific class of residues. In contrast to individual pairwise co-evolved residues, adjacent residues without high individual MI values are found since the co-evolved relationship among them is similar to that among another set of adjacent residues. These subsequences possess some flexibility in the composition of side chains, such as the catalyzed environment.
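The two steps named in this abstract, a mutual information matrix per subsequence and a Pearson correlation between two such matrices, can be sketched as follows. This is a minimal illustration and not the authors' code: flattening the full matrix (diagonal included) before correlating, the function names `mi_matrix` and `pearson_r`, and the four-sequence toy alignment are assumptions.

```python
from collections import Counter
from math import log2, sqrt


def column_mi(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
        for (a, b), c in pab.items()
    )


def mi_matrix(msa, start, length):
    """MI matrix (MIM) over the columns start .. start+length-1 of an MSA."""
    cols = list(zip(*msa))[start:start + length]
    return [[column_mi(a, b) for b in cols] for a in cols]


def pearson_r(m1, m2):
    """Pearson correlation between the flattened entries of two equal-sized matrices."""
    x = [v for row in m1 for v in row]
    y = [v for row in m2 for v in row]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy MSA: columns 0-1 and columns 2-3 follow the same
# covariation pattern while using different residues.
msa = ["ACKD", "ATKE", "GCRD", "GTRE"]
r = pearson_r(mi_matrix(msa, 0, 2), mi_matrix(msa, 2, 2))
```

Here the two subsequences share no residues, yet their MIMs are identical, so the R value is 1. This mirrors the abstract's point that regions can be related by their co-evolution pattern rather than by high individual MI values.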
Affiliation(s)
- Hongyun Gao
- School of Mathematical Sciences, Dalian University of Technology, Dalian, 116024, China
- Information and Engineering College, Dalian University, Dalian, 116622, China
- Xiaoqing Yu
- College of Sciences, Shanghai Institute of Technology, Shanghai, 201418, China
- Yongchao Dou
- Center for Plant Science and Innovation, School of Biological Sciences, University of Nebraska, Lincoln, NE, 68588, USA
- Jun Wang
- Department of Mathematics, Shanghai Normal University, Shanghai, 200234, China
7
Herman JL, Novák Á, Lyngsø R, Szabó A, Miklós I, Hein J. Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs. BMC Bioinformatics 2015; 16:108. [PMID: 25888064 PMCID: PMC4395974 DOI: 10.1186/s12859-015-0516-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 03/24/2014] [Accepted: 02/24/2015] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND A standard procedure in many areas of bioinformatics is to use a single multiple sequence alignment (MSA) as the basis for various types of analysis. However, downstream results may be highly sensitive to the alignment used, and neglecting the uncertainty in the alignment can lead to significant bias in the resulting inference. In recent years, a number of approaches have been developed for probabilistic sampling of alignments, rather than simply generating a single optimum. However, this type of probabilistic information is currently not widely used in the context of downstream inference, since most existing algorithms are set up to make use of a single alignment. RESULTS In this work we present a framework for representing a set of sampled alignments as a directed acyclic graph (DAG) whose nodes are alignment columns; each path through this DAG then represents a valid alignment. Since the probabilities of individual columns can be estimated from empirical frequencies, this approach enables sample-based estimation of posterior alignment probabilities. Moreover, due to conditional independencies between columns, the graph structure encodes a much larger set of alignments than the original set of sampled MSAs, such that the effective sample size is greatly increased. CONCLUSIONS The alignment DAG provides a natural way to represent a distribution in the space of MSAs, and allows for existing algorithms to be efficiently scaled up to operate on large sets of alignments. As an example, we show how this can be used to compute marginal probabilities for tree topologies, averaging over a very large number of MSAs. This framework can also be used to generate a statistically meaningful summary alignment; example applications show that this summary alignment is consistently more accurate than the majority of the alignment samples, leading to improvements in downstream tree inference. 
Implementations of the methods described in this article are available at http://statalign.github.io/WeaveAlign.
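The alignment DAG described above can be pictured with a small Python sketch. It is not the WeaveAlign implementation: the column encoding (a tuple of per-sequence residue indices, with `None` for a gap), the `START`/`END` sentinel nodes, the function names, and the two sampled alignments are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from functools import lru_cache

START, END = "START", "END"


def build_dag(samples):
    """Link consecutive columns of each sampled alignment and record the
    empirical frequency of every column across the samples."""
    successors = defaultdict(set)
    counts = Counter()
    for alignment in samples:
        path = [START, *alignment, END]
        for u, v in zip(path, path[1:]):
            successors[u].add(v)
        counts.update(alignment)
    freqs = {col: c / len(samples) for col, c in counts.items()}
    return successors, freqs


def count_paths(successors):
    """Count START-to-END paths, i.e. the alignments the DAG encodes."""
    @lru_cache(maxsize=None)
    def paths_from(node):
        if node == END:
            return 1
        return sum(paths_from(nxt) for nxt in successors[node])
    return paths_from(START)


# Two hypothetical sampled alignments of a 5-residue and a 3-residue sequence.
# Each column is (index in sequence 1, index in sequence 2); None marks a gap.
sample_1 = [(0, 0), (1, None), (2, 1), (3, 2), (4, None)]
sample_2 = [(0, None), (1, 0), (2, 1), (3, None), (4, 2)]

successors, freqs = build_dag([sample_1, sample_2])
n_alignments = count_paths(successors)
```

Because both samples pass through the shared column `(2, 1)`, the graph encodes four alignments from only two samples, which illustrates how the effective sample size can grow.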
Affiliation(s)
- Joseph L Herman
- Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, UK.
- Division of Mathematical Biology, National Institute of Medical Research, The Ridgeway, London, NW7 1AA, UK.
- Ádám Novák
- Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, UK.
- Rune Lyngsø
- Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, UK.
- Adrienn Szabó
- Institute of Computer Science and Control, Hungarian Academy of Sciences, Lagymanyosi u. 11., Budapest, 1111, Hungary.
- István Miklós
- Institute of Computer Science and Control, Hungarian Academy of Sciences, Lagymanyosi u. 11., Budapest, 1111, Hungary.
- Department of Stochastics, Rényi Institute, Reáltanoda u. 13-15, Budapest, 1053, Hungary.
- Jotun Hein
- Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, UK.
8
Gao H, Yu X, Dou Y, Wang J. New measurement for correlation of co-evolution relationship of subsequences in protein. Interdiscip Sci 2015. [PMID: 25663109 DOI: 10.1007/s12539-014-0221-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2014] [Revised: 04/08/2014] [Accepted: 04/16/2014] [Indexed: 11/24/2022]
Abstract
Many computational tools have been developed to measure the co-evolution of protein residues. Most of them focus only on co-evolution between pairs of residues in a protein sequence. However, more than two residues may participate in co-evolution, and some co-evolved residues are clustered in several distinct regions of the primary structure. Therefore, the co-evolution among adjacent residues and the correlation between the distinct regions offer insights into the function and evolution of the protein and its residues. A subsequence is used to represent the adjacent multiple residues in one distinct region. In this paper, the co-evolution relationship in each subsequence is represented by a mutual information matrix (MIM). Pearson's correlation coefficient (R value) is then used to measure the similarity of two MIMs. MSAs from the Catalytic Site Atlas (CSA) are used for testing. The R value characterizes a specific class of residues. In contrast to individual pairwise co-evolved residues, adjacent residues without high individual MI values are found since the co-evolved relationship among them is similar to that among another set of adjacent residues. These subsequences possess some flexibility in the composition of side chains, such as the catalyzed environment.
Affiliation(s)
- Hongyun Gao
- School of Mathematical Sciences, Dalian University of Technology, Dalian, 116024, China
9
Control of catalytic efficiency by a coevolving network of catalytic and noncatalytic residues. Proc Natl Acad Sci U S A 2014; 111:E2376-83. [PMID: 24912189 DOI: 10.1073/pnas.1322352111] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 12/14/2022] Open
Abstract
The active sites of enzymes consist of residues necessary for catalysis and structurally important noncatalytic residues that together maintain the architecture and function of the active site. Examples of evolutionary interactions between catalytic and noncatalytic residues have been difficult to define and experimentally validate due to a general intolerance of these residues to substitution. Here, using computational methods to predict coevolving residues, we identify a network of positions consisting of two catalytic metal-binding residues and two adjacent noncatalytic residues in LAGLIDADG homing endonucleases (LHEs). Distinct combinations of the four residues in the network map to distinct LHE subfamilies, with a striking distribution of the metal-binding Asp (D) and Glu (E) residues. Mutation of these four positions in three LHEs--I-LtrI, I-OnuI, and I-HjeMI--indicate that the combinations of residues tolerated are specific to each enzyme. Kinetic analyses under single-turnover conditions revealed that I-LtrI activity could be modulated over an ∼100-fold range by mutation of residues in the coevolving network. I-LtrI catalytic site variants with low activity could be rescued by compensatory mutations at adjacent noncatalytic sites that restore an optimal coevolving network and vice versa. Our results demonstrate that LHE activity is constrained by an evolutionary barrier of residues with strong context-dependent effects. Creation of optimal coevolving active-site networks is therefore an important consideration in engineering of LHEs and other enzymes.
10
Clark GW, Ackerman SH, Tillier ER, Gatti DL. Multidimensional mutual information methods for the analysis of covariation in multiple sequence alignments. BMC Bioinformatics 2014; 15:157. [PMID: 24886131 PMCID: PMC4046016 DOI: 10.1186/1471-2105-15-157] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 12/02/2013] [Accepted: 05/06/2014] [Indexed: 11/10/2022] Open
Abstract
Background Several methods are available for the detection of covarying positions from a multiple sequence alignment (MSA). If the MSA contains a large number of sequences, information about the proximities between residues derived from covariation maps can be sufficient to predict a protein fold. However, in many cases the structure is already known, and information on the covarying positions can be valuable to understand the protein mechanism and dynamic properties. Results In this study we have sought to determine whether a multivariate (multidimensional) extension of traditional mutual information (MI) can be an additional tool to study covariation. The performance of two multidimensional MI (mdMI) methods, designed to remove the effect of ternary/quaternary interdependencies, was tested with a set of 9 MSAs each containing <400 sequences, and was shown to be comparable to that of the newest methods based on maximum entropy/pseudolikelihood statistical models of protein sequences. However, while all the methods tested detected a similar number of covarying pairs among the residues separated by <8 Å in the reference X-ray structures, there was on average less than 65% overlap between the top scoring pairs detected by methods that are based on different principles. Conclusions Given the large variety of structure and evolutionary history of different proteins it is possible that a single best method to detect covariation in all proteins does not exist, and that for each protein family the best information can be derived by merging/comparing results obtained with different methods. This approach may be particularly valuable in those cases in which the size of the MSA is small or the quality of the alignment is low, leading to significant differences in the pairs detected by different methods.
Affiliation(s)
- Elisabeth R Tillier
- Department of Medical Biophysics, University of Toronto, Campbell Family Institute for Cancer Research, Ontario Cancer Institute, University Health Network, Toronto, Ontario, Canada.
11
Abstract
Positions in a protein are thought to coevolve to maintain important structural and functional interactions over evolutionary time. The detection of putative coevolving positions can provide important new insights into a protein family in the same way that knowledge is gained by recognizing evolutionarily conserved characters and characteristics. Putatively coevolving positions can be detected with statistical methods that identify covarying positions. However, positions in protein alignments can covary for many other reasons than coevolution; thus, it is crucial to create high-quality multiple sequence alignments for coevolution inference. Furthermore, it is important to understand common signs and sources of error. When confounding factors are accounted for, coevolution is a rich resource for protein engineering information.