1
Jia K, Kilinc M, Jernigan RL. New alignment method for remote protein sequences by the direct use of pairwise sequence correlations and substitutions. Frontiers in Bioinformatics 2023; 3:1227193. [PMID: 37900964 PMCID: PMC10602800 DOI: 10.3389/fbinf.2023.1227193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 08/14/2023] [Indexed: 10/31/2023] [Open access]
Abstract
Understanding protein sequences and how they relate to protein function is extremely important. Sequence alignment is one of the most basic operations in bioinformatics, and usually the first thing learned from an alignment is which positions are the most conserved; these are often critical parts of the structure, such as enzyme active site residues. In addition, the contact pairs in a protein usually correspond closely to the correlations between residue positions in the multiple sequence alignment, and these positions usually change in a systematic and coordinated way: if one position changes, then the other member of the pair also changes to compensate. In the present work, these correlated pairs are taken as anchor points for a new type of sequence alignment. The main advantage of the method is that it combines the remote homolog detection of our method PROST with the pairwise sequence substitutions of the rigorous method from Kleinjung et al. We show a few examples of the resulting sequence alignments and how they can lead to improved alignments for function, even for a disordered protein.
Affiliation(s)
- Kejue Jia
  - Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
- Mesih Kilinc
  - Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
- Robert L. Jernigan
  - Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
2
Bayly-Jones C, Whisstock JC. Mining folded proteomes in the era of accurate structure prediction. PLoS Comput Biol 2022; 18:e1009930. [PMID: 35333855 PMCID: PMC8986115 DOI: 10.1371/journal.pcbi.1009930] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/12/2021] [Revised: 04/06/2022] [Accepted: 02/16/2022] [Indexed: 01/02/2023] [Open access]
Abstract
Protein structure fundamentally underpins the function and processes of numerous biological systems. Fold recognition algorithms offer a sensitive and robust tool to detect structural, and thereby functional, similarities between distantly related homologs. In the era of accurate structure prediction owing to advances in machine learning techniques and a wealth of experimentally determined structures, previously curated sequence databases have become a rich source of biological information. Here, we use bioinformatic fold recognition algorithms to scan the entire AlphaFold structure database to identify novel protein family members, infer function and group predicted protein structures. As an example of the utility of this approach, we identify novel, previously unknown members of various pore-forming protein families, including MACPFs, GSDMs and aerolysin-like proteins.
Affiliation(s)
- Charles Bayly-Jones
  - Department of Biochemistry and Molecular Biology, Monash University, Clayton, Australia
  - Biomedicine Discovery Institute, Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton, Australia
- James C. Whisstock
  - Department of Biochemistry and Molecular Biology, Monash University, Clayton, Australia
  - Biomedicine Discovery Institute, Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton, Australia
3
Primetis E, Chavlis S, Pavlidis P. Evolutionary models of amino acid substitutions based on the tertiary structure of their neighborhoods. Proteins 2021; 89:1565-1576. [PMID: 34278605 DOI: 10.1002/prot.26178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2020] [Revised: 05/27/2021] [Accepted: 07/11/2021] [Indexed: 11/10/2022]
Abstract
Intra-protein residue vicinities depend on the amino acids involved. Energetically favorable vicinities (or interactions) have been preserved during evolution, while unfavorable vicinities have been eliminated. We describe, statistically, the interactions between amino acids using resolved protein structures. Based on the frequency of amino acid interactions, we have devised an amino acid substitution model that implements the following idea: amino acids that have similar neighbors in the protein tertiary structure can replace each other, while substitution is more difficult between amino acids that prefer different spatial neighbors. Using known tertiary structures for α-helical membrane (HM) proteins, we build evolutionary substitution matrices. We constructed maximum likelihood phylogenies using our amino acid substitution matrices and compared them to widely used methods. Our results suggest that amino acid substitutions are associated with the spatial neighborhoods of amino acid residues, thereby providing insights into the amino acid substitution process.
Affiliation(s)
- Elias Primetis
  - Department of Biology, University of Crete, Heraklion, Greece
  - Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Hellas, Heraklion, Greece
- Spyridon Chavlis
  - Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Hellas, Heraklion, Greece
- Pavlos Pavlidis
  - Institute of Computer Science, Foundation for Research and Technology, Hellas, Heraklion, Greece
4
Jia K, Jernigan RL. New amino acid substitution matrix brings sequence alignments into agreement with structure matches. Proteins 2021; 89:671-682. [PMID: 33469973 PMCID: PMC8641535 DOI: 10.1002/prot.26050] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/02/2020] [Revised: 01/08/2021] [Accepted: 01/12/2021] [Indexed: 12/27/2022]
Abstract
Protein sequence matching presently fails to identify many structures that are highly similar, even when they are known to have the same function. The high packing densities in globular proteins lead to interdependent substitutions, which have not previously been considered for amino acid similarities. At present, sequence matching compares sequences based only upon the similarities of single amino acids, ignoring the fact that in densely packed proteins there are additional conservative substitutions representing exchanges between two interacting amino acids, such as a small-large pair changing to a large-small pair, substitutions that are not individually so conservative. Here we show that including information for such pairs of substitutions yields improved sequence matches, and that these yield significant gains in the agreement between sequence alignments and structure matches of the same protein pair. The result shows sequence segments matched where structure segments are aligned. There are gains for all 2002 collected cases where the sequence alignments were not previously congruent with the structure matches. Our results also demonstrate a significant gain in detecting homology for “twilight zone” protein sequences. The amino acid substitution metrics derived have many other potential applications, for annotations, protein design, mutagenesis design, and empirical potential derivation.
Affiliation(s)
- Kejue Jia
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa, USA
- Robert L Jernigan
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa, USA
5
Yang L, Wei P, Zhong C, Meng Z, Wang P, Tang YY. A Fractal Dimension and Empirical Mode Decomposition-Based Method for Protein Sequence Analysis. Int J Pattern Recogn 2019. [DOI: 10.1142/s0218001419400202] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
In bioinformatics, the biological functions of proteins and their interactions can often be analyzed through the similarity of their sequences. In this paper, the authors combine the fractal dimension, empirical mode decomposition (EMD), and a sliding window for protein sequence comparison. First, the protein sequence is characterized and digitized into a signal, and then the signal characteristics are obtained by using EMD and the fractal dimension. Each protein sequence can be decomposed into Intrinsic Mode Functions (IMFs). A fixed-window fractal dimension is applied to each IMF and to the original signal to extract the protein sequence characteristics. Experiments have shown that the features extracted by this hybrid method are superior to those of the EMD method alone.
Affiliation(s)
- Lina Yang
  - School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi, P. R. China
- Pu Wei
  - School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi, P. R. China
- Cheng Zhong
  - School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi, P. R. China
- Zuqiang Meng
  - School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi, P. R. China
- Patrick Wang
  - Computer and Information Science, Northeastern University, Boston, USA
- Yuan Yan Tang
  - Beijing Advanced Innovation Center for Big Data and Brain Computing (BDBC), Beihang University, Beijing, P. R. China
  - Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, P. R. China