For: Yamada K, Tomii K. Revisiting amino acid substitution matrices for identifying distantly related proteins. Bioinformatics 2013;30:317-25. [PMID: 24281694 PMCID: PMC3904525 DOI: 10.1093/bioinformatics/btt694] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Indexed: 12/04/2022]

Cited by Other Article(s)

Mbogo I, Kawano C, Nakamura R, Tsuchiya Y, Villar-Briones A, Hirao Y, Yasuoka Y, Hayakawa E, Tomii K, Watanabe H. A transphyletic study of metazoan β-catenin protein complexes. Zoological Letters 2024;10:20. [PMID: 39623505 PMCID: PMC11613877 DOI: 10.1186/s40851-024-00243-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Accepted: 10/22/2024] [Indexed: 12/06/2024]

Imanbayeva A, Duisenova N, Orazov A, Sagyndykova M, Belozerov I, Tuyakova A. Study of the Floristic, Morphological, and Genetic (atpF-atpH, Internal Transcribed Spacer (ITS), matK, psbK-psbI, rbcL, and trnH-psbA) Differences in Crataegus ambigua Populations in Mangistau (Kazakhstan). Plants (Basel) 2024;13:1591. [PMID: 38931023 PMCID: PMC11207986 DOI: 10.3390/plants13121591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2024] [Revised: 06/05/2024] [Accepted: 06/06/2024] [Indexed: 06/28/2024]

Islam S, Pantazes RJ. Developing similarity matrices for antibody-protein binding interactions. PLoS One 2023;18:e0293606. [PMID: 37883504 PMCID: PMC10602319 DOI: 10.1371/journal.pone.0293606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 10/17/2023] [Indexed: 10/28/2023]

Jia K, Kilinc M, Jernigan RL. New alignment method for remote protein sequences by the direct use of pairwise sequence correlations and substitutions. Frontiers in Bioinformatics 2023;3:1227193. [PMID: 37900964 PMCID: PMC10602800 DOI: 10.3389/fbinf.2023.1227193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 08/14/2023] [Indexed: 10/31/2023]

Caswell B, Summers TJ, Licup GL, Cantu DC. Mutation Space of Spatially Conserved Amino Acid Sites in Proteins. ACS Omega 2023;8:24302-24310. [PMID: 37457482 PMCID: PMC10339398 DOI: 10.1021/acsomega.3c01473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Accepted: 06/14/2023] [Indexed: 07/18/2023]

Chang CH, Nelson WC, Jerger A, Wright AT, Egbert RG, McDermott JE. Snekmer: a scalable pipeline for protein sequence fingerprinting based on amino acid recoding. Bioinformatics Advances 2023;3:vbad005. [PMID: 36789294 PMCID: PMC9913046 DOI: 10.1093/bioadv/vbad005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Revised: 12/16/2022] [Accepted: 02/01/2023] [Indexed: 02/04/2023]

Aledo P, Aledo JC. Proteome-Wide Structural Computations Provide Insights into Empirical Amino Acid Substitution Matrices. Int J Mol Sci 2023;24:ijms24010796. [PMID: 36614247 PMCID: PMC9821064 DOI: 10.3390/ijms24010796] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/07/2022] [Revised: 12/24/2022] [Accepted: 12/29/2022] [Indexed: 01/04/2023]

Sumanaweera D, Allison L, Konagurthu AS. Bridging the gaps in statistical models of protein alignment. Bioinformatics 2022;38:i229-i237. [PMID: 35758809 PMCID: PMC9235498 DOI: 10.1093/bioinformatics/btac246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]

Paiva VA, Mendonça MV, Silveira SA, Ascher DB, Pires DEV, Izidoro SC. GASS-Metal: identifying metal-binding sites on protein structures using genetic algorithms. Brief Bioinform 2022;23:6590153. [PMID: 35595534 DOI: 10.1093/bib/bbac178] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/24/2022] [Revised: 04/18/2022] [Accepted: 04/20/2022] [Indexed: 12/12/2022]

Yamamori Y, Tomii K. Application of Homology Modeling by Enhanced Profile-Profile Alignment and Flexible-Fitting Simulation to Cryo-EM Based Structure Determination. Int J Mol Sci 2022;23:1977. [PMID: 35216093 PMCID: PMC8879198 DOI: 10.3390/ijms23041977] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/17/2021] [Revised: 02/07/2022] [Accepted: 02/09/2022] [Indexed: 12/03/2022]

Jia K, Jernigan RL. New amino acid substitution matrix brings sequence alignments into agreement with structure matches. Proteins 2021;89:671-682. [PMID: 33469973 PMCID: PMC8641535 DOI: 10.1002/prot.26050] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/02/2020] [Revised: 01/08/2021] [Accepted: 01/12/2021] [Indexed: 12/27/2022]

Saito-Nakano Y, Wahyuni R, Nakada-Tsukui K, Tomii K, Nozaki T. Rab7D small GTPase is involved in phago-, trogocytosis and cytoskeletal reorganization in the enteric protozoan Entamoeba histolytica. Cell Microbiol 2020;23:e13267. [PMID: 32975360 PMCID: PMC7757265 DOI: 10.1111/cmi.13267] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 07/16/2020] [Revised: 08/21/2020] [Accepted: 09/18/2020] [Indexed: 12/12/2022]

Abstract

Rab small GTPases regulate membrane traffic between distinct cellular compartments of all eukaryotes in a tempo‐spatially specific fashion. Rab small GTPases are also involved in the regulation of cytoskeleton and signalling. Membrane traffic and cytoskeletal regulation play a pivotal role in the pathogenesis of Entamoeba histolytica, which is a protozoan parasite responsible for human amebiasis. E. histolytica is unique in that its genome encodes over 100 Rab proteins, containing multiple isotypes of conserved members (e.g., Rab7) and Entamoeba‐specific subgroups (e.g., RabA, B, and X). Among them, E. histolytica Rab7 is the most diversified group consisting of nine isotypes. While it was previously demonstrated that EhRab7A and EhRab7B are involved in lysosome and phagosome biogenesis, the individual roles of other Rab7 members and their coordination remain elusive. In this study, we characterised the third member of Rab7, Rab7D, to better understand the significance of the multiplicity of Rab7 isotypes in E. histolytica. Overexpression of EhRab7D caused reduction in phagocytosis of erythrocytes, trogocytosis (meaning nibbling or chewing of a portion) of live mammalian cells, and phagosome acidification and maturation. Conversely, transcriptional gene silencing of the EhRab7D gene caused opposite phenotypes in phago/trogocytosis and phagosome maturation. Furthermore, EhRab7D gene silencing caused reduction in the attachment to and the motility on the collagen‐coated surface. Image analysis showed that EhRab7D was occasionally associated with lysosomes and prephagosomal vacuoles, but not with mature phagosomes and trogosomes. Finally, in silico prediction of structural organisation of EhRab7 isotypes identified unique amino acid changes on the effector binding surface of EhRab7D. Taken together, our data suggest that EhRab7D plays coordinated counteracting roles: an inhibitory role in phago/trogocytosis and lyso/phago/trogosome biogenesis, and a stimulatory role in adherence and motility, presumably via interaction with unique effectors. Finally, we propose a model in which three EhRab7 isotypes are sequentially involved in phago/trogocytosis.

Polyanovsky V, Lifanov A, Esipova N, Tumanyan V. The ranging of amino acids substitution matrices of various types in accordance with the alignment accuracy criterion. BMC Bioinformatics 2020;21:294. [PMID: 32921315 PMCID: PMC7489204 DOI: 10.1186/s12859-020-03616-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/11/2020] [Accepted: 06/18/2020] [Indexed: 11/15/2022]

Abstract

Background

The alignment of character sequences is important in bioinformatics. The quality of this procedure is determined by the substitution matrix and parameters of the insertion-deletion penalty function. These matrices are derived from sequence alignment and thus reflect the evolutionary process. Currently, in addition to evolutionary matrices, a large number of different background matrices have been obtained. To make an optimal choice of the substitution matrix and the penalty parameters, we conducted a numerical experiment using a representative sample of existing matrices of various types and origins.

Results

We tested the classical evolutionary matrix series (PAM, Blosum, VTML, Pfasum) as well as matrices based on structural alignments, a contact energy matrix, and a matrix based on the properties of the genetic code. This study presents results for two test set types: first, we simulated sequences that reflect divergent evolution; second, we performed tests on Balibase sequences. In both cases, we obtained the dependence of the alignment quality (Accuracy, Confidence) on the evolutionary distance between sequences and on the evolutionary distance to which the substitution matrices correspond. The combination of substitution matrix and penalty function parameters was optimized for both local and global alignment.

Consequently, we found that the best alignment quality is achieved with matrices corresponding to the largest evolutionary distance. These matrices prove to be universal, i.e. suitable for aligning sequences separated by both large and small evolutionary distances. We analysed the correspondence of the correlation coefficients of matrices to the alignment quality. It was found that matrices showing high quality alignment have an above average correlation value, but the converse is not true.

Conclusions

This study showed that the best alignment quality is achieved with evolutionary matrices designed for long distances: Gonnet, VTML250, PAM250, MIQS, and Pfasum050. The same property holds not only for matrices of evolutionary origin, but also for matrices of other backgrounds that correspond to a large evolutionary distance. Accordingly, matrices based on structural data show alignment quality close to that of evolutionary matrices. This agrees with the idea that the spatial structure is more conserved than the protein sequence.
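
The abstract above states that alignment quality is determined by the substitution matrix and the insertion-deletion penalty. The following is a minimal Python sketch of that dependence, not the authors' code. It computes a Needleman-Wunsch global alignment score with a linear gap penalty. The function names `global_alignment_score` and `toy_score`, the match and mismatch values, and the gap costs are illustrative assumptions and are not taken from the study.

```python
def global_alignment_score(a, b, score, gap):
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty).

    a, b  : sequences to align (strings)
    score : function (x, y) -> substitution score for residues x and y
    gap   : score added for each gap position (normally negative)
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            curr.append(max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # substitution
                prev[j] + gap,                            # gap in b
                curr[j - 1] + gap,                        # gap in a
            ))
        prev = curr
    return prev[-1]


def toy_score(x, y):
    """Hypothetical stand-in for a substitution matrix lookup."""
    return 2 if x == y else -1


# The same pair of sequences scored under two assumed gap penalties.
mild = global_alignment_score("ACD", "AD", toy_score, gap=-2)
harsh = global_alignment_score("ACD", "AD", toy_score, gap=-5)
```

With these toy values the pair scores 2 under the milder gap penalty and -1 under the harsher one, which illustrates why the matrix and the penalty parameters have to be tuned together. A real experiment would replace `toy_score` with a lookup into a matrix such as those named in the abstract.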

Asnicar F, Thomas AM, Beghini F, Mengoni C, Manara S, Manghi P, Zhu Q, Bolzan M, Cumbo F, May U, Sanders JG, Zolfo M, Kopylova E, Pasolli E, Knight R, Mirarab S, Huttenhower C, Segata N. Precise phylogenetic analysis of microbial isolates and genomes from metagenomes using PhyloPhlAn 3.0. Nat Commun 2020;11:2500. [PMID: 32427907 PMCID: PMC7237447 DOI: 10.1038/s41467-020-16366-7] [Citation(s) in RCA: 440] [Impact Index Per Article: 88.0] [Received: 08/17/2019] [Accepted: 04/27/2020] [Indexed: 01/10/2023]

Affiliation(s)

Francesco Asnicar Department CIBIO, University of Trento, Trento, Italy
Andrew Maltez Thomas Department CIBIO, University of Trento, Trento, Italy
Francesco Beghini Department CIBIO, University of Trento, Trento, Italy
Claudia Mengoni Department CIBIO, University of Trento, Trento, Italy
Serena Manara Department CIBIO, University of Trento, Trento, Italy
Paolo Manghi Department CIBIO, University of Trento, Trento, Italy
Qiyun Zhu Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
Mattia Bolzan Department CIBIO, University of Trento, Trento, Italy PreBiomics s.r.l, Trento, Italy
Fabio Cumbo Department CIBIO, University of Trento, Trento, Italy
Uyen May Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
Jon G Sanders Department of Pediatrics, University of California San Diego, La Jolla, CA, USA Cornell Institute for Host-Microbe Interaction and Disease, Cornell University, Ithaca, NY, USA
Moreno Zolfo Department CIBIO, University of Trento, Trento, Italy
Evguenia Kopylova Department of Pediatrics, University of California San Diego, La Jolla, CA, USA Clarity Genomics BVBA, Sint-Michielskaai 34, 2000, Antwerpen, Belgium
Edoardo Pasolli Department CIBIO, University of Trento, Trento, Italy Department of Agricultural Sciences, University of Naples Federico II, Portici, Italy
Rob Knight Department of Pediatrics, University of California San Diego, La Jolla, CA, USA Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
Siavash Mirarab Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
Curtis Huttenhower Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA The Broad Institute of MIT and Harvard, Cambridge, MA, USA
Nicola Segata Department CIBIO, University of Trento, Trento, Italy.

Crim1^C140S mutant mice reveal the importance of cysteine 140 in the internal region 1 of CRIM1 for its physiological functions. Mamm Genome 2019;30:329-338. [PMID: 31776724 DOI: 10.1007/s00335-019-09822-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/21/2019] [Accepted: 11/20/2019] [Indexed: 10/25/2022]

Tomii K, Santos HJ, Nozaki T. Genome-Wide Analysis of Known and Potential Tetraspanins in Entamoeba histolytica. Genes (Basel) 2019;10:genes10110885. [PMID: 31684194 PMCID: PMC6895871 DOI: 10.3390/genes10110885] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/29/2019] [Revised: 10/25/2019] [Accepted: 10/31/2019] [Indexed: 12/12/2022]

Actin Cytoskeletal Reorganization Function of JRAB/MICAL-L2 Is Fine-tuned by Intramolecular Interaction between First LIM Zinc Finger and C-terminal Coiled-coil Domains. Sci Rep 2019;9:12794. [PMID: 31488862 PMCID: PMC6728388 DOI: 10.1038/s41598-019-49232-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/30/2019] [Accepted: 08/21/2019] [Indexed: 01/01/2023]

Yamada KD, Kinoshita K. De novo profile generation based on sequence context specificity with the long short-term memory network. BMC Bioinformatics 2018;19:272. [PMID: 30021530 PMCID: PMC6052547 DOI: 10.1186/s12859-018-2284-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 05/17/2018] [Accepted: 07/11/2018] [Indexed: 11/24/2022]

Abstract

Background

Long short-term memory (LSTM) is one of the most attractive deep learning methods for learning time series or contexts of input data. A growing number of studies, including biological sequence analyses in bioinformatics, utilize this architecture. Amino acid sequence profiles are widely used for bioinformatics studies, such as sequence similarity searches, multiple alignments, and evolutionary analyses. Currently, many biological sequences are becoming available, and the rapidly increasing amount of sequence data emphasizes the importance of scalable generators of amino acid sequence profiles.

Results

We employed the LSTM network and developed a novel profile generator to construct profiles without any assumptions, except for input sequence context. Our method could generate better profiles than existing de novo profile generators, including CSBuild and RPS-BLAST, on the basis of profile-sequence similarity search performance with linear calculation costs against input sequence size. In addition, we analyzed the effects of the memory power of LSTM and found that LSTM had high potential power to detect long-range interactions between amino acids, as in the case of beta-strand formation, which has been a difficult problem in protein bioinformatics using sequence information.

Conclusion

We demonstrated the importance of sequence context and the feasibility of LSTM on biological sequence analyses. Our results demonstrated the effectiveness of memories in LSTM and showed that our de novo profile generator, SPBuild, achieved higher performance than that of existing methods for profile prediction of beta-strands, where long-range interactions of amino acids are important and are known to be difficult for the existing window-based prediction methods. Our findings will be useful for the development of other prediction methods related to biological sequences by machine learning methods.

Electronic supplementary material

The online version of this article (10.1186/s12859-018-2284-1) contains supplementary material, which is available to authorized users.

Yamada KD. Derivative-free neural network for optimizing the scoring functions associated with dynamic programming of pairwise-profile alignment. Algorithms Mol Biol 2018;13:5. [PMID: 29467815 PMCID: PMC5815186 DOI: 10.1186/s13015-018-0123-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/11/2017] [Accepted: 02/06/2018] [Indexed: 11/10/2022]

Abstract

BACKGROUND

A profile-comparison method with position-specific scoring matrix (PSSM) is among the most accurate alignment methods. Currently, cosine similarity and correlation coefficients are used as scoring functions of dynamic programming to calculate similarity between PSSMs. However, it is unclear whether these functions are optimal for profile alignment methods. By definition, these functions cannot capture nonlinear relationships between profiles. Therefore, we attempted to discover a novel scoring function, which was more suitable for the profile-comparison method than existing functions, using neural networks.

RESULTS

Although neural networks required derivative-of-cost functions, the problem being addressed in this study lacked them. Therefore, we implemented a novel derivative-free neural network by combining a conventional neural network with an evolutionary strategy optimization method used as a solver. Using this novel neural network system, we optimized the scoring function to align remote sequence pairs. Our results showed that the pairwise-profile aligner using the novel scoring function significantly improved both alignment sensitivity and precision relative to aligners using existing functions.

CONCLUSIONS

We developed and implemented a novel derivative-free neural network and aligner (Nepal) for optimizing sequence alignments. Nepal improved alignment quality by adapting to remote sequence alignments and increasing the expressiveness of similarity scores. Additionally, this novel scoring function can be realized using a simple matrix operation and easily incorporated into other aligners. Moreover, our scoring function could potentially improve the performance of homology detection and/or multiple-sequence alignment of remote homologous sequences. The goal of the study was to provide a novel scoring function for profile alignment methods and to develop a novel learning system capable of addressing derivative-free problems. Our system is capable of optimizing the performance of other sophisticated methods and solving problems without derivative-of-cost functions, which do not always exist in practical problems. Our results demonstrated the usefulness of this optimization method for derivative-free problems.
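
The abstract above names cosine similarity and correlation coefficients as the existing scoring functions for comparing PSSM columns. As a minimal Python sketch of those two baseline functions (not the Nepal scoring function, and not the author's code), the example below scores a pair of profile columns represented as plain lists of numbers. The function names and the zero-vector convention are assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length score vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention: an all-zero column matches nothing
    return dot / (norm_u * norm_v)


def pearson_correlation(u, v):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_similarity([x - mean_u for x in u],
                             [y - mean_v for y in v])
```

In a profile aligner, a function like one of these would supply the match score for each pair of columns inside the dynamic programming recursion. Both are built from a single dot product, which is consistent with the abstract's remark that they cannot capture nonlinear relationships between profiles.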

Nojoomi S, Koehl P. A weighted string kernel for protein fold recognition. BMC Bioinformatics 2017;18:378. [PMID: 28841820 PMCID: PMC5574112 DOI: 10.1186/s12859-017-1795-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/07/2017] [Accepted: 08/15/2017] [Indexed: 11/10/2022]

Abstract

Background

Alignment-free methods for comparing protein sequences have proved to be viable alternatives to approaches that first rely on an alignment of the sequences to be compared. Much work, however, needs to be done before those methods provide reliable fold recognition for proteins whose sequences share little similarity. We have recently proposed an alignment-free method based on the concept of string kernels, SeqKernel (Nojoomi and Koehl, BMC Bioinformatics, 2017, 18:137). In this previous study, we have shown that while SeqKernel performs better than standard alignment-based methods, its applications are potentially limited, because of biases due mostly to sequence length effects.

Methods

In this study, we propose improvements to SeqKernel that follow two directions. First, we developed a weighted version of the kernel, WSeqKernel. Second, we expanded the concept of string kernels into a novel framework for deriving information on amino acids from protein sequences.

Results

Using a dataset that only contains remote homologs, we have shown that WSeqKernel performs remarkably well in fold recognition experiments. We have shown that with the appropriate weighting scheme, we can remove the length effects on the kernel values. WSeqKernel, just like any alignment-based sequence comparison method, depends on a substitution matrix. We have shown that this matrix can be optimized so that sequence similarity scores correlate well with structure similarity scores. Starting from no information on amino acid similarity, we have shown that we can derive a scoring matrix that echoes the physico-chemical properties of amino acids.

Conclusion

We have made progress in characterizing and parametrizing string kernels as alignment-based methods for comparing protein sequences, and we have shown that they provide a framework for extracting sequence information from structure.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-017-1795-5) contains supplementary material, which is available to authorized users.

Barlowe S, Coan HB, Youker RT. SubVis: an interactive R package for exploring the effects of multiple substitution matrices on pairwise sequence alignment. PeerJ 2017;5:e3492. [PMID: 28674656 PMCID: PMC5490468 DOI: 10.7717/peerj.3492] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/16/2017] [Accepted: 05/27/2017] [Indexed: 01/13/2023]

Abstract

Understanding how proteins mutate is critical to solving a host of biological problems. Mutations occur when an amino acid is substituted for another in a protein sequence. The set of likelihoods for amino acid substitutions is stored in a matrix and input to alignment algorithms. The quality of the resulting alignment is used to assess the similarity of two or more sequences and can vary according to assumptions modeled by the substitution matrix. Substitution strategies with minor parameter variations are often grouped together in families. For example, the BLOSUM and PAM matrix families are commonly used because they provide a standard, predefined way of modeling substitutions. However, researchers often do not know if a given matrix family or any individual matrix within a family is the most suitable. Furthermore, predefined matrix families may inaccurately reflect a particular hypothesis that a researcher wishes to model or otherwise result in unsatisfactory alignments. In these cases, the ability to compare the effects of one or more custom matrices may be needed. This laborious process is often performed manually because the ability to simultaneously load multiple matrices and then compare their effects on alignments is not readily available in current software tools. This paper presents SubVis, an interactive R package for loading and applying multiple substitution matrices to pairwise alignments. Users can simultaneously explore alignments resulting from multiple predefined and custom substitution matrices. SubVis utilizes several of the alignment functions found in R, a common language among protein scientists. Functions are tied together with the Shiny platform which allows the modification of input parameters. Information regarding alignment quality and individual amino acid substitutions is displayed with the JavaScript language which provides interactive visualizations for revealing both high-level and low-level alignment information.

Oda T, Lim K, Tomii K. Simple adjustment of the sequence weight algorithm remarkably enhances PSI-BLAST performance. BMC Bioinformatics 2017;18:288. [PMID: 28578660 PMCID: PMC5455086 DOI: 10.1186/s12859-017-1686-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/01/2017] [Accepted: 05/15/2017] [Indexed: 11/13/2022]

Lim K, Yamada KD, Frith MC, Tomii K. Protein sequence-similarity search acceleration using a heuristic algorithm with a sensitive matrix. J Struct Funct Genomics 2017;17:147-154. [PMID: 28083762 PMCID: PMC5274646 DOI: 10.1007/s10969-016-9210-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/31/2015] [Accepted: 12/05/2016] [Indexed: 12/28/2022]

Deorowicz S, Debudaj-Grabysz A, Gudyś A. FAMSA: Fast and accurate multiple sequence alignment of huge protein families. Sci Rep 2016;6:33964. [PMID: 27670777 PMCID: PMC5037421 DOI: 10.1038/srep33964] [Citation(s) in RCA: 93] [Impact Index Per Article: 10.3] [Received: 04/05/2016] [Accepted: 08/31/2016] [Indexed: 11/10/2022]

Leelananda SP, Kloczkowski A, Jernigan RL. Fold-specific sequence scoring improves protein sequence matching. BMC Bioinformatics 2016;17:328. [PMID: 27578239 PMCID: PMC5006591 DOI: 10.1186/s12859-016-1198-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/21/2016] [Accepted: 08/24/2016] [Indexed: 11/10/2022]

Abstract

Background

Sequence matching is extremely important for applications throughout biology, particularly for discovering information such as functional and evolutionary relationships, and also for discriminating between unimportant and disease mutants. At present the functions of a large fraction of genes are unknown; improvements in sequence matching will improve gene annotations. Universal amino acid substitution matrices such as Blosum62 are used to measure sequence similarities and to identify distant homologues, regardless of the structure class. However, such single matrices do not take into account important structural information evident within the different topologies of proteins and treat substitutions within all protein folds identically. Others have suggested that the use of structural information can lead to significant improvements in sequence matching but this has not yet been very effective. Here we develop novel substitution matrices that include not only general sequence information but also have a topology specific component that is unique for each CATH topology. This novel feature of using a combination of sequence and structure information for each protein topology significantly improves the sequence matching scores for the sequence pairs tested. We have used a novel multi-structure alignment method for each homology level of CATH in order to extract topological information.

Results

We obtain statistically significant improved sequence matching scores for 73 % of the alpha helical test cases. On average, 61 % of the test cases showed improvements in homology detection when structure information was incorporated into the substitution matrices. On average, z-scores for homology detection are improved by more than 54 % for all cases, and some individual cases have z-scores more than twice those obtained using generic matrices. Our topology specific similarity matrices also outperform other traditional similarity matrices and single matrix based structure methods. When the default amino acid substitution matrix in the Psi-blast algorithm is replaced by our structure-based matrices, the structure matching is significantly improved over conventional Psi-blast. It also outperforms results obtained for the corresponding HMM profiles generated for each topology.

Conclusions

We show that by incorporating topology-specific structure information in addition to sequence information into specific amino acid substitution matrices, the sequence matching scores and homology detection are significantly improved. Our topology specific similarity matrices outperform other traditional similarity matrices and single matrix based structure methods, and also show improvement over conventional Psi-blast and HMM profile based methods in sequence matching. The results support the discriminatory ability of the new amino acid similarity matrices to distinguish between distant homologs and structurally dissimilar pairs.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1198-z) contains supplementary material, which is available to authorized users.


Systematic Exploration of an Efficient Amino Acid Substitution Matrix: MIQS. Methods Mol Biol 2016. [PMID: 27115635 DOI: 10.1007/978-1-4939-3572-7_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register]

Katoh K, Standley DM. A simple method to control over-alignment in the MAFFT multiple sequence alignment program. Bioinformatics 2016;32:1933-42. [PMID: 27153688 PMCID: PMC4920119 DOI: 10.1093/bioinformatics/btw108] [Citation(s) in RCA: 360] [Impact Index Per Article: 40.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2015] [Accepted: 02/19/2016] [Indexed: 12/17/2022] Open

Oh Brother, Where Art Thou? Finding Orthologs in the Twilight and Midnight Zones of Sequence Similarity. Evol Biol 2016. [DOI: 10.1007/978-3-319-41324-2_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]

Sheetlin S, Park Y, Frith MC, Spouge JL. ALP & FALP: C++ libraries for pairwise local alignment E-values. Bioinformatics 2015;32:304-5. [PMID: 26428291 DOI: 10.1093/bioinformatics/btv575] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2015] [Accepted: 09/28/2015] [Indexed: 11/13/2022] Open

Ndhlovu A, Hazelhurst S, Durand PM. Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix. BMC Bioinformatics 2015;16:255. [PMID: 26269100 PMCID: PMC4535666 DOI: 10.1186/s12859-015-0688-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2015] [Accepted: 07/29/2015] [Indexed: 11/27/2022] Open

Abstract

Background

Selective pressures at the DNA level shape genes into profiles consisting of patterns of rapidly evolving sites and sites withstanding change. These profiles remain detectable even when protein sequences become extensively diverged. A common task in molecular biology is to infer functional, structural or evolutionary relationships by querying a database using an algorithm. However, problems arise when sequence similarity is low. This study presents an algorithm that uses the evolutionary rate at codon sites, the dN/dS (ω) parameter, coupled to a substitution matrix as an alignment metric for detecting distantly related proteins. The algorithm, called BLOSUM-FIRE, couples a newer and improved version of the original FIRE (Functional Inference using Rates of Evolution) algorithm with an amino acid substitution matrix in a dynamic scoring function. The enigmatic hepatitis B virus X protein was used as a test case for BLOSUM-FIRE and its associated database EvoDB.

Results

The evolutionary-rate-based approach was coupled with a conventional BLOSUM substitution matrix. The two approaches are combined in a dynamic scoring function, which uses the selective pressure to score aligned residues. The dynamic scoring function is based on a coupled additive approach that scores aligned sites based on the level of conservation inferred from the ω values. Evaluation of the accuracy of this new implementation, BLOSUM-FIRE, using MAFFT alignments as reference alignments has shown that it is more accurate than its predecessor FIRE. Comparison of the alignment quality with widely used algorithms (MUSCLE, T-COFFEE, and CLUSTAL Omega) revealed that the BLOSUM-FIRE algorithm performs as well as conventional algorithms. Its main strength is that it provides greater potential for aligning divergent sequences and addresses the problem of low specificity inherent in the original FIRE algorithm. The utility of this algorithm is demonstrated using the Hepatitis B virus X (HBx) protein, a protein of unknown function, as a test case.

Conclusion

This study describes the utility of an evolutionary rate based approach coupled to the BLOSUM62 amino acid substitution matrix in inferring protein domain function. We demonstrate that such an approach is robust and performs as well as an array of conventional algorithms.
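
The abstract above describes a "coupled additive" scoring function but does not give its formula. The following is only a minimal sketch of what such a coupling could look like, assuming the site score is a substitution score plus a bonus that grows as ω approaches 0 (strong purifying selection). The names `TOY_MATRIX`, `conservation_bonus`, `site_score`, `alignment_score` and the `weight` parameter are hypothetical and are not taken from BLOSUM-FIRE; the matrix values are toy numbers for illustration.

```python
# Toy substitution scores for a few residue pairs (illustrative values only).
TOY_MATRIX = {
    ("A", "A"): 4, ("L", "L"): 4, ("I", "I"): 4,
    ("I", "L"): 2, ("A", "L"): -1, ("A", "I"): -1,
}


def substitution_score(a, b):
    """Look up a symmetric pair score in the toy matrix."""
    key = (a, b) if (a, b) in TOY_MATRIX else (b, a)
    return TOY_MATRIX[key]


def conservation_bonus(omega_a, omega_b, weight=2.0):
    """Hypothetical bonus: largest when both sites have omega near 0,
    zero when the mean omega is 1 or above (neutral or positive selection)."""
    mean_omega = (omega_a + omega_b) / 2.0
    return weight * max(0.0, 1.0 - mean_omega)


def site_score(a, b, omega_a, omega_b, weight=2.0):
    """Assumed additive coupling: substitution score plus conservation bonus."""
    return substitution_score(a, b) + conservation_bonus(omega_a, omega_b, weight)


def alignment_score(seq_a, seq_b, omegas_a, omegas_b, weight=2.0):
    """Sum the site scores over an ungapped, equal-length alignment."""
    return sum(
        site_score(a, b, wa, wb, weight)
        for a, b, wa, wb in zip(seq_a, seq_b, omegas_a, omegas_b)
    )


if __name__ == "__main__":
    print(alignment_score("AIL", "ALL", [0.1, 0.3, 1.5], [0.1, 0.1, 1.5]))
```

Under this assumption, two aligned sites that are both strongly conserved score higher than the substitution matrix alone would give, while sites with ω of 1 or more fall back to the plain matrix score.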


Izidoro SC, de Melo-Minardi RC, Pappa GL. GASS: identifying enzyme active sites with genetic algorithms. Bioinformatics 2014;31:864-70. [PMID: 25388152 DOI: 10.1093/bioinformatics/btu746] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2023]

Wong PS, Tanaka M, Sunaga Y, Tanaka M, Taniguchi T, Yoshino T, Tanaka T, Fujibuchi W, Aburatani S. Tracking difference in gene expression in a time-course experiment using gene set enrichment analysis. PLoS One 2014;9:e107629. [PMID: 25268590 PMCID: PMC4182424 DOI: 10.1371/journal.pone.0107629] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2013] [Accepted: 08/21/2014] [Indexed: 11/19/2022] Open

Abstract

Fistulifera sp. strain JPCC DA0580 is a newly sequenced pennate diatom that is capable of simultaneously growing and accumulating lipids. This is a unique trait, not found in other related microalgae so far. It is able to accumulate between 40 and 60% of its cell weight in lipids, making it a strong candidate for the production of biofuel. To investigate this characteristic, we used RNA-Seq data gathered at four different times while Fistulifera sp. strain JPCC DA0580 was grown in oil-accumulating and non-oil-accumulating conditions. We then adapted gene set enrichment analysis (GSEA) to investigate the relationship between the difference in gene expression of 7,822 genes and metabolic functions in our data. We utilized information in the KEGG pathway database to create the gene sets and changed GSEA to use re-sampling so that data from the different time points could be included in the analysis. Our GSEA method identified photosynthesis, lipid synthesis and amino acid synthesis related pathways as processes that play a significant role in oil production and growth in Fistulifera sp. strain JPCC DA0580. In addition to GSEA, we visualized the results by creating a network of compounds and reactions, and plotted the expression data on top of the network. This made existing graph algorithms available to us, which we then used to calculate a path that metabolizes glucose into triacylglycerol (TAG) in the smallest number of steps. By visualizing the data this way, we observed a separate up-regulation of genes at different times instead of a concerted response. We also identified two metabolic paths that used fewer reactions than the one shown in KEGG and showed that the reactions were up-regulated during the experiment. The combination of analysis and visualization methods successfully analyzed time-course data, identified important metabolic pathways and provided new hypotheses for further research.
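
The abstract does not say which graph algorithm was used to find the path with the smallest number of steps. One standard choice for an unweighted network is breadth-first search, sketched below under that assumption. The network is a made-up toy: the intermediate names `m1` to `m5` are placeholders, not real metabolites, and `shortest_path` is a hypothetical helper rather than the authors' code.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return a path with the fewest reaction steps from start to goal,
    or None if the goal cannot be reached. Uses breadth-first search."""
    if start == goal:
        return [start]
    previous = {start: None}  # maps each visited compound to its predecessor
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue  # already reached by a path that is at least as short
            previous[neighbour] = node
            if neighbour == goal:
                # Walk the predecessor links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Toy directed network: compound -> compounds reachable in one reaction.
TOY_NETWORK = {
    "glucose": ["m1", "m2"],
    "m1": ["m3"],
    "m2": ["m4"],
    "m3": ["TAG"],
    "m4": ["m5"],
    "m5": ["TAG"],
}

if __name__ == "__main__":
    print(shortest_path(TOY_NETWORK, "glucose", "TAG"))
```

In this toy network the route through `m1` and `m3` takes three reactions, one fewer than the route through `m2`, `m4` and `m5`, so breadth-first search returns the former.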
