1
Spicer R, Salek RM, Moreno P, Cañueto D, Steinbeck C. Navigating freely-available software tools for metabolomics analysis. Metabolomics 2017; 13:106. [PMID: 28890673 PMCID: PMC5550549 DOI: 10.1007/s11306-017-1242-7] [Received: 04/11/2017] [Accepted: 07/25/2017]
Abstract
INTRODUCTION The field of metabolomics has expanded greatly over the past two decades, both as an experimental science with applications in many areas and in terms of data standards and bioinformatics software tools. The diversity of experimental designs and instrumental technologies used for metabolomics has led to the need for distinct data analysis methods and the development of many software tools. OBJECTIVES To compile a comprehensive list of the most widely used freely available software and tools that are used primarily in metabolomics. METHODS The most widely used tools were selected for inclusion in the review if they had either ≥ 50 citations on Web of Science (as of 08/09/16) or their use was reported in the recent Metabolomics Society survey. Tools were then categorised by the type of instrumental data (i.e. LC-MS, GC-MS or NMR) and the functionality (i.e. pre- and post-processing, statistical analysis, workflow and other functions) they are designed for. RESULTS A comprehensive list of the most used tools was compiled. Each tool is discussed within the context of its application domain and in relation to comparable tools of the same domain. An extended list including additional tools is available at https://github.com/RASpicer/MetabolomicsTools which is classified and searchable via a simple controlled vocabulary. CONCLUSION This review presents the most widely used tools for metabolomics analysis, categorised based on their main functionality. As future work, we suggest a direct comparison of tools' abilities to perform specific data analysis tasks, e.g. peak picking.
Affiliation(s)
- Rachel Spicer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Reza M. Salek
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Pablo Moreno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Daniel Cañueto
- Metabolomics Platform, IISPV, DEEEA, Universitat Rovira i Virgili, Campus Sescelades, Carretera de Valls, s/n, 43007 Tarragona, Catalonia Spain
- Christoph Steinbeck
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Friedrich-Schiller-University Jena, Lessingstr. 8, Jena, 07743 Germany
2
Mohammad Y, Nishida T. Learning interaction protocols by mimicking understanding and reproducing human interactive behavior. Pattern Recognit Lett 2015. [DOI: 10.1016/j.patrec.2014.11.010]
4
Shift density estimation based approximately recurring motif discovery. Appl Intell 2015. [DOI: 10.1007/s10489-014-0531-3]
5
Leibovich L, Yakhini Z. Efficient motif search in ranked lists and applications to variable gap motifs. Nucleic Acids Res 2012; 40:5832-47. [PMID: 22416066 PMCID: PMC3401424 DOI: 10.1093/nar/gks206]
Abstract
Sequence elements at all levels (DNA, RNA and protein) play a central role in mediating molecular recognition and thereby molecular regulation and signaling. Studies that focus on measuring and investigating sequence-based recognition make use of statistical and computational tools, including approaches to searching sequence motifs. State-of-the-art motif searching tools are limited in their coverage and ability to address large motif spaces. We develop and present statistical and algorithmic approaches that take as input ranked lists of sequences and return significant motifs. The efficiency of our approach, based on suffix trees, allows searches over motif spaces that are not covered by existing tools. This includes searching variable gap motifs (two half sites with a flexible-length gap in between) and searching long motifs over large alphabets. We used our approach to analyze several high-throughput measurement data sets and report some validation results as well as novel suggested motifs and motif refinements. We suggest a refinement of the known estrogen receptor 1 motif in humans, where we observe gaps other than three nucleotides that also serve as significant recognition sites, as well as a variable length motif related to potential tyrosine phosphorylation.
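A variable gap motif, as described above, is two half sites separated by a gap whose length may vary. The sketch below scans a single sequence for such a motif by naive exact matching. It does not use the suffix trees or ranked-list statistics the authors describe. The example assumes the estrogen response element half sites AGGTCA and TGACCT (canonically separated by three nucleotides), an arbitrary gap range of 0 to 5, and a made-up sequence. The function name is hypothetical.

```python
from typing import List, Tuple


def find_variable_gap_sites(seq: str, left: str, right: str,
                            min_gap: int, max_gap: int) -> List[Tuple[int, int]]:
    """Return (start, gap) for each exact occurrence of left + gap + right.

    start is the 0-based position of the left half site and gap is the
    number of arbitrary nucleotides between the two half sites, with
    min_gap <= gap <= max_gap. Naive scan, not a suffix-tree search.
    """
    seq, left, right = seq.upper(), left.upper(), right.upper()
    hits = []
    for start in range(len(seq) - len(left) + 1):
        if seq[start:start + len(left)] != left:
            continue  # left half site not present at this position
        for gap in range(min_gap, max_gap + 1):
            r = start + len(left) + gap  # where the right half site must begin
            if seq[r:r + len(right)] == right:
                hits.append((start, gap))
    return hits


# Hypothetical toy sequence containing an ERE-like site with a 3-nt spacer.
print(find_variable_gap_sites("CCAGGTCAGCATGACCTGG", "AGGTCA", "TGACCT", 0, 5))
```

For the toy sequence the call returns `[(2, 3)]`: the left half site starts at position 2, and the right half site follows after a 3-nt gap. Widening `max_gap` is how the scan would pick up the non-canonical spacings the abstract mentions, at a cost that grows linearly with the gap range.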
Affiliation(s)
- Limor Leibovich
- Department of Computer Science, Technion-Israel Institute of Technology, Haifa, 32000, Israel
6
Jia C, Lu R, Chen L. A Frequent Pattern Mining Method for Finding Planted Motifs of Unknown Length in DNA Sequences. Int J Comput Intell Syst 2011. [DOI: 10.1080/18756891.2011.9727851]
8
Ho ES, Jakubowski CD, Gunderson SI. iTriplet, a rule-based nucleic acid sequence motif finder. Algorithms Mol Biol 2009; 4:14. [PMID: 19874606 PMCID: PMC2784457 DOI: 10.1186/1748-7188-4-14] [Received: 05/07/2009] [Accepted: 10/29/2009]
Abstract
Background With the advent of high throughput sequencing techniques, large amounts of sequencing data are readily available for analysis. Natural biological signals are intrinsically highly variable, making their complete identification a computationally challenging problem. Many attempts using statistical or combinatorial approaches have been made with great success in the past. However, identifying highly degenerate and long (>20 nucleotides) motifs still remains an unmet challenge, as high degeneracy diminishes the statistical significance of biological signals and increasing motif size causes combinatorial explosion. In this report, we present a novel rule-based method that is focused on finding degenerate and long motifs. Our proposed method, named iTriplet, avoids costly enumeration present in existing combinatorial methods and is amenable to parallel processing. Results We have conducted a comprehensive assessment of the performance and sensitivity-specificity of iTriplet in analyzing artificial and real biological sequences in various genomic regions. The results show that iTriplet is able to solve challenging cases. Furthermore, we have confirmed the utility of iTriplet by showing that it accurately predicts polyA-site-related motifs using a dual Luciferase reporter assay. Conclusion iTriplet is a novel rule-based combinatorial or enumerative motif finding method that is able to process highly degenerate and long motifs that have resisted analysis by other methods. In addition, iTriplet is distinguished from other methods of the same family by its parallelizability, which allows it to leverage the power of today's readily available high-performance computing systems.
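To put the combinatorial explosion mentioned above in numbers: suppose a motif of length l may occur with up to d substitutions. This is the usual (l, d) planted-motif setting, assumed here rather than taken from the abstract. Each consensus then has sum over i = 0..d of C(l, i)·3^i variants, and exhaustive enumeration of consensus strings must consider 4^l candidates. The sketch below only evaluates these two counts. It is not an implementation of iTriplet, and the function names are hypothetical.

```python
from math import comb


def neighbourhood_size(l: int, d: int) -> int:
    """Number of DNA strings within Hamming distance d of a fixed l-mer.

    Choose i of the l positions to mutate, and 3 alternative bases for each.
    """
    return sum(comb(l, i) * 3 ** i for i in range(d + 1))


def candidate_space(l: int) -> int:
    """Number of distinct DNA strings of length l (4 bases per position)."""
    return 4 ** l


if __name__ == "__main__":
    # Longer, more degenerate motifs inflate both counts very quickly.
    for l, d in [(8, 1), (15, 4), (21, 6)]:
        print(f"l={l:2d} d={d}: {neighbourhood_size(l, d):>12,} variants per consensus, "
              f"{candidate_space(l):,} candidate l-mers")
```

As a sanity check on the formula, when d equals l the neighbourhood covers every string of that length, because the sum is the binomial expansion of (1 + 3)^l = 4^l.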
10
Zhang K, Fan W, Deininger P, Edwards A, Xu Z, Zhu D. Breaking the computational barrier: a divide-conquer and aggregate based approach for Alu insertion site characterisation. Int J Comput Biol Drug Des 2009; 2:302-22. [PMID: 20090173 DOI: 10.1504/ijcbdd.2009.030763]
Abstract
Insertion site characterisation of Alu elements is an important problem in primate-specific bioinformatics research. Key characteristics of this challenging problem include the following: the data are not in the form of pre-defined feature vectors for predictive model construction; without any prior knowledge, we must discover the general patterns that may exist and also gain biological insight; and compact yet discriminative patterns must be obtained from a search space of 4^200. This paper provides an integrated algorithmic framework for fulfilling the above mining tasks. Compared to the benchmark biological study, our results provide a further refined analysis of the patterns involved in Alu insertion. In particular, we acquire a 200-nt predictive profile around the primary insertion site which not only contains the widely accepted consensus, but also suggests a longer pattern (T(7)AA[G'A]AATAA). This pattern provides more insight into the favourable sequence variations allowed for preferred binding and cleavage by the L1 ORF2 endonuclease. The proposed method is general enough that it can also be applied to other sequence detection problems, such as microRNA target prediction.
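In its simplest generic form, a profile around an insertion site can be summarised as a position frequency matrix over pre-aligned, equal-length windows, plus a per-position consensus. The sketch below illustrates only that generic idea. The toy windows are made up, the 0.5 consensus threshold is an arbitrary assumption, the function names are hypothetical, and none of it reproduces the authors' divide-conquer and aggregate framework.

```python
from collections import Counter
from typing import Dict, List

BASES = "ACGT"


def position_frequency_matrix(windows: List[str]) -> List[Dict[str, float]]:
    """Per-position A/C/G/T frequencies for equal-length, pre-aligned windows.

    Symbols other than A, C, G, T (e.g. N) are ignored when normalising.
    """
    if not windows:
        raise ValueError("need at least one window")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    pfm = []
    for pos in range(length):
        counts = Counter(w[pos].upper() for w in windows)
        total = sum(counts[b] for b in BASES)
        pfm.append({b: (counts[b] / total if total else 0.0) for b in BASES})
    return pfm


def consensus(pfm: List[Dict[str, float]], threshold: float = 0.5) -> str:
    """Most frequent base per column, or 'N' if its frequency is below threshold."""
    out = []
    for column in pfm:
        base, freq = max(column.items(), key=lambda kv: kv[1])
        out.append(base if freq >= threshold else "N")
    return "".join(out)
```

With the toy windows `["TTTTAAAG", "TTTTAAGA", "TTTAAAAA", "TTTTAAAA"]`, `consensus(position_frequency_matrix(windows))` gives `"TTTTAAAA"`. Raising the threshold to 0.8 masks the three columns where only three of the four windows agree, which gives `"TTTNAANN"`. A real 200-nt profile would simply have 200 columns.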
Affiliation(s)
- Kun Zhang
- Department of Computer Science, Xavier University of Louisiana, New Orleans, Louisiana 70125, USA.
11
Buza K, Schmidt-Thieme L. Motif-Based Classification of Time Series with Bayesian Networks and SVMs. Advances in Data Analysis, Data Handling and Business Intelligence 2009. [DOI: 10.1007/978-3-642-01044-6_9]
12
Dieterich C, Sommer RJ. A Caenorhabditis motif compendium for studying transcriptional gene regulation. BMC Genomics 2008; 9:30. [PMID: 18215260 PMCID: PMC2248174 DOI: 10.1186/1471-2164-9-30] [Received: 08/06/2007] [Accepted: 01/23/2008]
Abstract
BACKGROUND Controlling gene expression is fundamental to biological complexity. The nematode Caenorhabditis elegans is an important model for studying principles of gene regulation in multi-cellular organisms. A comprehensive parts list of putative regulatory motifs was still missing for this model system. In this study, we compile a set of putative regulatory motifs by combining evidence from conservation and expression data. DESCRIPTION We present an unbiased comparative approach to a regulatory motif compendium for Caenorhabditis species. This involves the assembly of a new nematode genome, whole genome alignments and assessment of conserved k-mer counts. Candidate motifs are selected from a set of 9,500 randomly picked genes by three different motif discovery strategies. Motif candidates have to pass a conservation enrichment filter. Motif degeneracy and length are optimized. Retained motif descriptions are evaluated against expression data using a non-parametric test, which assesses expression changes due to the presence/absence of individual motifs. Finally, we also provide condition-specific motif ensembles by conditional tree analysis. CONCLUSION The nematode genomes align surprisingly well despite high neutral substitution rates. Our pipeline delivers motif sets by three alternative strategies. Each set contains fewer than 400 motifs, which are significantly conserved and correlated with 214 out of 270 tested gene expression conditions. This motif compendium is an entry point to comprehensive studies on nematode gene regulation. The website http://corg.eb.tuebingen.mpg.de/CMC has extensive query capabilities, supplements this article and supports the experimental list.
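The abstract lists the assessment of conserved k-mer counts over whole-genome alignments as one step of the pipeline. Below is a minimal sketch of that counting step for a single pairwise alignment. It assumes a k-mer counts as conserved only when it is identical and gap-free in both aligned rows at the same alignment columns. That definition, the `-` gap symbol, and the function name are illustrative assumptions, not the authors' procedure.

```python
from collections import Counter


def conserved_kmer_counts(aln_a: str, aln_b: str, k: int) -> Counter:
    """Count k-mers that are identical and gap-free in both rows of a
    pairwise alignment (rows of equal length, '-' marks a gap)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("alignment rows must have equal length")
    if k < 1:
        raise ValueError("k must be positive")
    a, b = aln_a.upper(), aln_b.upper()
    counts = Counter()
    for i in range(len(a) - k + 1):
        window_a, window_b = a[i:i + k], b[i:i + k]
        # Conserved: same bases in both species and no gap inside the window.
        if window_a == window_b and "-" not in window_a:
            counts[window_a] += 1
    return counts
```

For example, `conserved_kmer_counts("ACGTACGT", "ACGTTCGT", 3)` counts CGT twice and ACG once. The single substitution at column 4 breaks every window that overlaps it. Summing such counts over many alignments gives the per-k-mer conservation tallies that an enrichment filter could then compare against a background.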
Affiliation(s)
- Christoph Dieterich
- Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, Spemannstrasse 35 - 37, Tübingen, Germany.
13
Wei W, Yu XD. Comparative analysis of regulatory motif discovery tools for transcription factor binding sites. Genomics Proteomics Bioinformatics 2007; 5:131-42. [PMID: 17893078 PMCID: PMC5054109 DOI: 10.1016/s1672-0229(07)60023-0]
Abstract
In the post-genomic era, identification of specific regulatory motifs or transcription factor binding sites (TFBSs) in non-coding DNA sequences, which is essential to elucidate transcriptional regulatory networks, has emerged as an obstacle that frustrates many researchers. Consequently, numerous motif discovery tools and related databases have been applied to solving this problem. However, these existing methods, based on different computational algorithms, show diverse motif prediction efficiency in non-coding DNA sequences. Therefore, understanding the similarities and differences of computational algorithms and enriching the motif discovery literature are important for users to choose the most appropriate of the available online tools. Moreover, credible criteria for assessing motif discovery tools, and guidance for researchers on choosing the best tool for their own projects, are still lacking. Thus, integration of the related resources might be a good approach to improving the accuracy of the application. Recent studies integrate regulatory motif discovery tools with experimental methods to offer a complementary approach for researchers, and also provide a much-needed model for current research on transcriptional regulatory networks. Here we present a comparative analysis of regulatory motif discovery tools for TFBSs.
14
Deepak SA, Kottapalli KR, Rakwal R, Oros G, Rangappa KS, Iwahashi H, Masuo Y, Agrawal GK. Real-Time PCR: Revolutionizing Detection and Expression Analysis of Genes. Curr Genomics 2007; 8:234-51. [PMID: 18645596 PMCID: PMC2430684 DOI: 10.2174/138920207781386960] [Received: 01/10/2007] [Revised: 02/27/2007] [Accepted: 03/02/2007]
Abstract
Invention of polymerase chain reaction (PCR) technology by Kary Mullis in 1984 gave birth to real-time PCR. Real-time PCR (detection and expression analysis of genes in real time) has revolutionized 21st-century biological science due to its tremendous range of applications in quantitative genotyping, inter- and intra-organism genetic variation, early diagnosis of disease and forensics, to name a few. We comprehensively review various aspects of real-time PCR, including technological refinement and application in all scientific fields, ranging from medical to environmental issues, and to plants.
Affiliation(s)
- SA Deepak
- Department of Studies in Applied Botany and Biotechnology, University of Mysore, Manasagangotri, Mysore 570006, India
- KR Kottapalli
- Plant Genome Research Unit, National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba 305-8602, Ibaraki, Japan
- R Rakwal
- Human Stress Signal Research Center (HSS), National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba West, 16-1 Onogawa, Tsukuba 305-8569, Ibaraki, Japan
- Research Laboratory for Agricultural Biotechnology and Biochemistry (RLABB), GPO Box 8207, Kathmandu, Nepal
- G Oros
- Plant Protection Institute, Hungarian Academy of Sciences, Budapest, Hungary
- KS Rangappa
- Department of Studies in Chemistry, University of Mysore, Manasagangotri, Mysore 570006, India
- H Iwahashi
- Human Stress Signal Research Center (HSS), National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba West, 16-1 Onogawa, Tsukuba 305-8569, Ibaraki, Japan
- Y Masuo
- Human Stress Signal Research Center (HSS), National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba West, 16-1 Onogawa, Tsukuba 305-8569, Ibaraki, Japan
- GK Agrawal
- Research Laboratory for Agricultural Biotechnology and Biochemistry (RLABB), GPO Box 8207, Kathmandu, Nepal
15
Wijaya E, Rajaraman K, Yiu SM, Sung WK. Detection of generic spaced motifs using submotif pattern mining. Bioinformatics 2007; 23:1476-85. [PMID: 17483509 DOI: 10.1093/bioinformatics/btm118]
Abstract
MOTIVATION Identification of motifs is one of the critical stages in studying the regulatory interactions of genes. Motifs can have complicated patterns. In particular, spaced motifs, an important class of motifs, consist of several short segments separated by spacers of different lengths. Locating spaced motifs is not trivial. Existing motif-finding algorithms are either designed for monad motifs (short contiguous patterns with some mismatches), make assumptions about the spacer lengths, or can only handle at most two segments. An effective motif finder for generic spaced motifs is highly desirable. RESULTS This article proposes a novel approach for identifying spaced motifs with any number of spacers of different lengths. We introduce the notion of submotifs to capture the segments in the spaced motif and formulate the motif-finding problem as a frequent submotif mining problem. We provide an algorithm called SPACE to solve the problem. Based on experiments on real biological datasets, synthetic datasets and the motif assessment benchmarks by Tompa et al., we show that our algorithm outperforms existing tools for spaced motifs, with improvements in both sensitivity and specificity, and that for monads SPACE performs as well as other tools. AVAILABILITY The source code is available upon request from the authors.
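A spaced motif, as defined above, is a series of short segments separated by spacers of varying length. The sketch below tests whether a given spaced motif occurs in a sequence, allowing a fixed number of mismatches per segment. It then counts how many input sequences contain the motif, which is the support value that frequent-pattern mining thresholds on. It is an illustration under assumed conventions: spacer-length ranges given as (lo, hi) pairs, a per-segment mismatch budget, toy sequences, and hypothetical function names. It is not the SPACE algorithm, which mines submotifs rather than checking a motif supplied in advance.

```python
from typing import List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def matches_at(seq: str, pos: int, segments: Sequence[str],
               spacers: Sequence[Tuple[int, int]], max_mismatch: int) -> bool:
    """True if the spaced motif occurs starting at pos.

    segments[i] is followed by a spacer whose length lies in spacers[i] = (lo, hi).
    Each segment may differ from the sequence in at most max_mismatch positions.
    """
    seg = segments[0]
    window = seq[pos:pos + len(seg)]
    if len(window) < len(seg) or hamming(window, seg) > max_mismatch:
        return False
    if len(segments) == 1:
        return True  # last segment matched
    lo, hi = spacers[0]
    nxt = pos + len(seg)
    # Try every allowed spacer length before the next segment.
    return any(matches_at(seq, nxt + gap, segments[1:], spacers[1:], max_mismatch)
               for gap in range(lo, hi + 1))


def support(sequences: List[str], segments: Sequence[str],
            spacers: Sequence[Tuple[int, int]], max_mismatch: int = 0) -> int:
    """Number of sequences containing at least one occurrence of the motif."""
    if len(spacers) != len(segments) - 1:
        raise ValueError("need exactly one spacer range between consecutive segments")
    return sum(
        any(matches_at(s.upper(), p, segments, spacers, max_mismatch)
            for p in range(len(s)))
        for s in sequences
    )
```

Take the toy motif with segments `["ACG", "TTA", "GC"]` and spacer ranges `[(1, 3), (0, 2)]`. The sequence `"ACGCCTTAGC"` contains it exactly, with a 2-nt and then a 0-nt spacer. Allowing more mismatches per segment raises the support, which shows why a mismatch budget and a support threshold have to be chosen together.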
16
Minnen D, Starner T, Essa I, Isbell C. Discovering Characteristic Actions from On-Body Sensor Data. 10th IEEE International Symposium on Wearable Computers 2006. [DOI: 10.1109/iswc.2006.286337]