1. Kontou PI, Bagos PG. The goldmine of GWAS summary statistics: a systematic review of methods and tools. BioData Min 2024; 17:31. PMID: 39238044; PMCID: PMC11375927; DOI: 10.1186/s13040-024-00385-x.
Abstract
Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.
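To illustrate one of the core single-trait operations the review catalogs tools for, the sketch below implements fixed-effect, inverse-variance-weighted meta-analysis of per-study effect sizes from GWAS summary statistics. The study values are hypothetical and the function is a minimal illustration, not any specific reviewed tool.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    betas: per-study effect estimates for one variant
    ses:   matching per-study standard errors
    Returns the pooled effect, its standard error, and the z-score.
    """
    weights = [1.0 / se ** 2 for se in ses]          # weight = 1 / variance
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))               # SE of the pooled effect
    z = beta / se
    return beta, se, z

# Two hypothetical studies reporting the same SNP:
beta, se, z = fixed_effect_meta([0.10, 0.14], [0.02, 0.03])
```

The pooled estimate is pulled toward the more precise study (the one with the smaller standard error), which is the defining behavior of inverse-variance weighting.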
Affiliation(s)
- Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece.
2. Moreira-Filho JT, Ranganath D, Conway M, Schmitt C, Kleinstreuer N, Mansouri K. Democratizing cheminformatics: interpretable chemical grouping using an automated KNIME workflow. J Cheminform 2024; 16:101. PMID: 39152469; PMCID: PMC11330086; DOI: 10.1186/s13321-024-00894-1.
Abstract
With the increased availability of chemical data in public databases, innovative techniques and algorithms have emerged for the analysis, exploration, visualization, and extraction of information from these data. One such technique is chemical grouping, where chemicals with common characteristics are categorized into distinct groups based on physicochemical properties, use, biological activity, or a combination. However, existing tools for chemical grouping often require specialized programming skills or the use of commercial software packages. To address these challenges, we developed a user-friendly chemical grouping workflow implemented in KNIME, a free, open-source, low/no-code, data analytics platform. The workflow serves as an all-encompassing tool, expertly incorporating a range of processes such as molecular descriptor calculation, feature selection, dimensionality reduction, hyperparameter search, and supervised and unsupervised machine learning methods, enabling effective chemical grouping and visualization of results. Furthermore, we implemented tools for interpretation, identifying key molecular descriptors for the chemical groups, and using natural language summaries to clarify the rationale behind these groupings. The workflow was designed to run seamlessly in both the KNIME local desktop version and KNIME Server WebPortal as a web application. It incorporates interactive interfaces and guides to assist users in a step-by-step manner. We demonstrate the utility of this workflow through a case study using an eye irritation and corrosion dataset.
Scientific contributions
This work presents a novel, comprehensive chemical grouping workflow in KNIME, enhancing accessibility by integrating a user-friendly graphical interface that eliminates the need for extensive programming skills. This workflow uniquely combines several features such as automated molecular descriptor calculation, feature selection, dimensionality reduction, and machine learning algorithms (both supervised and unsupervised), with hyperparameter optimization to refine chemical grouping accuracy. Moreover, we have introduced an innovative interpretative step and natural language summaries to elucidate the underlying reasons for chemical groupings, significantly advancing the usability of the tool and interpretability of the results.
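The unsupervised-grouping step that workflows like this automate can be sketched in a few lines. Below is a plain k-means clustering of 2-D descriptor vectors; the descriptor values, starting centroids, and interpretation (e.g. as scaled logP and molecular weight) are illustrative, not taken from the KNIME workflow itself.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: euclidean(p, centroids[i]))
            clusters[idx].append(p)
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters

# Hypothetical 2-D molecular descriptor vectors (already scaled):
descriptors = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (1.0, 0.9)]
groups = kmeans(descriptors, centroids=[(0.0, 0.0), (1.0, 1.0)])
```

In a real pipeline this step would follow descriptor calculation, feature selection, and dimensionality reduction, and the resulting groups would then be passed to the interpretation step.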
Affiliation(s)
- José T Moreira-Filho
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
- Dhruv Ranganath
- University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Mike Conway
- National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Charles Schmitt
- Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Nicole Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Kamel Mansouri
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
3. Spatola G, Giusti A, Armani A. The "Dry-Lab" Side of Food Authentication: Benchmark of Bioinformatic Pipelines for the Analysis of Metabarcoding Data. Foods 2024; 13:2102. PMID: 38998608; PMCID: PMC11241536; DOI: 10.3390/foods13132102.
Abstract
Next-Generation Sequencing (NGS) technologies, particularly metabarcoding, are valuable tools for authenticating foodstuffs and detecting potential fraudulent practices such as species substitution. This technique, mostly used for the analysis of prokaryotes in several environments (including food), is increasingly applied to identify eukaryotes (e.g., fish, mammals, and birds) in multispecies food products. Besides the "wet-lab" procedures (e.g., DNA extraction, PCR, amplicon purification), the metabarcoding workflow includes a final "dry-lab" phase in which sequencing data are analyzed using a bioinformatic pipeline (BP). BPs play a crucial role in the accuracy, reliability, and interpretability of metabarcoding results. Choosing the most suitable BP for the analysis of metabarcoding data can be challenging, because it may require greater informatics skills than standard molecular analysis. To date, studies comparing BPs for metabarcoding data analysis in foodstuff authentication are scarce. In this study, we took the data from two previous studies in which fish burgers and insect-based products were authenticated using a customizable, ASV-based, command-line interface BP (BP1) and reanalyzed the same data with a customizable but OTU-based, graphical-user-interface BP (BP2). The final sample compositions were compared statistically. No significant difference in sample composition was found between BP1 and BP2. However, BP1 was considered more user-friendly than BP2 with respect to streamlining of data analysis, cost of analysis, and computational time. This study can provide useful information for researchers approaching the bioinformatic analysis of metabarcoding data for the first time. In the field of food authentication, effective and efficient use of BPs could be especially useful in the context of official controls performed by the Competent Authorities and of companies' self-control aimed at detecting species substitution and counterfeiting frauds.
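The OTU-based approach of BP2 rests on clustering similar reads at an identity threshold (classically 97%), whereas ASV-based pipelines like BP1 resolve exact variants. The sketch below shows greedy OTU picking in its simplest form; the reads are synthetic and the positional identity function is a toy stand-in for the pairwise alignment a real pipeline would use.

```python
def identity(a, b):
    """Fraction of matching positions; a toy stand-in for alignment
    identity (assumes equal-length sequences)."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

def greedy_otu_cluster(seqs, threshold=0.97):
    """Greedy OTU picking: each sequence joins the first cluster whose
    seed it matches at >= threshold identity, else founds a new OTU."""
    clusters = []  # list of (seed_sequence, member_list)
    for s in seqs:
        for seed, members in clusters:
            if identity(s, seed) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))
    return clusters

reads = [
    "ACGTACGTACGTACGTACGTACGTACGTACGTACGT",  # founds OTU 1
    "ACGTACGTACGTACGTACGTACGTACGTACGTACGA",  # 1 mismatch (~97%) -> joins OTU 1
    "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT",  # dissimilar -> founds OTU 2
]
otus = greedy_otu_cluster(reads)
```

Because near-identical variants collapse into one OTU while ASVs would keep them separate, the two pipeline families can report slightly different compositions for the same data, which is what the study's statistical comparison checks.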
Affiliation(s)
- Gabriele Spatola
- Department of Veterinary Sciences, University of Pisa, 56124 Pisa, Italy
- Alice Giusti
- Department of Veterinary Sciences, University of Pisa, 56124 Pisa, Italy
- Andrea Armani
- Department of Veterinary Sciences, University of Pisa, 56124 Pisa, Italy
4. Coronado E, Yamanobe N, Venture G. NEP+: A Human-Centered Framework for Inclusive Human-Machine Interaction Development. Sensors (Basel) 2023; 23:9136. PMID: 38005524; PMCID: PMC10674609; DOI: 10.3390/s23229136.
Abstract
This article presents the Network Empower and Prototyping Platform (NEP+), a flexible framework purposefully crafted to simplify the process of interactive application development, catering to both technical and non-technical users. The name "NEP+" encapsulates the platform's dual mission: to empower the network-related capabilities of ZeroMQ and to provide software tools and interfaces for prototyping and integration. NEP+ accomplishes this through a comprehensive quality model and an integrated software ecosystem encompassing middleware, user-friendly graphical interfaces, a command-line tool, and an accessible end-user programming interface. This article primarily focuses on presenting the proposed quality model and software architecture, illustrating how they can empower developers to craft cross-platform, accessible, and user-friendly interfaces for various applications, with a particular emphasis on robotics and the Internet of Things (IoT). Additionally, we provide practical insights into the applicability of NEP+ by briefly presenting real-world use cases where human-centered projects have successfully utilized NEP+ to develop robotics systems. To further emphasize the suitability of NEP+ tools and interfaces for developer use, we conduct a pilot study that delves into usability and workload assessment. The outcomes of this study highlight the user-friendly features of NEP+ tools, along with their ease of adoption and cross-platform capabilities. The novelty of NEP+ fundamentally lies in its holistic approach, acting as a bridge across diverse user groups, fostering inclusivity, and promoting collaboration.
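ZeroMQ, which NEP+ builds on, is commonly used with a publish/subscribe messaging pattern: components emit messages on named topics without knowing who consumes them. The sketch below shows that pattern with a minimal in-process broker; it is an illustration of the pattern only, not the NEP+ or ZeroMQ API, and the topic name is hypothetical.

```python
class PubSubBroker:
    """Minimal in-process publish/subscribe broker illustrating the
    decoupled messaging pattern used in robotics/IoT middleware."""

    def __init__(self):
        self._subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a callback to receive messages published on a topic."""
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        """Deliver a message to every subscriber of the topic."""
        for callback in self._subscribers.get(topic, []):
            callback(message)

broker = PubSubBroker()
received = []
broker.subscribe("robot/pose", received.append)   # a consumer component
broker.publish("robot/pose", {"x": 1.0, "y": 2.0})  # a producer component
```

The value of the pattern for inclusive development is that producers and consumers only share a topic name and message shape, so components written by users with very different skill levels can interoperate.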
Affiliation(s)
- Enrique Coronado
- Industrial Cyber-Physical Systems Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Natsuki Yamanobe
- Industrial Cyber-Physical Systems Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Gentiane Venture
- Industrial Cyber-Physical Systems Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Graduate School of Engineering, University of Tokyo, Tokyo 113-8656, Japan
5.
Abstract
Whole-genome sequencing (WGS) is a powerful method for detecting drug resistance, genetic diversity, and transmission dynamics of Mycobacterium tuberculosis. Implementation of WGS in public health microbiology laboratories is impeded by a lack of user-friendly, automated, and semiautomated pipelines. We present the COMBAT-TB Workbench, a modular, easy-to-install application that provides a web-based environment for Mycobacterium tuberculosis bioinformatics. The COMBAT-TB Workbench is built from two main software components: the IRIDA platform, for its web-based user interface and data-management capabilities, and the Galaxy bioinformatics workflow platform, for workflow execution. These components are combined into a single easy-to-install application using Docker container technology. We implemented two workflows in Galaxy, for M. tuberculosis sample analysis and phylogeny. Building our workflows involved updating some Galaxy tools (Trimmomatic, snippy, and snp-sites) and writing new ones (snp-dists, TB-Profiler, tb_variant_filter, and TB Variant Report). The irida-wf-ga2xml tool was updated to work with recent versions of Galaxy and was further developed into IRIDA plugins for both workflows. For the M. tuberculosis sample analysis workflow, an interface was added to update the metadata stored for each sequenced sample with results gleaned from the Galaxy workflow output. Data can be loaded into the COMBAT-TB Workbench via the web interface or via the command-line IRIDA uploader tool. The COMBAT-TB Workbench application deploys IRIDA, the COMBAT-TB IRIDA plugins, the MariaDB database, and Galaxy using Docker containers (https://github.com/COMBAT-TB/irida-galaxy-deploy).
IMPORTANCE While the reduction in the cost of WGS is making sequencing more affordable in lower- and middle-income countries (LMICs), public health laboratories in these countries seldom have access to bioinformaticians and system support engineers adept at using the Linux command line and complex bioinformatics software. The COMBAT-TB Workbench provides an open-source, modular, easy-to-deploy and -use environment for managing and analyzing M. tuberculosis WGS data and thereby makes WGS usable in practice in the LMIC context.
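The multi-container layout the abstract describes (IRIDA, MariaDB, and Galaxy each in their own container) is the kind of deployment typically expressed as a Compose file. The fragment below is a hypothetical sketch of that shape only; the service names and image tags are illustrative, and the actual configuration lives in the linked repository (https://github.com/COMBAT-TB/irida-galaxy-deploy).

```yaml
# Hypothetical sketch of a three-service deployment in the style the
# abstract describes; images and names are illustrative only.
services:
  mariadb:
    image: mariadb              # database backing IRIDA
  irida:
    image: example/irida        # web UI and data management (illustrative tag)
    depends_on: [mariadb]
  galaxy:
    image: example/galaxy       # workflow execution engine (illustrative tag)
```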
6. Joppich M, Olenchuk M, Mayer JM, Emslander Q, Jimenez-Soto LF, Zimmer R. SEQU-INTO: Early detection of impurities, contamination and off-targets (ICOs) in long read/MinION sequencing. Comput Struct Biotechnol J 2020; 18:1342-1351. PMID: 32612757; PMCID: PMC7306586; DOI: 10.1016/j.csbj.2020.05.014.
Abstract
The MinION sequencer by Oxford Nanopore Technologies turns DNA and RNA sequencing into a routine task in biology laboratories and in field research. Downstream analysis requires a sufficient number of target reads. Prokaryotic or bacteriophage sequencing samples in particular can contain a significant amount of off-target sequences, stemming from human DNA/RNA contamination, insufficient rRNA depletion, or remaining DNA/RNA from other organisms (e.g., the host organism from bacteriophage cultivation). Such impurities, contamination and off-targets (ICOs) consume read capacity, requiring deeper sequencing. Unlike second-generation sequencing, MinION sequencing allows reuse of its chip after a (partial) run, so the same chip can be used with more sample material, even after adjusting the library preparation to reduce ICOs. The earlier a sample's ICOs are detected, the better the sequencing chip can be conserved for future use. Here we present sequ-into, a low-resource and user-friendly cross-platform tool to detect ICO sequences from a predefined ICO database in samples early during a MinION sequencing run. The data provided by sequ-into empowers the user to quickly take action to preserve sample material and chip capacity. sequ-into is available from https://github.com/mjoppich/sequ-into.
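The core idea of screening reads against a predefined contaminant database can be sketched with a k-mer overlap test: a read sharing a high fraction of its k-mers with an ICO reference is flagged. This is a toy stand-in for the tool's actual detection, and the sequences, k-size, and threshold here are all illustrative.

```python
def kmers(seq, k=8):
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def flag_ico_reads(reads, ico_reference, k=8, threshold=0.5):
    """Flag reads whose k-mer content substantially overlaps a known
    impurity/contaminant/off-target (ICO) reference sequence."""
    ref_kmers = kmers(ico_reference, k)
    flagged = []
    for read in reads:
        rk = kmers(read, k)
        shared = len(rk & ref_kmers) / len(rk) if rk else 0.0
        if shared >= threshold:
            flagged.append(read)
    return flagged

# Synthetic example: one read matching the ICO reference, one target read.
human_like = "ACGTGCTAGCTAGGCTAACGTGCTAGCTAGGCTAA"
target = "TTTGGGCCCAAATTTGGGCCCAAATTTGGGCCCAA"
contaminated = flag_ico_reads([human_like, target], ico_reference=human_like)
```

Running such a check on the first reads of a run is what lets the user decide early whether to stop, adjust the library preparation, and conserve the chip.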
Affiliation(s)
- Markus Joppich
- LFE Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, 80333 München, Germany
- Margaryta Olenchuk
- LFE Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, 80333 München, Germany
- Julia M. Mayer
- LFE Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, 80333 München, Germany
- Quirin Emslander
- Physics of Synthetic Biological Systems, Physics Department, Technische Universität München, 85748 Garching, Germany
- Luisa F. Jimenez-Soto
- Walther Straub Institute for Pharmacology and Toxicology, Ludwig-Maximilians-Universität München, Goethestrasse 33, 80336 München, Germany
- Ralf Zimmer
- LFE Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, 80333 München, Germany