1
Aplakidou E, Vergoulidis N, Chasapi M, Venetsianou NK, Kokoli M, Panagiotopoulou E, Iliopoulos I, Karatzas E, Pafilis E, Georgakopoulos-Soares I, Kyrpides NC, Pavlopoulos GA, Baltoumas FA. Visualizing metagenomic and metatranscriptomic data: A comprehensive review. Comput Struct Biotechnol J 2024; 23:2011-2033. [PMID: 38765606 PMCID: PMC11101950 DOI: 10.1016/j.csbj.2024.04.060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 04/25/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024] Open
Abstract
The fields of Metagenomics and Metatranscriptomics involve the examination of complete nucleotide sequences, gene identification, and analysis of potential biological functions within diverse organisms or environmental samples. Despite the vast opportunities for discovery in metagenomics, the sheer volume and complexity of sequence data often present challenges in processing, analysis, and visualization. This article highlights the critical role of advanced visualization tools in enabling effective exploration, querying, and analysis of these complex datasets. Emphasizing the importance of accessibility, the article categorizes various visualizers based on their intended applications and highlights their utility in empowering bioinformaticians and non-bioinformaticians to interpret and derive insights from meta-omics data effectively.
Affiliation(s)
- Eleni Aplakidou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nikolaos Vergoulidis
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Chasapi
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nefeli K. Venetsianou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Kokoli
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Eleni Panagiotopoulou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Ioannis Iliopoulos
  - Department of Basic Sciences, School of Medicine, University of Crete, 71003 Heraklion, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikos C. Kyrpides
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Center of New Biotechnologies & Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Greece
  - Hellenic Army Academy, 16673 Vari, Greece
- Fotis A. Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
2
Kwon MS, Lee J, Kim HU. A machine learning framework for extracting information from biological pathway images in the literature. Metab Eng 2024; 86:1-11. [PMID: 39233197 DOI: 10.1016/j.ymben.2024.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 08/03/2024] [Accepted: 09/01/2024] [Indexed: 09/06/2024]
Abstract
There have been significant advances in literature mining, allowing for the extraction of target information from the literature. However, biological literature often includes biological pathway images that are difficult to extract in an easily editable format. To address this challenge, this study aims to develop a machine learning framework called the "Extraction of Biological Pathway Information" (EBPI). The framework automates the search for relevant publications, extracts biological pathway information from images within the literature, including genes, enzymes, and metabolites, and generates the output in a tabular format. For this, this framework determines the direction of biochemical reactions, and detects and classifies texts within biological pathway images. Performance of EBPI was evaluated by comparing the extracted pathway information with manually curated pathway maps. EBPI will be useful for extracting biological pathway information from the literature in a high-throughput manner, and can be used for pathway studies, including metabolic engineering.
Affiliation(s)
- Mun Su Kwon
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Junkyu Lee
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Hyun Uk Kim
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea; Graduate School of Engineering Biology and BioProcess Engineering Research Center, KAIST, Daejeon, 34141, Republic of Korea
3
Baldi BF, Vuong J, O’Donoghue SI. Assessing 2D visual encoding of 3D spatial connectivity. Frontiers in Bioinformatics 2024; 3:1232671. [PMID: 38323038 PMCID: PMC10845337 DOI: 10.3389/fbinf.2023.1232671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 12/21/2023] [Indexed: 02/08/2024] Open
Abstract
Introduction: When visualizing complex data, the layout method chosen can greatly affect the ability to identify outliers, spot incorrect modeling assumptions, or recognize unexpected patterns. Additionally, visual layout can play a crucial role in communicating results to peers. Methods: In this paper, we compared the effectiveness of three visual layouts (the adjacency matrix, a half-matrix layout, and a circular layout) for visualizing spatial connectivity data, e.g., contacts derived from chromatin conformation capture experiments. To assess these visual layouts, we conducted a study comprising 150 participants from Amazon's Mechanical Turk, as well as a second expert study comprising 30 biomedical research scientists. Results: The Mechanical Turk study found that the circular layout was the most accurate and intuitive, while the expert study found that the circular and half-matrix layouts were more accurate than the matrix layout. Discussion: We concluded that the circular layout may be a good default choice for visualizing smaller datasets with relatively few spatial contacts, while, for larger datasets, the half-matrix layout may be a better choice. Our results also demonstrated how crowdsourcing methods could be used to determine which visual layouts are best for addressing specific data challenges in bioinformatics.
Affiliation(s)
- Jenny Vuong
  - The Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  - CSIRO Data61, Eveleigh, NSW, Australia
- Seán I. O’Donoghue
  - The Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  - CSIRO Data61, Eveleigh, NSW, Australia
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Kensington, NSW, Australia
4
Cheng CH, Tsai MC, Chang YS. The relationship between hotel star rating and website information quality based on visual presentation. PLoS One 2023; 18:e0290629. [PMID: 37917635 PMCID: PMC10621818 DOI: 10.1371/journal.pone.0290629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 08/13/2023] [Indexed: 11/04/2023] Open
Abstract
The hotel industry is essential for tourism. With the rapid expansion of the internet, consumers need only search for their desired keywords on a website when trying to find a hotel to stay in, and the relevant hotel information appears. To quickly respond to the changing market and consumer habits, each hotel must focus on its website information and information quality. This study proposes a novel methodology that uses rough set theory (RST), principal component analysis, t-Distributed Stochastic Neighbor Embedding (t-SNE), and attribute performance visualization to explore the relationship between hotel star ratings and hotel website information quality. The collected data are based on the star-rated hotels of the Taiwanstay website, and the checklists of hotel website services are used to obtain the relevant attribute data. The results show that there are significant differences in information quality between hotels below two stars and those above four stars. The information quality provided by the higher-star hotels was more detailed than that offered by low-star hotels. Based on the attribute performance matrix, the one-star and two-star hotels have advantage attributes in their landscape, reply time, restaurant information, social media, and compensation. Furthermore, the three- to five-star hotels have advantage attributes in their operational support, compensation, restaurant information, traffic information, and room information. These results could be provided to the stakeholders as a reference.
Affiliation(s)
- Ching-Hsue Cheng
  - Department of Information Management, National Yunlin University of Science and Technology, Douliou, Yunlin, Taiwan
- Ming-Chi Tsai
  - Department of Business Administration, I-Shou University, Kaohsiung City, Taiwan
- Yuan-Shao Chang
  - Department of Information Management, National Yunlin University of Science and Technology, Douliou, Yunlin, Taiwan
5
Seitz W, Kirwan AD, Brčić-Kostić K, Mitrikeski PT, Seitz PK. Visualizing genomic data: The mixing perspective. Biosystems 2023; 224:104839. [PMID: 36690200 DOI: 10.1016/j.biosystems.2023.104839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 01/17/2023] [Accepted: 01/18/2023] [Indexed: 01/22/2023]
Abstract
We report on a novel way to visualize genomic data. By considering genome coding sequences, cds, as sets of the N=61 non-stop codons, one obtains a partition of the total number of codons in each cds. Partitions exhibit a statistical property known as mixing character which characterizes how mixed the partition is. Mixing characters have been shown mathematically to exhibit a partial order known as majorization (Ruch, 1975). In previous work (Seitz and Kirwan, 2022) we developed an approach that combined mixing and entropy that is visualized as a scatter plot. If we consider all 1,121,505 partitions of 61 codons, this produces a plot we call the theoretical mixing space, TGMS. A normalization procedure is developed here and applied to real genomic data to produce the genome mixing signature, GMS. Example GMS's of 19 species, including Homo sapiens, are shown and discussed.
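The counting and ordering ideas in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, and the abstract does not spell out how the mixing character or the normalization is computed, so the sketch assumes the standard definitions of integer partitions, majorization (comparison of partial sums of parts sorted in descending order) and Shannon entropy.

```python
from itertools import accumulate
from math import log


def count_partitions(n: int) -> int:
    """Count the integer partitions of n with a classic dynamic-programming table."""
    ways = [1] + [0] * n  # ways[t] = number of partitions of t using the parts seen so far
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]


def majorizes(a: list[int], b: list[int]) -> bool:
    """Return True if partition a majorizes partition b (standard definition, assumed here)."""
    if sum(a) != sum(b):
        raise ValueError("partitions must have the same total")
    length = max(len(a), len(b))
    # Sort parts in descending order and pad the shorter partition with zeros.
    a_sorted = sorted(a, reverse=True) + [0] * (length - len(a))
    b_sorted = sorted(b, reverse=True) + [0] * (length - len(b))
    return all(x >= y for x, y in zip(accumulate(a_sorted), accumulate(b_sorted)))


def shannon_entropy(partition: list[int]) -> float:
    """Shannon entropy (natural log) of the frequencies implied by a partition."""
    total = sum(partition)
    return -sum((p / total) * log(p / total) for p in partition if p > 0)


if __name__ == "__main__":
    # Number of partitions of 61, the count quoted in the abstract.
    print(count_partitions(61))
    # Majorization is only a partial order: neither of these partitions of 6 majorizes the other.
    print(majorizes([4, 1, 1], [3, 3]), majorizes([3, 3], [4, 1, 1]))
```

Because majorization is a partial order, some pairs of partitions are incomparable, which is presumably one reason a second coordinate such as entropy is useful when laying partitions out in a scatter plot.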
Affiliation(s)
- William Seitz
  - Texas A&M University at Galveston, Galveston, TX 77553, United States of America
- A D Kirwan
  - College of Earth, Ocean and Environment, University of Delaware, Newark, DE, 19716, United States of America
- Krunoslav Brčić-Kostić
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb 10000, Croatia
- Petar Tomev Mitrikeski
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb 10000, Croatia
- P K Seitz
  - University of Texas Medical Branch, Galveston, TX 77555, United States of America
6
Durant E, Rouard M, Ganko EW, Muller C, Cleary AM, Farmer AD, Conte M, Sabot F. Ten simple rules for developing visualization tools in genomics. PLoS Comput Biol 2022; 18:e1010622. [PMID: 36355753 PMCID: PMC9648702 DOI: 10.1371/journal.pcbi.1010622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Affiliation(s)
- Eloi Durant
  - DIADE, University of Montpellier, CIRAD, IRD, Montpellier, France
  - Syngenta Seeds SAS, Saint-Sauveur, France
  - Bioversity International, Parc Scientifique Agropolis II, Montpellier, France
  - French Institute of Bioinformatics (IFB), South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, France
- Mathieu Rouard
  - Bioversity International, Parc Scientifique Agropolis II, Montpellier, France
  - French Institute of Bioinformatics (IFB), South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, France
- Eric W. Ganko
  - Seeds Research, Syngenta Crop Protection, LLC, Research Triangle Park, Durham, North Carolina, United States of America
- Alan M. Cleary
  - National Center for Genome Resources, Santa Fe, New Mexico, United States of America
- Andrew D. Farmer
  - National Center for Genome Resources, Santa Fe, New Mexico, United States of America
- Francois Sabot
  - DIADE, University of Montpellier, CIRAD, IRD, Montpellier, France
  - French Institute of Bioinformatics (IFB), South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, France
7
Gene Networks and Pathways Involved in Escherichia coli Response to Multiple Stressors. Microorganisms 2022; 10:microorganisms10091793. [PMID: 36144394 PMCID: PMC9501238 DOI: 10.3390/microorganisms10091793] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/05/2022] [Revised: 08/19/2022] [Accepted: 08/30/2022] [Indexed: 11/16/2022] Open
Abstract
Stress response helps microorganisms survive extreme environmental conditions and host immunity, making them more virulent or drug resistant. Although both reductionist approaches investigating specific genes and systems approaches analyzing individual stress conditions are being used, less is known about gene networks involved in multiple stress responses. Here, using a systems biology approach, we mined hundreds of transcriptomic data sets for key genes and pathways involved in the tolerance of the model microorganism Escherichia coli to multiple stressors. Specifically, we investigated the E. coli K-12 MG1655 transcriptome under five stresses: heat, cold, oxidative stress, nitrosative stress, and antibiotic treatment. Overlaps of transcriptional changes between studies of each stress factor and between different stressors were determined: energy-requiring metabolic pathways, transport, and motility are typically downregulated to conserve energy, while genes related to survival, bona fide stress response, biofilm formation, and DNA repair are mainly upregulated. The transcription of 15 genes with uncharacterized functions is higher in response to multiple stressors, which suggests they may play pivotal roles in stress response. In conclusion, using rank normalization of transcriptomic data, we identified a set of E. coli stress response genes and pathways, which could be potential targets to overcome antibiotic tolerance or multidrug resistance.
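The abstract names rank normalization as the step that makes hundreds of transcriptomic data sets comparable, but does not describe the exact procedure. The following is a minimal Python sketch of one common variant, offered under that assumption: each sample's expression values are replaced by their ranks (ties share the average rank) and scaled by the number of genes. The function name `rank_normalize` is hypothetical.

```python
def rank_normalize(values: list[float]) -> list[float]:
    """Replace each value by its 1-based average rank divided by the number of values.

    Assumption: this is a generic rank transform, not necessarily the
    procedure used in the study summarized above.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices sorted by value
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return [r / n for r in ranks]


if __name__ == "__main__":
    # Two hypothetical samples measured on very different scales
    # yield the same normalized profile because the gene ordering is the same.
    sample_a = [10.0, 200.0, 10.0, 50.0]
    sample_b = [0.1, 9.5, 0.1, 2.0]
    print(rank_normalize(sample_a))
    print(rank_normalize(sample_b))
```

A rank transform discards absolute magnitudes, which is what allows data sets from different platforms or laboratories to be pooled, at the cost of losing information about effect size.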
8
Auer F, Mayer S, Kramer F. Data-dependent visualization of biological networks in the web-browser with NDExEdit. PLoS Comput Biol 2022; 18:e1010205. [PMID: 35675360 PMCID: PMC9212158 DOI: 10.1371/journal.pcbi.1010205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2021] [Revised: 06/21/2022] [Accepted: 05/15/2022] [Indexed: 12/02/2022] Open
Abstract
Networks are a common methodology used to capture increasingly complex associations between biological entities. They serve as a resource of biological knowledge for bioinformatics analyses, and also comprise the subsequent results. However, the interpretation of biological networks is challenging and requires suitable visualizations dependent on the contained information. The most prominent software in the field for the visualization of biological networks is Cytoscape, a desktop modeling environment also including many features for analysis. A further challenge when working with networks is their distribution. Within a typical collaborative workflow, even slight changes of the network data force one to repeat the visualization step as well. Moreover, even minor adjustments to the visual representation require the networks to be transferred back and forth. Collaboration on the same resources requires specific infrastructure to avoid redundancies, or worse, the corruption of the data. A well-established solution is provided by the NDEx platform where users can upload a network, share it with selected colleagues or make it publicly available. NDExEdit is a web-based application where simple changes can be made to biological networks within the browser, and which does not require installation. With our tool, plain networks can be enhanced easily for further usage in presentations and publications. Since the network data is only stored locally within the web browser, users can edit their private networks without concerns of unintentional publication. The web tool is designed to conform to the Cytoscape Exchange (CX) format as a data model, which is used for the data transmission by both tools, Cytoscape and NDEx. Therefore the modified network can be directly exported to the NDEx platform or saved as a compatible CX file, in addition to standard image formats like PNG and JPEG.
Relations in biological research are often visualized as networks. For instance, if two proteins interact with each other during a certain process, the corresponding network would show two nodes connected by one edge. But the fact that the interaction between the two exists may not be enough. With established software solutions like Cytoscape we can add all the information we have about our nodes and their interaction to our data foundation. Furthermore, we can change the visual appearance of our nodes and their interaction based on this information. For example, if our network contains 20 nodes that all interact with each other, but the strengths of these interactions each range between 0 and 1, we can illustrate that by making the edges wider for strong interactions and slimmer for weak interactions. Thus, our visualization is enriched with valuable information. As of now these data-dependent modifications can only be made with a desktop client. We introduce NDExEdit, a web-based solution for visualization changes to networks that conform to the CX data format. It allows us to import networks directly from the NDEx platform and apply changes to the visualization, including all types of mappings, one of which was briefly described above.
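The data-dependent mapping described in this summary (wider edges for stronger interactions) amounts to a linear interpolation from interaction strength to line width. The Python sketch below illustrates only that idea; it does not use the NDExEdit, Cytoscape or CX APIs, and the function name, the width range and the example edges are hypothetical.

```python
def edge_width(strength: float, min_width: float = 1.0, max_width: float = 8.0) -> float:
    """Map an interaction strength in [0, 1] linearly onto an edge width.

    The default width range is an illustrative assumption, not a value
    taken from any of the tools named in the abstract.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie between 0 and 1")
    return min_width + strength * (max_width - min_width)


if __name__ == "__main__":
    # Hypothetical edges: (source node, target node, interaction strength).
    edges = [("A", "B", 0.0), ("A", "C", 0.5), ("B", "C", 1.0)]
    for source, target, strength in edges:
        print(f"{source}-{target}: width {edge_width(strength)}")
```

A continuous mapping like this keeps the visual encoding proportional to the data, whereas a discrete mapping would bin strengths into a few fixed widths.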
Affiliation(s)
- Florian Auer
  - Department of IT-Infrastructure for Translational Medical Research, Faculty of Applied Computer Science, University of Augsburg, Augsburg, Germany
- Simone Mayer
  - Department of IT-Infrastructure for Translational Medical Research, Faculty of Applied Computer Science, University of Augsburg, Augsburg, Germany
- Frank Kramer
  - Department of IT-Infrastructure for Translational Medical Research, Faculty of Applied Computer Science, University of Augsburg, Augsburg, Germany
9
Koutrouli M, Karatzas E, Papanikolopoulou K, Pavlopoulos GA. NORMA: The Network Makeup Artist - A Web Tool for Network Annotation Visualization. Genomics, Proteomics & Bioinformatics 2022; 20:578-586. [PMID: 34171457 PMCID: PMC9801029 DOI: 10.1016/j.gpb.2021.02.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/05/2020] [Revised: 07/08/2020] [Accepted: 11/20/2020] [Indexed: 01/26/2023]
Abstract
The Network Makeup Artist (NORMA) is a web tool for interactive network annotation visualization and topological analysis, able to handle multiple networks and annotations simultaneously. Precalculated annotations (e.g., Gene Ontology, Pathway enrichment, community detection, or clustering results) can be uploaded and visualized in a network, either as colored pie-chart nodes or as color-filled areas in a 2D/3D Venn-diagram-like style. In the case where no annotation exists, algorithms for automated community detection are offered. Users can adjust the network views using standard layout algorithms or allow NORMA to slightly modify them for visually better group separation. Once a network view is set, users can interactively select and highlight any group of interest in order to generate publication-ready figures. Briefly, with NORMA, users can encode three types of information simultaneously. These are 1) the network, 2) the communities or annotations of interest, and 3) node categories or expression values. Finally, NORMA offers basic topological analysis and direct topological comparison across any of the selected networks. NORMA service is available at http://norma.pavlopouloslab.info, whereas the code is available at https://github.com/PavlopoulosLab/NORMA.
Affiliation(s)
- Mikaela Koutrouli
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari 16672, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari 16672, Greece
  - Department of Informatics and Telecommunications, University of Athens, Athens 15703, Greece
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari 16672, Greece
  - Corresponding author.
10
Shankar A, Sharma KK. Fungal secondary metabolites in food and pharmaceuticals in the era of multi-omics. Appl Microbiol Biotechnol 2022; 106:3465-3488. [PMID: 35546367 PMCID: PMC9095418 DOI: 10.1007/s00253-022-11945-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 02/01/2022] [Revised: 04/12/2022] [Accepted: 04/24/2022] [Indexed: 01/16/2023]
Abstract
Fungi produce several bioactive metabolites, pigments, dyes, antioxidants, polysaccharides, and industrial enzymes. Fungal products are also the primary sources of functional food and nutrition, and their pharmacological products are used for healthy aging. Their molecular properties are validated through the use of recent high-throughput genomic, transcriptomic, and metabolomic tools and techniques. Together, these updated multi-omic tools have been used to study the structure of fungal metabolites and their mode of action on biological and cellular processes. Diverse groups of fungi produce different proteins and secondary metabolites, which possess tremendous biotechnological and pharmaceutical applications. Furthermore, their use and acceptability can be accelerated by adopting multi-omics, bioinformatics, and machine learning tools that generate a huge amount of molecular data. The integration of artificial intelligence and machine learning tools in the era of omics and big data has opened up a new outlook in both basic and applied research in the area of nutraceuticals and functional food and nutrition. KEY POINTS: • Multi-omic tools help in the identification of novel fungal metabolites • Intra-omic data from genomics to bioinformatics • Novel metabolites and application in human health.
Affiliation(s)
- Akshay Shankar
  - Laboratory of Enzymology and Recombinant DNA Technology, Department of Microbiology, Maharshi Dayanand University, Rohtak, 124001, Haryana, India
- Krishna Kant Sharma
  - Laboratory of Enzymology and Recombinant DNA Technology, Department of Microbiology, Maharshi Dayanand University, Rohtak, 124001, Haryana, India
11
Darling: A Web Application for Detecting Disease-Related Biomedical Entity Associations with Literature Mining. Biomolecules 2022; 12:biom12040520. [PMID: 35454109 PMCID: PMC9028073 DOI: 10.3390/biom12040520] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/01/2022] [Revised: 03/24/2022] [Accepted: 03/28/2022] [Indexed: 12/15/2022] Open
Abstract
Finding, exploring and filtering frequent sentence-based associations between a disease and a biomedical entity, co-mentioned in disease-related PubMed literature, is a challenge as the volume of publications increases. Darling is a web application that uses Named Entity Recognition to identify human-related biomedical terms in PubMed articles, mentioned in OMIM, DisGeNET and Human Phenotype Ontology (HPO) disease records, and generates an interactive biomedical entity association network. Nodes in this network represent genes, proteins, chemicals, functions, tissues, diseases, environments and phenotypes. Users can search by identifiers, terms/entities or free text and explore the relevant abstracts in an annotated format.
12
Lucas M. Future Challenges in Plant Systems Biology. Methods Mol Biol 2022; 2395:325-337. [PMID: 34822161 DOI: 10.1007/978-1-0716-1816-5_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Plant systems biology is currently facing several important challenges, whose nature depends on the considered frame of reference and associated scale. This review covers some of the issues associated respectively with the molecular, tissue, and whole-plant scales, and discusses the potential for the latest advances in synthetic biology and machine-learning methods to be of use in the future of plant systems biology.
Affiliation(s)
- Mikaël Lucas
  - DIADE, Univ Montpellier, IRD, CIRAD, Montpellier, France
13
Jovicic SM. Global trend of clinical biomarkers of health and disease during the period (1913–2021): systematic review and bibliometric analysis. African Journal of Urology 2021. [DOI: 10.1186/s12301-021-00239-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
Background
The literature review provides a concise and detailed description of the available and published data on the investigated research problem. The study summarizes findings over the last 100 years regarding clinical biomarkers during health and disease. The research was expanded to present the range of the enzyme acetylcholinesterase in human blood, measured with diverse methodologies, over the period 1949–2021.
Main body
Data analysis used SPSS v23.0 and included frequencies, percentages, counts and graphical presentation of results. Information from the papers was gathered in Microsoft Excel 2007 and comprises: study type, journal, publisher, year of publication, continent, the health status of respondents, biomarkers, number and age of participants, types of samples, methodology, goals and conclusions. Data collection drew on electronic databases, the National Center for Biotechnology Information and Google Scholar, with several inclusion criteria: (1) anthropometry, (2) urine and (3) blood parameters in the healthy and diseased population during different physiological states of the organism. The initial number of collected and analyzed papers was 1900. The final analysis included 982 studies out of 1454 selected papers. After the selection process, 67.53% remained useful. The analysis of the acetylcholinesterase range included 107 publications.
Conclusion
The number of published scientific papers has been increasing over the years. Little practical information in scientific and clinical practice exists. There is an urgent need for concise highlighting of literature key arguments and ideas. Results apply to a specialized area of research.
14
Baltoumas FA, Zafeiropoulou S, Karatzas E, Koutrouli M, Thanati F, Voutsadaki K, Gkonta M, Hotova J, Kasionis I, Hatzis P, Pavlopoulos GA. Biomolecule and Bioentity Interaction Databases in Systems Biology: A Comprehensive Review. Biomolecules 2021; 11:1245. [PMID: 34439912 PMCID: PMC8391349 DOI: 10.3390/biom11081245] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/28/2021] [Revised: 08/16/2021] [Accepted: 08/18/2021] [Indexed: 02/06/2023] Open
Abstract
Technological advances in high-throughput techniques have resulted in tremendous growth of complex biological datasets providing evidence regarding various biomolecular interactions. To cope with this data flood, computational approaches, web services, and databases have been implemented to deal with issues such as data integration, visualization, exploration, organization, scalability, and complexity. Nevertheless, as the number of such sets increases, it is becoming more and more difficult for an end user to know what the scope and focus of each repository is and how redundant the information between them is. Several repositories have a more general scope, while others focus on specialized aspects, such as specific organisms or biological systems. Unfortunately, many of these databases are self-contained or poorly documented and maintained. For a clearer view, in this article we provide a comprehensive categorization, comparison and evaluation of such repositories for different bioentity interaction types. We discuss most of the publicly available services based on their content, sources of information, data representation methods, user-friendliness, scope and interconnectivity, and we comment on their strengths and weaknesses. We aim for this review to reach a broad readership varying from biomedical beginners to experts and serve as a reference article in the field of Network Biology.
Affiliation(s)
- Fotis A. Baltoumas
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Sofia Zafeiropoulou
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Mikaela Koutrouli
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark
- Foteini Thanati
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Kleanthi Voutsadaki
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Maria Gkonta
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Joana Hotova
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Ioannis Kasionis
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Pantelis Hatzis
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
  - Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
  - Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, 11527 Athens, Greece
|
15
|
Singh AK, Bilal M, Iqbal HMN, Raj A. Trends in predictive biodegradation for sustainable mitigation of environmental pollutants: Recent progress and future outlook. The Science of the Total Environment 2021; 770:144561. [PMID: 33736422 DOI: 10.1016/j.scitotenv.2020.144561] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Received: 07/27/2020] [Revised: 12/13/2020] [Accepted: 12/13/2020] [Indexed: 02/05/2023]
Abstract
In-silico techniques and computational frameworks have been applied to predictive bioremediation, with the aims of cleaning up contaminants, evaluating toxicity, and exploring the degradation of complex recalcitrant compounds. Emerging contaminants from different industries pose a significant hazard to the environment and public health. Current bioremediation strategies often fail or prove inadequate for the sustainable mitigation of hazardous pollutants. Moreover, conventional remediation techniques yield incomplete information about biodegradation, and only scarce information about transformation products and their toxicity profiles is available in the published literature. This lack of complete information on bio-transformed compounds has prompted a search for alternative methods. To fill this gap, various computational technologies have emerged as alternative techniques, now recognized as in-silico approaches for bioremediation. Molecular docking, molecular dynamics simulation, and biodegradation pathway prediction are vital parts of predictive biodegradation, together with Quantitative Structure-Activity Relationship (QSAR) and Quantitative Structure-Biodegradation Relationship (QSBR) model systems. Furthermore, programs based on machine learning (ML), artificial neural networks (ANN), and genetic algorithms (GA) offer simultaneous prediction of biodegradation, toxicity, and environmental fate. Herein, we spotlight the feasibility of in-silico remediation approaches for various persistent, recalcitrant contaminants that traditional bioremediation fails to mitigate. Such contaminants could be addressed by exploiting the described model systems and algorithm-based programs. Furthermore, recent advances in QSAR modeling, algorithms, and dedicated biodegradation prediction systems are summarized, along with their unique attributes.
Affiliation(s)
- Anil Kumar Singh
  - Environmental Microbiology Laboratory, Environmental Toxicology Group, CSIR-Indian Institute of Toxicology Research (CSIR-IITR), Vishvigyan Bhawan, 31, Mahatma Gandhi Marg, Lucknow 226001, Uttar Pradesh, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Muhammad Bilal
  - School of Life Science and Food Engineering, Huaiyin Institute of Technology, Huaian 223003, China
- Hafiz M N Iqbal
  - Tecnologico de Monterrey, School of Engineering and Sciences, Monterrey 64849, Mexico
- Abhay Raj
  - Environmental Microbiology Laboratory, Environmental Toxicology Group, CSIR-Indian Institute of Toxicology Research (CSIR-IITR), Vishvigyan Bhawan, 31, Mahatma Gandhi Marg, Lucknow 226001, Uttar Pradesh, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
|
16
|
Koutrouli M, Hatzis P, Pavlopoulos GA. Exploring Networks in the STRING and Reactome Database. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11516-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022] Open
|
17
|
Khomtchouk BB, Tran DT, Vand KA, Might M, Gozani O, Assimes TL. Cardioinformatics: the nexus of bioinformatics and precision cardiology. Brief Bioinform 2020; 21:2031-2051. [PMID: 31802103 PMCID: PMC7947182 DOI: 10.1093/bib/bbz119] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/29/2019] [Revised: 08/08/2019] [Accepted: 08/13/2019] [Indexed: 12/12/2022] Open
Abstract
Cardiovascular disease (CVD) is the leading cause of death worldwide, causing over 17 million deaths per year, which outpaces global cancer mortality rates. Despite these sobering statistics, most bioinformatics and computational biology research and funding to date has been concentrated predominantly on cancer research, with a relatively modest footprint in CVD. In this paper, we review the existing literary landscape and critically assess the unmet need to further develop an emerging field at the multidisciplinary interface of bioinformatics and precision cardiovascular medicine, which we refer to as 'cardioinformatics'.
Affiliation(s)
- Bohdan B Khomtchouk
  - Department of Biology, Stanford University, Stanford, CA, USA
  - Department of Medicine, Division of Cardiovascular Medicine, Stanford University, Stanford, CA, USA
  - VA Palo Alto Health Care System, Palo Alto, CA, USA
  - Department of Medicine, Section of Computational Biomedicine and Biomedical Data Science, University of Chicago, Chicago, IL, USA
- Diem-Trang Tran
  - School of Computing, University of Utah, Salt Lake City, UT, USA
- Matthew Might
  - Hugh Kaul Personalized Medicine Institute, University of Alabama at Birmingham, Birmingham, AL, USA
- Or Gozani
  - Department of Biology, Stanford University, Stanford, CA, USA
- Themistocles L Assimes
  - Department of Medicine, Division of Cardiovascular Medicine, Stanford University, Stanford, CA, USA
  - VA Palo Alto Health Care System, Palo Alto, CA, USA
|
18
|
Piereck B, Oliveira-Lima M, Benko-Iseppon AM, Diehl S, Schneider R, Brasileiro-Vidal AC, Barbosa-Silva A. LAITOR4HPC: A text mining pipeline based on HPC for building interaction networks. BMC Bioinformatics 2020; 21:365. [PMID: 32838742 PMCID: PMC7447576 DOI: 10.1186/s12859-020-03620-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/14/2019] [Accepted: 06/19/2020] [Indexed: 11/11/2022] Open
Abstract
Background: The amount of published full-text articles has increased dramatically. Text mining tools configure an essential approach to building biological networks, updating databases and providing annotation for new pathways. PESCADOR is an online web server based on LAITOR and NLProt text mining tools, which retrieves protein-protein co-occurrences in a tabular-based format, adding a network schema. Here we present an HPC-oriented version of PESCADOR’s native text mining tool, renamed to LAITOR4HPC, aiming to access an unlimited abstract amount in a short time to enrich available networks, build new ones and possibly highlight whether fields of research have been exhaustively studied.
Results: By taking advantage of parallel computing HPC infrastructure, the full collection of MEDLINE abstracts available until June 2017 was analyzed in a shorter period (6 days) when compared to the original online implementation (with an estimated 2 years to run the same data). Additionally, three case studies were presented to illustrate LAITOR4HPC usage possibilities. The first case study targeted soybean and was used to retrieve an overview of published co-occurrences in a single organism, retrieving 15,788 proteins in 7894 co-occurrences. In the second case study, a target gene family was searched in many organisms, by analyzing 15 species under biotic stress. Most co-occurrences regarded Arabidopsis thaliana and Zea mays. The third case study concerned the construction and enrichment of an available pathway. Choosing A. thaliana for further analysis, the defensin pathway was enriched, showing additional signaling and regulation molecules, and how they respond to each other in the modulation of this complex plant defense response.
Conclusions: LAITOR4HPC can be used for an efficient text mining based construction of biological networks derived from big data sources, such as MEDLINE abstracts. Time consumption and data input limitations will depend on the available resources at the HPC facility. LAITOR4HPC enables enough flexibility for different approaches and data amounts targeted to an organism, a subject, or a specific pathway. Additionally, it can deliver comprehensive results where interactions are classified into four types, according to their reliability.
Affiliation(s)
- Bruna Piereck
  - Genetics Department, Laboratório de Genética e Biologia Vegetal, Universidade Federal de Pernambuco, Recife, Pernambuco, Brazil
- Marx Oliveira-Lima
  - Genetics Department, Laboratório de Genética e Biologia Vegetal, Universidade Federal de Pernambuco, Recife, Pernambuco, Brazil
- Ana Maria Benko-Iseppon
  - Genetics Department, Laboratório de Genética e Biologia Vegetal, Universidade Federal de Pernambuco, Recife, Pernambuco, Brazil
- Sarah Diehl
  - University of Luxembourg, Luxembourg Centre for Systems Biomedicine, Bioinformatics Core, Esch-sur-Alzette, Luxembourg
- Reinhard Schneider
  - University of Luxembourg, Luxembourg Centre for Systems Biomedicine, Bioinformatics Core, Esch-sur-Alzette, Luxembourg
- Ana Christina Brasileiro-Vidal
  - Genetics Department, Laboratório de Genética e Biologia Vegetal, Universidade Federal de Pernambuco, Recife, Pernambuco, Brazil
- Adriano Barbosa-Silva
  - University of Luxembourg, Luxembourg Centre for Systems Biomedicine, Bioinformatics Core, Esch-sur-Alzette, Luxembourg
  - Queen Mary University of London, Centre for Translational Bioinformatics, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Charterhouse Square, London, UK
|
19
|
Rutter L, Cook D. bigPint: A Bioconductor visualization package that makes big data pint-sized. PLoS Comput Biol 2020; 16:e1007912. [PMID: 32542031 PMCID: PMC7347224 DOI: 10.1371/journal.pcbi.1007912] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/08/2019] [Revised: 07/09/2020] [Accepted: 04/27/2020] [Indexed: 11/20/2022] Open
Abstract
Interactive data visualization is imperative in the biological sciences. The development of independent layers of interactivity has been in pursuit in the visualization community. We developed bigPint, a data visualization package available on Bioconductor under the GPL-3 license (https://bioconductor.org/packages/release/bioc/html/bigPint.html). Our software introduces new visualization technology that enables independent layers of interactivity using Plotly in R, which aids in the exploration of large biological datasets. The bigPint package presents modernized versions of scatterplot matrices, volcano plots, and litre plots through the implementation of layered interactivity. These graphics have detected normalization issues, differential expression designation problems, and common analysis errors in public RNA-sequencing datasets. Researchers can apply bigPint graphics to their data by following recommended pipelines written in reproducible code in the user manual. In this paper, we explain how we achieved the independent layers of interactivity that are behind bigPint graphics. Pseudocode and source code are provided. Computational scientists can leverage our open-source code to expand upon our layered interactive technology and/or apply it in new ways toward other computational biology tasks.
Biological disciplines face the challenge of increasingly large and complex data. One necessary approach toward eliciting information is data visualization. Newer visualization tools incorporate interactive capabilities that allow scientists to extract information more efficiently than static counterparts. In this paper, we introduce technology that allows multiple independent layers of interactive visualization written in open-source code. This technology can be repurposed across various biological problems. Here, we apply this technology to RNA-sequencing data, a popular next-generation sequencing approach that provides snapshots of RNA quantity in biological samples at given moments in time. It can be used to investigate cellular differences between health and disease, cellular changes in response to external stimuli, and additional biological inquiries. RNA-sequencing data is large, noisy, and biased. It requires sophisticated normalization. The most popular open-source RNA-sequencing data analysis software focuses on models, with little emphasis on integrating effective visualization tools. This is despite sound evidence that RNA-sequencing data is most effectively explored using graphical and numerical approaches in a complementary fashion. The software we introduce can make it easier for researchers to use models and visuals in an integrated fashion during RNA-sequencing data analysis.
Affiliation(s)
- Lindsay Rutter
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa, United States of America
- Dianne Cook
  - Econometrics and Business Statistics, Monash University, Clayton, VIC, Australia
|
20
|
Koutrouli M, Karatzas E, Paez-Espino D, Pavlopoulos GA. A Guide to Conquer the Biological Network Era Using Graph Theory. Front Bioeng Biotechnol 2020; 8:34. [PMID: 32083072 PMCID: PMC7004966 DOI: 10.3389/fbioe.2020.00034] [Citation(s) in RCA: 92] [Impact Index Per Article: 23.0] [Received: 10/11/2019] [Accepted: 01/15/2020] [Indexed: 12/24/2022] Open
Abstract
Networks are one of the most common ways to represent biological systems as complex sets of binary interactions or relations between different bioentities. In this article, we discuss the basic graph theory concepts and the various graph types, as well as the available data structures for storing and reading graphs. In addition, we describe several network properties and we highlight some of the widely used network topological features. We briefly mention the network patterns, motifs and models, and we further comment on the types of biological and biomedical networks along with their corresponding computer- and human-readable file formats. Finally, we discuss a variety of algorithms and metrics for network analyses regarding graph drawing, clustering, visualization, link prediction, perturbation, and network alignment as well as the current state-of-the-art tools. We expect this review to reach a very broad spectrum of readers varying from experts to beginners while encouraging them to enhance the field further.
Affiliation(s)
- Mikaela Koutrouli
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
  - Department of Informatics and Telecommunications, University of Athens, Athens, Greece
- David Paez-Espino
  - Lawrence Berkeley National Laboratory, Department of Energy, Joint Genome Institute, Walnut Creek, CA, United States
|
21
|
Abstract
Network-based approaches are rapidly emerging as a promising strategy to integrate and interpret different -omics datasets, including metabolomics. The first section of this chapter introduces the current progress and main concepts in multi-omics integration. The second section provides an overview of the public resources available for the creation of biological networks. The third section describes three common application scenarios: subnetwork identification, network-based enrichment analysis, and systems metabolomics. The fourth section introduces the concept of hierarchical community network analysis. The fifth section discusses different tools for network visualization. The chapter ends with a future perspective on multi-omics integration.
Affiliation(s)
- Guangyan Zhou
  - Institute of Parasitology, McGill University, Montreal, QC, Canada
- Shuzhao Li
  - Department of Medicine, Emory University School of Medicine, Atlanta, GA, USA
- Jianguo Xia
  - Institute of Parasitology, McGill University, Montreal, QC, Canada
  - Department of Animal Science, McGill University, Montreal, QC, Canada
  - Department of Microbiology and Immunology, McGill University, Montreal, QC, Canada
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
|
22
|
Costessi A, van den Bogert B, May A, Ver Loren van Themaat E, Roubos JA, Kolkman MAB, Butler D, Pirovano W. Novel sequencing technologies to support industrial biotechnology. FEMS Microbiol Lett 2019; 365:4982775. [PMID: 30010862 DOI: 10.1093/femsle/fny103] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/08/2017] [Accepted: 04/19/2018] [Indexed: 12/11/2022] Open
Abstract
Industrial biotechnology develops and applies microorganisms for the production of bioproducts and enzymes with applications ranging from food and feed ingredients and processing to bio-based chemicals, biofuels and pharmaceutical products. Next generation DNA sequencing technologies play an increasingly important role in improving and accelerating microbial strain development for existing and novel bio-products via screening, gene and pathway discovery, metabolic engineering and additional optimization and understanding of large-scale manufacturing. In this mini-review, we describe novel DNA sequencing and analysis technologies with a focus on applications to industrial strain development, enzyme discovery and microbial community analysis.
Affiliation(s)
- Adalberto Costessi
  - Next Generation Sequencing Department, BaseClear B.V., Sylviusweg 74, 2333 BE, Leiden, The Netherlands
- Ali May
  - Bioinformatics Department, BaseClear B.V., Sylviusweg 74, 2333 BE, Leiden, The Netherlands
- Johannes A Roubos
  - DSM Biotechnology Center, DSM, Alexander Fleminglaan 1, 2600 MA, Delft, The Netherlands
- Marc A B Kolkman
  - Division of Industrial Biosciences, DuPont, Archimedesweg 30, 2300 AE, Leiden, The Netherlands
- Derek Butler
  - Bianomics Business Unit, BaseClear B.V., Sylviusweg 74, 2333 BE, Leiden, The Netherlands
- Walter Pirovano
  - Bioinformatics Department, BaseClear B.V., Sylviusweg 74, 2333 BE, Leiden, The Netherlands
|
23
|
Zhou G, Xia J. OmicsNet: a web-based tool for creation and visual analysis of biological networks in 3D space. Nucleic Acids Res 2019; 46:W514-W522. [PMID: 29878180 PMCID: PMC6030925 DOI: 10.1093/nar/gky510] [Citation(s) in RCA: 101] [Impact Index Per Article: 20.2] [Received: 01/31/2018] [Accepted: 05/23/2018] [Indexed: 12/16/2022] Open
Abstract
Biological networks play increasingly important roles in omics data integration and systems biology. Over the past decade, many excellent tools have been developed to support creation, analysis and visualization of biological networks. However, important limitations remain: most tools are standalone programs, the majority of them focus on protein-protein interaction (PPI) or metabolic networks, and visualizations often suffer from 'hairball' effects when networks become large. To help address these limitations, we developed OmicsNet - a novel web-based tool that allows users to easily create different types of molecular interaction networks and visually explore them in a three-dimensional (3D) space. Users can upload one or multiple lists of molecules of interest (genes/proteins, microRNAs, transcription factors or metabolites) to create and merge different types of biological networks. The 3D network visualization system was implemented using the powerful Web Graphics Library (WebGL) technology that works natively in most major browsers. OmicsNet supports force-directed layout, multi-layered perspective layout, as well as spherical layout to help visualize and navigate complex networks. A rich set of functions have been implemented to allow users to perform coloring, shading, topology analysis, and enrichment analysis. OmicsNet is freely available at http://www.omicsnet.ca.
Affiliation(s)
- Guangyan Zhou
  - Institute of Parasitology, McGill University, Montreal, Quebec, Canada
- Jianguo Xia
  - Institute of Parasitology, McGill University, Montreal, Quebec, Canada
  - Department of Animal Science, McGill University, Montreal, Quebec, Canada
|
24
|
Azad A, Pavlopoulos GA, Ouzounis CA, Kyrpides NC, Buluç A. HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks. Nucleic Acids Res 2019; 46:e33. [PMID: 29315405 PMCID: PMC5888241 DOI: 10.1093/nar/gkx1313] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 09/19/2017] [Accepted: 01/02/2018] [Indexed: 11/13/2022] Open
Abstract
Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ∼70 million nodes with ∼68 billion edges in ∼2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.
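The abstract names Markov Clustering without describing its steps. As a minimal single-machine sketch, assuming the commonly described form of MCL (alternating expansion and inflation of a column-stochastic matrix), the idea can be illustrated as follows. This is not HipMCL's distributed implementation; the function names, the dense list-of-lists representation, and the default parameters are all illustrative assumptions.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense square matrix product; fine for toy graphs only."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Toy Markov clustering on a dense adjacency matrix (hypothetical helper)."""
    n = len(adjacency)
    # Add self-loops (a common MCL convention), then normalize the columns.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: simulate two-step random walks
        # Inflation: raise entries to a power and renormalize, which
        # strengthens strong flows and weakens weak ones.
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row with non-negligible entries marks an attractor; the columns
    # it reaches form one cluster. Identical clusters are merged by the set.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6)
                for i in range(n)}
    clusters.discard(frozenset())
    return sorted(sorted(c) for c in clusters)


# Two disconnected triangles: nodes 0-2 and nodes 3-5.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(two_triangles))  # [[0, 1, 2], [3, 4, 5]]
```

The dense matrix product costs O(n^3) per iteration, which hints at why the running time and memory demands mentioned in the abstract become a bottleneck for networks with millions of nodes.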
Affiliation(s)
- Ariful Azad
  - Computational Research Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720-8150, USA
- Georgios A Pavlopoulos
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Christos A Ouzounis
  - Biological Computation & Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica 57001, Greece
- Nikos C Kyrpides
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Aydin Buluç
  - Computational Research Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720-8150, USA
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA
|
25
|
Cruz A, Arrais JP, Machado P. Interactive and coordinated visualization approaches for biological data analysis. Brief Bioinform 2019; 20:1513-1523. [PMID: 29590305 DOI: 10.1093/bib/bby019] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/30/2017] [Revised: 01/24/2018] [Indexed: 12/11/2022] Open
Abstract
The field of computational biology has become largely dependent on data visualization tools to analyze the increasing quantities of data gathered through the use of new and growing technologies. Aside from the volume, which often results in large amounts of noise and complex relationships with no clear structure, the visualization of biological data sets is hindered by their heterogeneity, as data are obtained from different sources and contain a wide variety of attributes, including spatial and temporal information. This requires visualization approaches that are able to not only represent various data structures simultaneously but also provide exploratory methods that allow the identification of meaningful relationships that would not be perceptible through data analysis algorithms alone. In this article, we present a survey of visualization approaches applied to the analysis of biological data. We focus on graph-based visualizations and tools that use coordinated multiple views to represent high-dimensional multivariate data, in particular time series gene expression, protein-protein interaction networks and biological pathways. We then discuss how these methods can be used to help solve the current challenges surrounding the visualization of complex biological data sets.
Affiliation(s)
- António Cruz
  - Universidade de Coimbra Faculdade de Ciencias e Tecnologia, Departamento de Engenharia Informática
- Joel P Arrais
  - Universidade de Coimbra Faculdade de Ciencias e Tecnologia, Departamento de Engenharia Informática
- Penousal Machado
  - Universidade de Coimbra Faculdade de Ciencias e Tecnologia, Departamento de Engenharia Informática
|
26
|
Nusrat S, Harbig T, Gehlenborg N. Tasks, Techniques, and Tools for Genomic Data Visualization. Computer Graphics Forum: Journal of the European Association for Computer Graphics 2019; 38:781-805. [PMID: 31768085 PMCID: PMC6876635 DOI: 10.1111/cgf.13727] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 05/06/2023]
Abstract
Genomic data visualization is essential for interpretation and hypothesis generation as well as a valuable aid in communicating discoveries. Visual tools bridge the gap between algorithmic approaches and the cognitive skills of investigators. Addressing this need has become crucial in genomics, as biomedical research is increasingly data-driven and many studies lack well-defined hypotheses. A key challenge in data-driven research is to discover unexpected patterns and to formulate hypotheses in an unbiased manner in vast amounts of genomic and other associated data. Over the past two decades, this has driven the development of numerous data visualization techniques and tools for visualizing genomic data. Based on a comprehensive literature survey, we propose taxonomies for data, visualization, and tasks involved in genomic data visualization. Furthermore, we provide a comprehensive review of published genomic visualization tools in the context of the proposed taxonomies.
Affiliation(s)
- S Nusrat
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- T Harbig
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- N Gehlenborg
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
|
27
|
Qu Z, Lau CW, Nguyen QV, Zhou Y, Catchpoole DR. Visual Analytics of Genomic and Cancer Data: A Systematic Review. Cancer Inform 2019; 18:1176935119835546. [PMID: 30890859 PMCID: PMC6416684 DOI: 10.1177/1176935119835546] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/16/2019] [Accepted: 01/29/2019] [Indexed: 12/12/2022] Open
Abstract
Visual analytics and visualisation can leverage the human perceptual system to interpret and uncover hidden patterns in big data. The advent of next-generation sequencing technologies has allowed the rapid production of massive amounts of genomic data and created a corresponding need for new tools and methods for visualising and interpreting these data. Visualising genomic data requires not only simply plotting of data but should also offer a decision or a choice about what the message should be conveyed in the particular plot; which methodologies should be used to represent the results must provide an easy, clear, and accurate way to the clinicians, experts, or researchers to interact with the data. Genomic data visual analytics is rapidly evolving in parallel with advances in high-throughput technologies such as artificial intelligence (AI) and virtual reality (VR). Personalised medicine requires new genomic visualisation tools, which can efficiently extract knowledge from the genomic data and speed up expert decisions about the best treatment of individual patient’s needs. However, meaningful visual analytics of such large genomic data remains a serious challenge. This article provides a comprehensive systematic review and discussion on the tools, methods, and trends for visual analytics of cancer-related genomic data. We reviewed methods for genomic data visualisation including traditional approaches such as scatter plots, heatmaps, coordinates, and networks, as well as emerging technologies using AI and VR. We also demonstrate the development of genomic data visualisation tools over time and analyse the evolution of visualising genomic data.
Collapse
Affiliation(s)
- Zhonglin Qu
- School of Computing, Engineering and Mathematics, Western Sydney University, Penrith, NSW, Australia
| | - Chng Wei Lau
- School of Computing, Engineering and Mathematics, Western Sydney University, Penrith, NSW, Australia
| | - Quang Vinh Nguyen
- School of Computing, Engineering and Mathematics, Western Sydney University, Penrith, NSW, Australia.,The MARCS Institute, Western Sydney University, Penrith, NSW, Australia
| | - Yi Zhou
- School of Computing, Engineering and Mathematics, Western Sydney University, Penrith, NSW, Australia
| | - Daniel R Catchpoole
- The Tumour Bank, Children's Cancer Research Unit, Kids Research, The Children's Hospital at Westmead, Westmead, NSW, Australia.,Discipline of Paediatrics and Child Health, Faculty of Medicine, The University of Sydney, Sydney, NSW, Australia.,Faculty of Information Technology, The University of Technology Sydney, Ultimo, NSW, Australia
| |
|
28
|
Kaur S, Baldi B, Vuong J, O'Donoghue SI. Visualization and Analysis of Epiproteome Dynamics. J Mol Biol 2019; 431:1519-1539. [PMID: 30769119 DOI: 10.1016/j.jmb.2019.01.044] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/11/2018] [Revised: 01/29/2019] [Accepted: 01/29/2019] [Indexed: 12/28/2022]
Abstract
The epiproteome describes the set of all post-translational modifications (PTMs) made to the proteins comprising a cell or organism. The extent of the epiproteome is still largely unknown; however, advances in experimental techniques are beginning to produce a deluge of data, tracking dynamic changes to the epiproteome in response to cellular stimuli. These data have the potential to revolutionize our understanding of biology and disease. This review covers a range of recent visualization methods and tools developed specifically for dynamic epiproteome data sets. These methods have been designed primarily for data sets on phosphorylation, as this is the most studied PTM; however, most of these methods are also applicable to other types of PTMs. Unfortunately, the currently available methods are often inadequate for existing data sets; thus, realizing the potential buried in epiproteome data sets will require new, tailored bioinformatics methods that will help researchers analyze, visualize, and interactively explore these complex data sets.
Affiliation(s)
- Sandeep Kaur
- University of New South Wales (UNSW), Kensington, NSW 2052, Australia; Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia.
| | - Benedetta Baldi
- Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia; Data 61, CSIRO, Eveleigh, NSW 2015, Australia.
| | - Jenny Vuong
- Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia; Data 61, CSIRO, Eveleigh, NSW 2015, Australia.
| | - Seán I O'Donoghue
- University of New South Wales (UNSW), Kensington, NSW 2052, Australia; Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia; Data 61, CSIRO, Eveleigh, NSW 2015, Australia.
| |
|
29
|
Zhou G, Xia J. Using OmicsNet for Network Integration and 3D Visualization. Curr Protoc Bioinformatics 2018; 65:e69. [DOI: 10.1002/cpbi.69] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Indexed: 12/18/2022]
Affiliation(s)
- Guangyan Zhou
- Institute of Parasitology, McGill University, Sainte Anne de Bellevue, Quebec, Canada
| | - Jianguo Xia
- Institute of Parasitology, McGill University, Sainte Anne de Bellevue, Quebec, Canada
- Department of Animal Sciences, McGill University, Sainte Anne de Bellevue, Quebec, Canada
- Department of Microbiology and Immunology, McGill University, Montreal, Quebec, Canada
| |
|
30
|
Garanina IA, Fisunov GY, Govorun VM. BAC-BROWSER: The Tool for Visualization and Analysis of Prokaryotic Genomes. Front Microbiol 2018; 9:2827. [PMID: 30519231 PMCID: PMC6258810 DOI: 10.3389/fmicb.2018.02827] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 08/09/2018] [Accepted: 11/05/2018] [Indexed: 11/13/2022] Open
Abstract
Prokaryotes are actively studied in the context of genomic regulation. Microbiologists need specialised tools for complex data analysis in order to study and identify regulatory mechanisms in bacteria and archaea. We developed BAC-BROWSER, a tool specifically for the visualization and analysis of small prokaryotic genomes. BAC-BROWSER provides tools for different types of analysis to study a wide set of regulatory mechanisms of prokaryotes: (i) transcriptional regulation by transcription factors (TFs), including analysis of TFs, their targets, and binding sites; (ii) other regulatory motifs, promoters, terminators, and ribosome binding sites; (iii) transcriptional regulation by variation of operon structure and alternative starts or ends of transcription; (iv) non-coding RNAs and antisense RNAs; (v) RNA secondary structure and riboswitches; and (vi) GC content, GC skew, and codon usage. BAC-BROWSER also incorporates free programs that accelerate the verification of obtained results: primer design and an oligo calculator, vector visualization, and a tool for synthetic gene construction. The program is designed for the Windows operating system and is freely available for download at http://smdb.rcpcm.org/tools/index.html.
Affiliation(s)
- Irina A Garanina
- Federal Research and Clinical Centre of Physical-Chemical Medicine, Moscow, Russia.,Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
| | - Gleb Y Fisunov
- Federal Research and Clinical Centre of Physical-Chemical Medicine, Moscow, Russia
| | - Vadim M Govorun
- Federal Research and Clinical Centre of Physical-Chemical Medicine, Moscow, Russia.,Moscow Institute of Physics and Technology, Dolgoprudny, Russia
| |
|
31
|
Hammoud Z, Kramer F. mully: An R Package to Create, Modify and Visualize Multilayered Graphs. Genes (Basel) 2018; 9:E519. [PMID: 30360563 PMCID: PMC6267209 DOI: 10.3390/genes9110519] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/01/2018] [Revised: 10/18/2018] [Accepted: 10/18/2018] [Indexed: 12/20/2022] Open
Abstract
The modelling of complex biological networks such as pathways has been a necessity for scientists over the last decades. The study of these networks also imposes a need to investigate different aspects of nodes or edges within the networks, or other biomedical knowledge related to them. Our aim is to provide a generic modelling framework to integrate multiple pathway types and further knowledge sources influencing these networks. This framework is defined by a multi-layered model allowing automatic network transformations and documentation. By providing a tool that generates this model, we aim to facilitate data integration, boost reproducibility, and increase interoperability between different sources and databases in the field of pathways. We present the mully R package, which allows the user to create, modify and visualize multilayered graphs. The package is implemented with features to specifically handle multilayered graphs.
Affiliation(s)
- Zaynab Hammoud
- Department of Medical Statistics, University Medical Center Göttingen, Humboldtallee 32, 37073 Göttingen, Germany.
- Institute of Computer Science, IT Infrastructure for Translational Medical Research, University of Augsburg, Universitätsstraße 6a, 86159, Augsburg, Germany.
| | - Frank Kramer
- Institute of Computer Science, IT Infrastructure for Translational Medical Research, University of Augsburg, Universitätsstraße 6a, 86159, Augsburg, Germany.
| |
|
32
|
Dangi AK, Sharma B, Hill RT, Shukla P. Bioremediation through microbes: systems biology and metabolic engineering approach. Crit Rev Biotechnol 2018; 39:79-98. [DOI: 10.1080/07388551.2018.1500997] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Indexed: 10/28/2022]
Affiliation(s)
- Arun Kumar Dangi
- Enzyme Technology and Protein Bioinformatics Laboratory, Department of Microbiology, Maharshi Dayanand University, Rohtak, India
| | - Babita Sharma
- Enzyme Technology and Protein Bioinformatics Laboratory, Department of Microbiology, Maharshi Dayanand University, Rohtak, India
| | - Russell T. Hill
- Institute of Marine and Environmental Technology, University of Maryland Center for Environmental Science, Baltimore, MD, USA
| | - Pratyoosh Shukla
- Enzyme Technology and Protein Bioinformatics Laboratory, Department of Microbiology, Maharshi Dayanand University, Rohtak, India
| |
|
33
|
Lavarenne J, Guyomarc'h S, Sallaud C, Gantet P, Lucas M. The Spring of Systems Biology-Driven Breeding. Trends Plant Sci 2018; 23:706-720. [PMID: 29764727 DOI: 10.1016/j.tplants.2018.04.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/06/2017] [Revised: 04/12/2018] [Accepted: 04/16/2018] [Indexed: 05/08/2023]
Abstract
Genetics and molecular biology have contributed to the development of rationalized plant breeding programs. Recent developments in both high-throughput experimental analyses of biological systems and in silico data processing offer the possibility to address the whole gene regulatory network (GRN) controlling a given trait. GRN models can be applied to identify topological features helping to shortlist potential candidate genes for breeding purposes. Time-series data sets can be used to support dynamic modelling of the network. This will enable a deeper comprehension of network behaviour and the identification of the few elements to be genetically rewired to push the system towards a modified phenotype of interest. This paves the way to design more efficient, systems biology-based breeding strategies.
Affiliation(s)
- Jérémy Lavarenne
- UMR DIADE, Université de Montpellier, IRD, 911 Avenue Agropolis, 34394 Montpellier cedex 5, France; Biogemma, Centre de Recherches de Chappes, Route d'Ennezat, 63720 Chappes, France
| | - Soazig Guyomarc'h
- UMR DIADE, Université de Montpellier, IRD, 911 Avenue Agropolis, 34394 Montpellier cedex 5, France
| | - Christophe Sallaud
- Biogemma, Centre de Recherches de Chappes, Route d'Ennezat, 63720 Chappes, France
| | - Pascal Gantet
- UMR DIADE, Université de Montpellier, IRD, 911 Avenue Agropolis, 34394 Montpellier cedex 5, France.
| | - Mikaël Lucas
- UMR DIADE, Université de Montpellier, IRD, 911 Avenue Agropolis, 34394 Montpellier cedex 5, France
| |
|
34
|
Kalayci S, Gümüş ZH. Exploring Biological Networks in 3D, Stereoscopic 3D, and Immersive 3D with iCAVE. Curr Protoc Bioinformatics 2018; 61:8.27.1-8.27.26. [PMID: 30040198 DOI: 10.1002/cpbi.47] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 01/25/2023]
Abstract
Biological networks are becoming increasingly large and complex, pushing the limits of existing 2D tools. iCAVE is an open-source software tool for interactive visual explorations of large and complex networks in 3D, stereoscopic 3D, or immersive 3D. It introduces new 3D network layout algorithms and 3D extensions of popular 2D network layout, clustering, and edge bundling algorithms to assist researchers in understanding the underlying patterns in large, multi-layered, clustered, or complex networks. This protocol aims to guide new users on the basic functions of iCAVE for loading data, laying out networks (single or multi-layered), bundling edges, clustering networks, visualizing clusters, visualizing data attributes, and saving output images or videos. It also provides examples on visualizing networks constrained in physical 3D space (e.g., proteins; neurons; brain). It is accompanied by a new version of iCAVE with an enhanced user interface and highlights new features useful for existing users. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Selim Kalayci
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York.,Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York
| | - Zeynep H Gümüş
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York.,Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York
| |
|
35
|
O'Donoghue SI, Baldi BF, Clark SJ, Darling AE, Hogan JM, Kaur S, Maier-Hein L, McCarthy DJ, Moore WJ, Stenau E, Swedlow JR, Vuong J, Procter JB. Visualization of Biomedical Data. Annu Rev Biomed Data Sci 2018. [DOI: 10.1146/annurev-biodatasci-080917-013424] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Indexed: 11/09/2022]
Abstract
The rapid increase in volume and complexity of biomedical data requires changes in research, communication, and clinical practices. This includes learning how to effectively integrate automated analysis with high–data density visualizations that clearly express complex phenomena. In this review, we summarize key principles and resources from data visualization research that help address this difficult challenge. We then survey how visualization is being used in a selection of emerging biomedical research areas, including three-dimensional genomics, single-cell RNA sequencing (RNA-seq), the protein structure universe, phosphoproteomics, augmented reality–assisted surgery, and metagenomics. While specific research areas need highly tailored visualizations, there are common challenges that can be addressed with general methods and strategies. Also common, however, are poor visualization practices. We outline ongoing initiatives aimed at improving visualization practices in biomedical research via better tools, peer-to-peer learning, and interdisciplinary collaboration with computer scientists, science communicators, and graphic designers. These changes are revolutionizing how we see and think about our data.
Affiliation(s)
- Seán I. O'Donoghue
- Data61, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Eveleigh NSW 2015, Australia
- Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney NSW 2010, Australia
- School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW), Kensington NSW 2033, Australia
| | - Benedetta Frida Baldi
- Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney NSW 2010, Australia
| | - Susan J. Clark
- Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney NSW 2010, Australia
| | - Aaron E. Darling
- The ithree Institute, University of Technology Sydney, Ultimo NSW 2007, Australia
| | - James M. Hogan
- School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane QLD, 4000, Australia
| | - Sandeep Kaur
- School of Computer Science and Engineering, University of New South Wales (UNSW), Kensington NSW 2033, Australia
| | - Lena Maier-Hein
- Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Davis J. McCarthy
- European Bioinformatics Institute (EBI), European Molecular Biology Laboratory (EMBL), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- St. Vincent's Institute of Medical Research, Fitzroy VIC 3065, Australia
| | - William J. Moore
- School of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
| | - Esther Stenau
- Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Jason R. Swedlow
- School of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
| | - Jenny Vuong
- Data61, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Eveleigh NSW 2015, Australia
| | - James B. Procter
- School of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
| |
|
36
|
Waldispühl J, Zhang E, Butyaev A, Nazarova E, Cyr Y. Storage, visualization, and navigation of 3D genomics data. Methods 2018; 142:74-80. [PMID: 29792917 DOI: 10.1016/j.ymeth.2018.05.008] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 11/15/2017] [Revised: 05/07/2018] [Accepted: 05/09/2018] [Indexed: 01/27/2023] Open
Abstract
The field of 3D genomics has grown at an increasing rate over the last decade. The volume and complexity of the 2D and 3D data produced are progressively outpacing the capacities of the technology previously used for distributing genome sequences. The emergence of new technologies also provides novel opportunities for the development of innovative approaches. In this paper, we review the state-of-the-art computing technology, as well as the solutions adopted by the platforms currently available.
Affiliation(s)
| | - Eric Zhang
- School of Computer Science, McGill University, Montréal, Canada
| | | | - Elena Nazarova
- School of Computer Science, McGill University, Montréal, Canada
| | - Yan Cyr
- Beam Me Up Labs, Montréal, Canada
| |
|
37
|
Liluashvili V, Kalayci S, Fluder E, Wilson M, Gabow A, Gümüs ZH. iCAVE: an open source tool for visualizing biomolecular networks in 3D, stereoscopic 3D and immersive 3D. Gigascience 2018; 6:1-13. [PMID: 28814063 PMCID: PMC5554349 DOI: 10.1093/gigascience/gix054] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/06/2017] [Accepted: 07/05/2017] [Indexed: 02/02/2023] Open
Abstract
Visualizations of biomolecular networks assist in systems-level data exploration in many cellular processes. Data generated from high-throughput experiments increasingly inform these networks, yet current tools do not adequately scale with the concomitant increase in their size and complexity. We present an open source software platform, interactome-CAVE (iCAVE), for visualizing large and complex biomolecular interaction networks in 3D. Users can explore networks (i) in 3D using a desktop, (ii) in stereoscopic 3D using 3D-vision glasses and a desktop, or (iii) in immersive 3D within a CAVE environment. iCAVE introduces 3D extensions of known 2D network layout, clustering, and edge-bundling algorithms, as well as new 3D network layout algorithms. Furthermore, users can simultaneously query several built-in databases within iCAVE for network generation or visualize their own networks (e.g., disease, drug, protein, metabolite). iCAVE has a modular structure that allows rapid development through the addition of algorithms, datasets, or features without affecting other parts of the code. Overall, iCAVE is the first freely available open source tool that enables 3D (optionally stereoscopic or immersive) visualizations of complex, dense, or multi-layered biomolecular networks. While primarily designed for researchers utilizing biomolecular networks, iCAVE can assist researchers in any field.
Affiliation(s)
- Vaja Liluashvili
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.,Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Selim Kalayci
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.,Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Eugene Fluder
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.,Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Manda Wilson
- Computational Biology Center, Memorial-Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Aaron Gabow
- Computational Biology Center, Memorial-Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Zeynep H Gümüs
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.,Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| |
|
38
|
Pavlopoulos GA, Kontou PI, Pavlopoulou A, Bouyioukos C, Markou E, Bagos PG. Bipartite graphs in systems biology and medicine: a survey of methods and applications. Gigascience 2018; 7:1-31. [PMID: 29648623 PMCID: PMC6333914 DOI: 10.1093/gigascience/giy014] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Received: 07/09/2017] [Revised: 01/15/2018] [Accepted: 02/13/2018] [Indexed: 11/14/2022] Open
Abstract
The latest advances in high-throughput techniques during the past decade allowed the systems biology field to expand significantly. Today, the focus of biologists has shifted from the study of individual biological components to the study of complex biological systems and their dynamics at a larger scale. Through the discovery of novel bioentity relationships, researchers reveal new information about biological functions and processes. Graphs are widely used to represent bioentities such as proteins, genes, small molecules, ligands, and others as nodes, and their connections as edges, within a network. In this review, special focus is given to the usability of bipartite graphs and their impact on the field of network biology and medicine. Furthermore, their topological properties and how these can be applied to certain biological case studies are discussed. Finally, available methodologies and software are presented, and useful insights on how bipartite graphs can shape the path toward the solution of challenging biological problems are provided.
Affiliation(s)
- Georgios A Pavlopoulos
- Lawrence Berkeley Labs, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
| | - Panagiota I Kontou
- University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
| | - Athanasia Pavlopoulou
- Izmir International Biomedicine and Genome Institute (iBG-Izmir), Dokuz Eylül University, 35340, Turkey
| | - Costas Bouyioukos
- Université Paris Diderot, Sorbonne Paris Cité, Epigenetics and Cell Fate, UMR7216, CNRS, France
| | - Evripides Markou
- University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
| | - Pantelis G Bagos
- University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
| |
|
39
|
Kelley JJ, Maor S, Kim MK, Lane A, Lun DS. MOST-visualization: software for producing automated textbook-style maps of genome-scale metabolic networks. Bioinformatics 2018; 33:2596-2597. [PMID: 28430868 DOI: 10.1093/bioinformatics/btx240] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/01/2016] [Accepted: 04/14/2017] [Indexed: 11/13/2022] Open
Abstract
Summary: Visualization of metabolites, reactions and pathways in genome-scale metabolic networks (GEMs) can assist in understanding cellular metabolism. Three attributes are desirable in software used for visualizing GEMs: (i) automation, since GEMs can be quite large; (ii) production of understandable maps that make pathways, reactions and metabolites easy to identify; and (iii) visualization of the entire network to show how pathways are interconnected. No software currently exists for visualizing GEMs that satisfies all three characteristics, but MOST-Visualization, an extension of the software package MOST (Metabolic Optimization and Simulation Tool), satisfies (i), and by using a pre-drawn overview map of metabolism based on the Roche map satisfies (ii) and comes close to satisfying (iii). Availability and Implementation: MOST is distributed for free under the GNU General Public License. The software and full documentation are available at http://most.ccib.rutgers.edu/. Contact: dslun@rutgers.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- James J Kelley
- Department of Computer Science, Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA
| | - Shay Maor
- Department of Computer Science, Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA
| | - Min Kyung Kim
- Department of Computer Science, Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA
| | - Anatoliy Lane
- Department of Computer Science, Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA
| | - Desmond S Lun
- Department of Computer Science, Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA.,Department of Plant Biology and Pathology, Rutgers University, New Brunswick, NJ 08901, USA
| |
|
40
|
Nagar SD, Aggarwal B, Joon S, Bhatnagar R, Bhatnagar S. A Network Biology Approach to Decipher Stress Response in Bacteria Using Escherichia coli As a Model. OMICS 2018; 20:310-24. [PMID: 27195968 DOI: 10.1089/omi.2016.0028] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 01/01/2023]
Abstract
The development of drug-resistant pathogenic bacteria poses challenges to global health for their treatment and control. In this context, stress response enables bacterial populations to survive extreme perturbations in the environment but remains poorly understood. Specific modules are activated for unique stressors with few recognized global regulators. The phenomenon of cross-stress protection strongly suggests the presence of central proteins that control the diverse stress responses. In this work, Escherichia coli was used to model the bacterial stress response. A Protein-Protein Interaction Network was generated by integrating differentially expressed genes in eight stress conditions of pH, temperature, and antibiotics with relevant gene ontology terms. Topological analysis identified 24 central proteins. The well-documented role of 16 central proteins in stress indicates central control of the response, while the remaining eight proteins may have a novel role in stress response. Cluster analysis of the generated network implicated RNA binding, flagellar assembly, ABC transporters, and DNA repair as important processes during response to stress. Pathway analysis showed crosstalk of Two Component Systems with metabolic processes, oxidative phosphorylation, and ABC transporters. The results were further validated by analysis of an independent cross-stress protection dataset. This study also reports on the ways in which bacterial stress response can progress to biofilm formation. In conclusion, we suggest that drug targets or pathways disrupting bacterial stress responses can potentially be exploited to combat antibiotic tolerance and multidrug resistance in the future.
Affiliation(s)
- Shashwat Deepali Nagar
- Computational and Structural Biology Laboratory, Division of Biotechnology, Netaji Subhas Institute of Technology, New Delhi, India
| | - Bhavye Aggarwal
- Computational and Structural Biology Laboratory, Division of Biotechnology, Netaji Subhas Institute of Technology, New Delhi, India
| | - Shikha Joon
- Computational and Structural Biology Laboratory, Division of Biotechnology, Netaji Subhas Institute of Technology, New Delhi, India.,Laboratory of Molecular Biology and Genetic Engineering, School of Biotechnology, Jawaharlal Nehru University, New Delhi, India
| | - Rakesh Bhatnagar
- Laboratory of Molecular Biology and Genetic Engineering, School of Biotechnology, Jawaharlal Nehru University, New Delhi, India
| | - Sonika Bhatnagar
- Computational and Structural Biology Laboratory, Division of Biotechnology, Netaji Subhas Institute of Technology, New Delhi, India
| |
|
41
|
Dangi AK, Dubey KK, Shukla P. Strategies to Improve Saccharomyces cerevisiae: Technological Advancements and Evolutionary Engineering. Indian J Microbiol 2017; 57:378-386. [PMID: 29151637 PMCID: PMC5671434 DOI: 10.1007/s12088-017-0679-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 09/18/2017] [Accepted: 09/30/2017] [Indexed: 11/28/2022] Open
Abstract
Bakery industries are striving to augment the diverse properties of Saccharomyces cerevisiae, improving its flavor, texture, and nutritional parameters to attract more consumers. The improved technologies adopted for quality improvement of baker's yeast are attracting the attention of industry and are playing a pivotal role in redesigning the quality parameters. Modern yeast strain improvement tactics revolve around the use of several advanced technologies such as evolutionary engineering, systems biology, metabolic engineering, and genome editing. The review mainly deals with the technologies for improving S. cerevisiae, with the objective of broadening the range of its industrial applications.
Affiliation(s)
- Arun Kumar Dangi
- Enzyme Technology and Protein Bioinformatics Laboratory, Department of Microbiology, Maharshi Dayanand University, Rohtak, 124001 India
| | - Kashyap Kumar Dubey
- Department of Biotechnology, Central University of Haryana, Mahendergarh, India
| | - Pratyoosh Shukla
- Enzyme Technology and Protein Bioinformatics Laboratory, Department of Microbiology, Maharshi Dayanand University, Rohtak, 124001 India
| |
|
42
|
Mougin F, Auber D, Bourqui R, Diallo G, Dutour I, Jouhet V, Thiessard F, Thiébaut R, Thébault P. Visualizing omics and clinical data: Which challenges for dealing with their variety? Methods 2017; 132:3-18. [PMID: 28887085 DOI: 10.1016/j.ymeth.2017.08.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/30/2017] [Revised: 08/22/2017] [Accepted: 08/23/2017] [Indexed: 12/20/2022] Open
Abstract
Life sciences are currently going through a great number of transformations raised by the in-going revolution in high-throughput technologies for the acquisition of data. The integration of their high dimensionality, ranging from omics to clinical data, is becoming one of the most challenging stages. It involves inter-disciplinary developments with the aim to move towards an enhanced understanding of human physiology for caring purposes. Biologists, bioinformaticians, physicians and other experts related to the healthcare domain have to accompany each step of the analysis process in order to investigate and expertise these various data. In this perspective, methods related to information visualization are gaining increasing attention within life sciences. The softwares based on these methods are now well recognized to facilitate expert users' success in carrying out their data analysis tasks. This article aims at reviewing the current methods and techniques dedicated to information visualisation and their current use in software development related to omics or/and clinical data.
Affiliation(s)
- Fleur Mougin
- Univ. Bordeaux, Inserm UMR 1219, Bordeaux Population Health Research Center, Team ERIAS, F-33000 Bordeaux, France; Univ. Bordeaux, CNRS UMR 5800, LaBRI, F-33000 Bordeaux, France.
| | - David Auber
- Univ. Bordeaux, CNRS UMR 5800, LaBRI, F-33000 Bordeaux, France
| | - Romain Bourqui
- Univ. Bordeaux, CNRS UMR 5800, LaBRI, F-33000 Bordeaux, France
| | - Gayo Diallo
- Univ. Bordeaux, Inserm UMR 1219, Bordeaux Population Health Research Center, Team ERIAS, F-33000 Bordeaux, France; Univ. Bordeaux, CNRS UMR 5800, LaBRI, F-33000 Bordeaux, France
| | - Isabelle Dutour
- Univ. Bordeaux, CNRS UMR 5800, LaBRI, F-33000 Bordeaux, France
| | - Vianney Jouhet
- Univ. Bordeaux, Inserm UMR 1219, Bordeaux Population Health Research Center, Team ERIAS, F-33000 Bordeaux, France; CHU de Bordeaux, Pole de sante publique, Service d'information medicale, F-33000 Bordeaux, France
| | - Frantz Thiessard
- Univ. Bordeaux, Inserm UMR 1219, Bordeaux Population Health Research Center, Team ERIAS, F-33000 Bordeaux, France; CHU de Bordeaux, Pole de sante publique, Service d'information medicale, F-33000 Bordeaux, France
| | - Rodolphe Thiébaut
- Univ. Bordeaux, Inserm UMR 1219, INRIA SISTM, F-33000 Bordeaux, France; CHU de Bordeaux, Pole de sante publique, Service d'information medicale, F-33000 Bordeaux, France.
| | | |
|
43
|
Goodstadt M, Marti‐Renom MA. Challenges for visualizing three-dimensional data in genomic browsers. FEBS Lett 2017; 591:2505-2519. [PMID: 28771695 PMCID: PMC5638070 DOI: 10.1002/1873-3468.12778] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 07/21/2017] [Revised: 07/30/2017] [Accepted: 07/31/2017] [Indexed: 12/14/2022]
Abstract
Genomic interactions reveal the spatial organization of genomes and genomic domains, which is known to play key roles in cell function. Physical proximity can be represented as two-dimensional heat maps or matrices. From these, three-dimensional (3D) conformations of chromatin can be computed revealing coherent structures that highlight the importance of nonsequential relationships across genomic features. Mainstream genomic browsers have been classically developed to display compact, stacked tracks based on a linear, sequential, per-chromosome coordinate system. Genome-wide comparative analysis demands new approaches to data access and new layouts for analysis. The legibility can be compromised when displaying track-aligned second dimension matrices, which require greater screen space. Moreover, 3D representations of genomes defy vertical alignment in track-based genome browsers. Furthermore, investigation at previously unattainable levels of detail is revealing multiscale, multistate, time-dependent complexity. This article outlines how these challenges are currently handled in mainstream browsers as well as how novel techniques in visualization are being explored to address them. A set of requirements for coherent visualization of novel spatial genomic data is defined and the resulting potential for whole genome visualization is described.
Affiliation(s)
- Mike Goodstadt
- Structural Genomics Group, CNAG‐CRG, The Barcelona Institute of Science and Technology (BIST), Spain
- Gene Regulation, Stem Cells and Cancer Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Marc A. Marti‐Renom
- Structural Genomics Group, CNAG‐CRG, The Barcelona Institute of Science and Technology (BIST), Spain
- Gene Regulation, Stem Cells and Cancer Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| |
|
44
|
Empirical Comparison of Visualization Tools for Larger-Scale Network Analysis. Adv Bioinformatics 2017; 2017:1278932. [PMID: 28804499 PMCID: PMC5540468 DOI: 10.1155/2017/1278932] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 02/22/2017] [Revised: 05/14/2017] [Accepted: 06/04/2017] [Indexed: 12/19/2022]
Abstract
Gene expression, signal transduction, protein/chemical interactions, biomedical literature co-occurrences, and other concepts are often captured in biological network representations where nodes represent a certain bioentity and edges represent the connections between them. While many tools to manipulate, visualize, and interactively explore such networks already exist, only a few of them can scale up and keep pace with today's indisputable information growth. In this review, we briefly list a catalog of available network visualization tools and, from a user-experience point of view, we identify four candidate tools suitable for larger-scale network analysis, visualization, and exploration. We comment on their strengths and weaknesses and empirically discuss their scalability, user friendliness, and postvisualization capabilities.
|
45
|
Theodosiou T, Efstathiou G, Papanikolaou N, Kyrpides NC, Bagos PG, Iliopoulos I, Pavlopoulos GA. NAP: The Network Analysis Profiler, a web tool for easier topological analysis and comparison of medium-scale biological networks. BMC Res Notes 2017; 10:278. [PMID: 28705239 PMCID: PMC5513407 DOI: 10.1186/s13104-017-2607-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 02/16/2017] [Accepted: 07/07/2017] [Indexed: 01/08/2023]
Abstract
OBJECTIVE: Owing to technological advances in high-throughput techniques, Systems Biology has seen tremendous growth in data generation. With network analysis, looking at biological systems at a higher level in order to better understand a system, its topology and the relationships between its components is of great importance. Gene expression, signal transduction, protein/chemical interactions and biomedical literature co-occurrences are a few of the examples captured in biological network representations, where nodes represent certain bioentities and edges represent the connections between them. Today, many tools for network visualization and analysis are available. Nevertheless, most of them are standalone applications that often (i) burden users with computing and calculation time depending on the network's size and (ii) focus on handling, editing and exploring a network interactively. While such functionality is of great importance, limited efforts have been made towards the comparison of the topological analysis of multiple networks. RESULTS: Network Analysis Profiler (NAP) is a comprehensive web tool to automate network profiling and intra/inter-network topology comparison. It is designed to bridge the gap between network analysis, statistics, graph theory and, partially, visualization in a user-friendly way. It is freely available and aims to become a very appealing tool for the broader community. It hosts a wide range of topological analysis methods such as node and edge rankings. A few of its powerful characteristics are its ability to enable easy profile comparisons across multiple networks, find their intersection and provide users with simplified, high-quality plots of any of the offered topological characteristics against any other within the same network. It is written in R and Shiny, is based on the igraph library and is able to handle medium-scale weighted/unweighted, directed/undirected and bipartite graphs. NAP is available at http://bioinformatics.med.uoc.gr/NAP.
Affiliation(s)
- Theodosios Theodosiou
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete Medical School, 70013 Heraklion, Crete, Greece
| | - Georgios Efstathiou
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete Medical School, 70013 Heraklion, Crete, Greece
| | - Nikolas Papanikolaou
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete Medical School, 70013 Heraklion, Crete, Greece
| | - Nikos C Kyrpides
- Joint Genome Institute, Lawrence Berkeley Lab, United States Department of Energy, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA
| | - Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Papasiopoulou 2-4, Galaneika, 35100, Lamia, Greece
| | - Ioannis Iliopoulos
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete Medical School, 70013 Heraklion, Crete, Greece.
| | - Georgios A Pavlopoulos
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete Medical School, 70013 Heraklion, Crete, Greece; Joint Genome Institute, Lawrence Berkeley Lab, United States Department of Energy, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA.
| |
|
46
|
Khomtchouk BB, Hennessy JR, Wahlestedt C. shinyheatmap: Ultra fast low memory heatmap web interface for big data genomics. PLoS One 2017; 12:e0176334. [PMID: 28493881 PMCID: PMC5426587 DOI: 10.1371/journal.pone.0176334] [Citation(s) in RCA: 77] [Impact Index Per Article: 11.0] [Received: 12/17/2016] [Accepted: 04/10/2017] [Indexed: 12/27/2022]
Abstract
BACKGROUND: Transcriptomics, metabolomics, metagenomics, and various other next-generation sequencing (-omics) fields are known for their production of large datasets, especially across single-cell sequencing studies. Visualizing such big data has posed technical challenges in biology, both in terms of available computational resources and programming acumen. Since heatmaps are used to depict high-dimensional numerical data as a colored grid of cells, efficiency and speed have often proven to be critical considerations in the process of successfully converting data into graphics. For example, rendering interactive heatmaps from large input datasets (e.g., 100k+ rows) has been computationally infeasible on both desktop computers and web browsers. In addition to memory requirements, programming skills and knowledge have frequently been barriers to entry for creating highly customizable heatmaps. RESULTS: We propose shinyheatmap: an advanced user-friendly heatmap software suite capable of efficiently creating highly customizable static and interactive biological heatmaps in a web browser. shinyheatmap is a low memory footprint program, making it particularly well-suited for the interactive visualization of extremely large datasets that cannot typically be computed in-memory due to size restrictions. Also, shinyheatmap features a built-in high performance web plug-in, fastheatmap, for rapidly plotting interactive heatmaps of datasets as large as 10^5 to 10^7 rows within seconds, effectively shattering previous performance benchmarks of heatmap rendering speed. CONCLUSIONS: shinyheatmap is hosted online as a freely available web server with an intuitive graphical user interface: http://shinyheatmap.com. The methods are implemented in R, and are available as part of the shinyheatmap project at: https://github.com/Bohdan-Khomtchouk/shinyheatmap. Users can access fastheatmap directly from within the shinyheatmap web interface, and all source code has been made publicly available on GitHub: https://github.com/Bohdan-Khomtchouk/fastheatmap.
Affiliation(s)
- Bohdan B. Khomtchouk
- Center for Therapeutic Innovation, University of Miami Miller School of Medicine, 1501 NW 10th Ave., Miami, FL, 33136, United States of America
- Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, 33136, United States of America
| | - James R. Hennessy
- Department of Mathematics, University of Miami, 1365 Memorial Drive, Coral Gables, FL, 33146, United States of America
| | - Claes Wahlestedt
- Center for Therapeutic Innovation, University of Miami Miller School of Medicine, 1501 NW 10th Ave., Miami, FL, 33136, United States of America
- Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, 33136, United States of America
| |
|
47
|
Mahfouz A, Huisman SMH, Lelieveldt BPF, Reinders MJT. Brain transcriptome atlases: a computational perspective. Brain Struct Funct 2017; 222:1557-1580. [PMID: 27909802 PMCID: PMC5406417 DOI: 10.1007/s00429-016-1338-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 05/25/2016] [Accepted: 11/15/2016] [Indexed: 01/31/2023]
Abstract
The immense complexity of the mammalian brain is largely reflected in the underlying molecular signatures of its billions of cells. Brain transcriptome atlases provide valuable insights into gene expression patterns across different brain areas throughout the course of development. Such atlases allow researchers to probe the molecular mechanisms which define neuronal identities, neuroanatomy, and patterns of connectivity. Despite the immense effort put into generating such atlases, to answer fundamental questions in neuroscience, an even greater effort is needed to develop methods to probe the resulting high-dimensional multivariate data. We provide a comprehensive overview of the various computational methods used to analyze brain transcriptome atlases.
Affiliation(s)
- Ahmed Mahfouz
- Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands.
- Delft Bioinformatics Laboratory, Delft University of Technology, Delft, The Netherlands.
| | - Sjoerd M H Huisman
- Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Laboratory, Delft University of Technology, Delft, The Netherlands
| | - Boudewijn P F Lelieveldt
- Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Laboratory, Delft University of Technology, Delft, The Netherlands
| | - Marcel J T Reinders
- Delft Bioinformatics Laboratory, Delft University of Technology, Delft, The Netherlands
| |
|
48
|
Chakraborty N, Meyerhoff J, Jett M, Hammamieh R. Genome to Phenome: A Systems Biology Approach to PTSD Using an Animal Model. Methods Mol Biol 2017; 1598:117-154. [PMID: 28508360 DOI: 10.1007/978-1-4939-6952-4_6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 06/07/2023]
Abstract
Post-traumatic stress disorder (PTSD) is a debilitating illness that imposes significant emotional and financial burdens on military families. The understanding of PTSD etiology remains elusive; nonetheless, it is clear that PTSD is manifested by a cluster of symptoms including hyperarousal, reexperiencing of traumatic events, and avoidance of trauma reminders. With these characteristics in mind, several rodent models have been developed eliciting PTSD-like features. Animal models with social dimensions are of particular interest, since the social context plays a major role in the development and manifestation of PTSD. For civilians, a core trauma that elicits PTSD might be characterized by a singular life-threatening event such as a car accident. In contrast, among war veterans, PTSD might be triggered by repeated threats and a cumulative psychological burden that coalesced in the combat zone. In capturing this fundamental difference, the aggressor-exposed social stress (Agg-E SS) model imposes highly threatening conspecific trauma on naïve mice repeatedly and randomly. There is abundant evidence that suggests the potential role of genetic contributions to risk factors for PTSD. Specific observations include putatively heritable attributes of the disorder, the cited cases of atypical brain morphology, and the observed neuroendocrine shifts away from normative. Taken together, these features underscore the importance of multi-omics investigations to develop a comprehensive picture. More daunting will be the task of downstream analysis with integration of these heterogeneous genotypic and phenotypic data types to deliver putative clinical biomarkers. Researchers are advocating for a systems biology approach, which has demonstrated an increasingly robust potential for integrating multidisciplinary data. By applying a systems biology approach here, we have connected the tissue-specific molecular perturbations to the behaviors displayed by mice subjected to Agg-E SS. A molecular pattern that links the atypical fear plasticity to energy deficiency was thereby identified to be causally associated with many behavioral shifts and transformations. PTSD is a multifactorial illness sensitive to environmental influence. Accordingly, it is essential to employ the optimal animal model approximating the environmental condition that elicits PTSD-like symptoms. Integration of an optimal animal model with a systems biology approach can contribute to a more knowledge-driven and efficient next-generation care management system and, potentially, prevention of PTSD.
Affiliation(s)
- Nabarun Chakraborty
- Integrative Systems Biology, Geneva Foundation, USACEHR, 568 Doughten Drive, Frederick, MD, 21702-5010, USA
| | - James Meyerhoff
- Integrative Systems Biology, Geneva Foundation, USACEHR, 568 Doughten Drive, Frederick, MD, 21702-5010, USA
| | - Marti Jett
- Integrative Systems Biology, US Army Center for Environmental Health Research, USACEHR, 568 Doughten Drive, Frederick, MD, 21702-5010, USA
| | - Rasha Hammamieh
- Integrative Systems Biology, US Army Center for Environmental Health Research, USACEHR, 568 Doughten Drive, Frederick, MD, 21702-5010, USA.
| |
|
49
|
He Z, Zhang H, Gao S, Lercher MJ, Chen WH, Hu S. Evolview v2: an online visualization and management tool for customized and annotated phylogenetic trees. Nucleic Acids Res 2016; 44:W236-41. [PMID: 27131786 PMCID: PMC4987921 DOI: 10.1093/nar/gkw370] [Citation(s) in RCA: 404] [Impact Index Per Article: 50.5] [Received: 02/22/2016] [Accepted: 04/23/2016] [Indexed: 02/06/2023]
Abstract
Evolview is an online visualization and management tool for customized and annotated phylogenetic trees. It allows users to visualize phylogenetic trees in various formats, customize the trees through built-in functions and user-supplied datasets and export the customization results to publication-ready figures. Its ‘dataset system’ contains not only the data to be visualized on the tree, but also ‘modifiers’ that control various aspects of the graphical annotation. Evolview is a single-page application (like Gmail); its carefully designed interface allows users to upload, visualize, manipulate and manage trees and datasets all in a single webpage. Developments since the last public release include a modern dataset editor with keyword highlighting functionality, seven newly added types of annotation datasets, collaboration support that allows users to share their trees and datasets and various improvements of the web interface and performance. In addition, we included eleven new ‘Demo’ trees to demonstrate the basic functionalities of Evolview, and five new ‘Showcase’ trees inspired by publications to showcase the power of Evolview in producing publication-ready figures. Evolview is freely available at: http://www.evolgenius.info/evolview/.
Affiliation(s)
- Zilong He
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS), No.7 Beitucheng West Road, Chaoyang District, 100029 Beijing, PR China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Huangkai Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS), No.7 Beitucheng West Road, Chaoyang District, 100029 Beijing, PR China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Shenghan Gao
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS), No.7 Beitucheng West Road, Chaoyang District, 100029 Beijing, PR China
| | - Martin J Lercher
- Institute for Computer Science and Cluster of Excellence on Plant Sciences CEPLAS, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Wei-Hua Chen
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS), No.7 Beitucheng West Road, Chaoyang District, 100029 Beijing, PR China; BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS), No.7 Beitucheng West Road, Chaoyang District, 100029 Beijing, PR China
| | - Songnian Hu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS), No.7 Beitucheng West Road, Chaoyang District, 100029 Beijing, PR China; University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
50
|
Shifman AR, Johnson RM, Wilhelm BT. Cascade: an RNA-seq visualization tool for cancer genomics. BMC Genomics 2016; 17:75. [PMID: 26810393 PMCID: PMC4727405 DOI: 10.1186/s12864-016-2389-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/22/2015] [Accepted: 01/11/2016] [Indexed: 12/20/2022]
Abstract
Background: Cancer genomics projects are producing ever-increasing amounts of rich and diverse data from patient samples. The ability to easily visualize these data in an integrated and intuitive way is limited by the software currently available. As a result, users typically must use several different tools to view the different data types for their cohort, making it difficult to have a simple unified view of their data. Results: Here we present Cascade, a novel web-based tool for the intuitive 3D visualization of RNA-seq data from cancer genomics experiments. The Cascade viewer allows multiple data types (e.g. mutation, gene expression, alternative splicing frequency) to be displayed simultaneously, giving a simplified view of the data that is tuneable based on user-specified parameters. The main webpage of Cascade provides a primary view of user data, which is overlaid onto known biological pathways that are either predefined or added by users. A space-saving menu for data selection and parameter adjustment allows users to access an underlying MySQL database and customize the features presented in the main view. Conclusions: There is currently a pressing need for new software tools to allow researchers to easily explore large cancer genomics datasets and generate hypotheses. Cascade represents a simple yet intuitive interface for data visualization that is both scalable and customizable. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2389-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Aaron R Shifman
- Laboratory for high throughput genomics, Institute for Research in Immunology and Cancer, University of Montreal, Montreal, QC, Canada.
| | - Radia M Johnson
- Laboratory for high throughput genomics, Institute for Research in Immunology and Cancer, University of Montreal, Montreal, QC, Canada.
| | - Brian T Wilhelm
- Laboratory for high throughput genomics, Institute for Research in Immunology and Cancer, University of Montreal, Montreal, QC, Canada.
| |
|