1
Xu X, Zhang S, Guo J, Xin T. Biclustering of Log Data: Insights from a Computer-Based Complex Problem Solving Assessment. J Intell 2024; 12:10. [PMID: 38248908 PMCID: PMC10817361 DOI: 10.3390/jintelligence12010010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Revised: 12/17/2023] [Accepted: 01/12/2024] [Indexed: 01/23/2024]
Abstract
Computer-based assessments provide the opportunity to collect a new source of behavioral data related to the problem-solving process, known as log file data. To understand the behavioral patterns that can be uncovered from these process data, many studies have employed clustering methods. In contrast to one-mode clustering algorithms, this study utilized biclustering methods, enabling simultaneous classification of test takers and features extracted from log files. By applying the biclustering algorithms to the "Ticket" task in the PISA 2012 CPS assessment, we evaluated the potential of biclustering algorithms in identifying and interpreting homogeneous biclusters from the process data. Compared with one-mode clustering algorithms, the biclustering methods could uncover clusters of individuals who are homogeneous on a subset of feature variables, holding promise for gaining fine-grained insights into students' problem-solving behavior patterns. Empirical results revealed that specific subsets of features played a crucial role in identifying biclusters. Additionally, the study explored the utilization of biclustering on both the action sequence data and timing data, and the inclusion of time-based features enhanced the understanding of students' action sequences and scores in the context of the analysis.
Affiliation(s)
- Xin Xu
  - Collaborative Innovation Center of Assessment for Basic Education Quality, Beijing Normal University, Beijing 100875, China
- Susu Zhang
  - Departments of Psychology and Statistics, University of Illinois Urbana-Champaign, Champaign, IL 61820, USA
- Jinxin Guo
  - College of Science, Minzu University of China, Beijing 100081, China
- Tao Xin
  - Collaborative Innovation Center of Assessment for Basic Education Quality, Beijing Normal University, Beijing 100875, China
  - School of Educational Science, Anhui Normal University, Wuhu 241000, China
2
Castanho EN, Lobo JP, Henriques R, Madeira SC. G-bic: generating synthetic benchmarks for biclustering. BMC Bioinformatics 2023; 24:457. [PMID: 38053078 PMCID: PMC10698934 DOI: 10.1186/s12859-023-05587-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/29/2023] [Accepted: 11/28/2023] [Indexed: 12/07/2023]
Abstract
BACKGROUND: Biclustering is increasingly used in biomedical data analysis, recommendation tasks, and text mining domains, with hundreds of biclustering algorithms proposed. When assessing the performance of these algorithms, more than real datasets are required as they do not offer a solid ground truth. Synthetic data surpass this limitation by producing reference solutions to be compared with the found patterns. However, generating synthetic datasets is challenging since the generated data must ensure reproducibility, pattern representativity, and real data resemblance. RESULTS: We propose G-Bic, a dataset generator conceived to produce synthetic benchmarks for the normative assessment of biclustering algorithms. Beyond expanding on aspects of pattern coherence, data quality, and positioning properties, it further handles specificities related to mixed-type datasets and time-series data. G-Bic has the flexibility to replicate real data regularities from diverse domains. We provide the default configurations to generate reproducible benchmarks to evaluate and compare diverse aspects of biclustering algorithms. Additionally, we discuss empirical strategies to simulate the properties of real data. CONCLUSION: G-Bic is a parametrizable generator for biclustering analysis, offering a solid means to assess biclustering solutions robustly according to internal and external metrics.
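The benchmarking idea in this abstract (plant a known pattern, then compare what an algorithm finds against that reference) can be illustrated with a minimal sketch. This is not G-Bic and does not use its API; the function names `plant_bicluster` and `jaccard`, the constant-value pattern, the Gaussian background noise, and the Jaccard index as the external metric are all assumptions chosen for illustration.

```python
import random


def plant_bicluster(n_rows, n_cols, rows, cols, value=5.0, seed=0):
    """Return a Gaussian-noise matrix with a constant bicluster planted at rows x cols.

    Hypothetical helper for illustration; a fixed seed keeps the benchmark reproducible.
    """
    rng = random.Random(seed)
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]
    for i in rows:
        for j in cols:
            matrix[i][j] = value
    return matrix


def jaccard(found, truth):
    """Jaccard index between two biclusters, each given as (row set, column set)."""
    found_cells = {(i, j) for i in found[0] for j in found[1]}
    truth_cells = {(i, j) for i in truth[0] for j in truth[1]}
    union = found_cells | truth_cells
    return len(found_cells & truth_cells) / len(union) if union else 1.0


# Ground truth: rows 1-3, columns 0 and 4 of a 6 x 5 matrix.
truth = ({1, 2, 3}, {0, 4})
data = plant_bicluster(6, 5, *truth)

# A made-up candidate solution that overlaps the truth on two of its three rows.
candidate = ({2, 3, 4}, {0, 4})
score = jaccard(candidate, truth)  # 4 shared cells out of 8 in the union
```

Because the planted rows and columns are known, any candidate bicluster can be scored against them, which is what real datasets without ground truth do not allow.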
Affiliation(s)
- Eduardo N Castanho
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 016, 1749-016, Lisbon, Portugal
- João P Lobo
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 016, 1749-016, Lisbon, Portugal
- Rui Henriques
  - INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1900-001, Lisbon, Portugal
- Sara C Madeira
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 016, 1749-016, Lisbon, Portugal
3
Abstract
Sensors deployed within water distribution systems collect consumption data that enable the application of data analysis techniques to extract essential information. Time series clustering has been traditionally applied for modeling end-user water consumption profiles to aid water management. However, its effectiveness is limited by the diversity and local nature of consumption patterns. In addition, existing techniques cannot adequately handle changes in household composition, disruptive events (e.g., vacations), and consumption dynamics at different time scales. In this context, biclustering approaches provide a natural alternative to detect groups of end-users with coherent consumption profiles during local time periods while addressing the aforementioned limitations. This work discusses when, why and how to apply biclustering techniques for water consumption data analysis, and further proposes a methodology to this end. To the best of our knowledge, this is the first work introducing biclustering to water consumption data analysis. Results on data from a real-world water distribution system—Quinta do Lago, Portugal—confirm the potentialities of the proposed approach for pattern discovery with guarantees of statistical significance and robustness that entities can rely on for strategic planning.
4
Husaini AM, Haq SAU, Jiménez AJL. Understanding saffron biology using omics- and bioinformatics tools: stepping towards a better Crocus phenome. Mol Biol Rep 2022; 49:5325-5340. [PMID: 35106686 PMCID: PMC8807023 DOI: 10.1007/s11033-021-07053-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/16/2021] [Accepted: 12/06/2021] [Indexed: 12/13/2022]
Abstract
Saffron is a unique plant in many aspects, and its cellular processes are regulated at multiple levels. The genetic makeup, in the form of eight chromosome triplets (2n = 3x = 24) with a haploid genetic content (genome size) of 3.45 Gbp, is decoded into different types of RNA by transcription. The RNA is then translated into peptides and functional proteins, sometimes involving post-translational modifications too. The interactions of the genome, transcriptome, proteome and other regulatory molecules ultimately result in the complex set of primary and secondary metabolites of the saffron metabolome. These complex interactions manifest in the form of a set of traits, the 'phenome', peculiar to saffron. The phenome responds to the environmental changes occurring in and around saffron and modifies its response with respect to growth, development, disease response, stigma quality, apocarotenoid biosynthesis, and other processes. Understanding these complex relations between different yet interconnected biological activities is quite challenging in saffron, where classical genetics has a very limited role owing to its sterility and the absence of a whole-genome sequence. Omics-based technologies are immensely helpful in overcoming these limitations and developing a better understanding of saffron biology. In addition to creating a comprehensive picture of the molecular mechanisms involved in apocarotenoid synthesis, stigma biogenesis, corm activity, and flower development, omics technologies will ultimately lead to the engineering of saffron plants with an improved phenome.
Affiliation(s)
- Amjad M Husaini
  - Genome Engineering and Societal Biotechnology Lab, Division of Plant Biotechnology, Sher-e-Kashmir University of Agricultural Sciences and Technology of Kashmir, Shalimar Campus, Srinagar, Jammu and Kashmir, 190025, India
- Syed Anam Ul Haq
  - Genome Engineering and Societal Biotechnology Lab, Division of Plant Biotechnology, Sher-e-Kashmir University of Agricultural Sciences and Technology of Kashmir, Shalimar Campus, Srinagar, Jammu and Kashmir, 190025, India
- Alberto José López Jiménez
  - Departamento de Ciencia y Tecnología Agroforestal y Genética, Escuela Técnica Superior de Ingenieros Agrónomos y de Montes, Universidad de Castilla-La Mancha, Albacete, Spain
5
Chang H, Zhang H, Zhang T, Su L, Qin QM, Li G, Li X, Wang L, Zhao T, Zhao E, Zhao H, Liu Y, Stacey G, Xu D. A Multi-Level Iterative Bi-Clustering Method for Discovering miRNA Co-regulation Network of Abiotic Stress Tolerance in Soybeans. Front Plant Sci 2022; 13:860791. [PMID: 35463453 PMCID: PMC9021755 DOI: 10.3389/fpls.2022.860791] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/23/2022] [Accepted: 02/24/2022] [Indexed: 06/14/2023]
Abstract
Although growing evidence shows that microRNA (miRNA) regulates plant growth and development, miRNA regulatory networks in plants are not well understood. Current experimental studies cannot characterize miRNA regulatory networks on a large scale. This information gap provides an excellent opportunity to employ computational methods for global analysis and generate valuable models and hypotheses. To address this opportunity, we collected miRNA-target interactions (MTIs) and used MTIs from Arabidopsis thaliana and Medicago truncatula to predict homologous MTIs in soybeans, resulting in 80,235 soybean MTIs in total. A multi-level iterative bi-clustering method was developed to identify 483 soybean miRNA-target regulatory modules (MTRMs). Furthermore, we collected soybean miRNA expression data and corresponding gene expression data in response to abiotic stresses. By clustering these data, 37 MTRMs related to abiotic stresses were identified, including stress-specific MTRMs and shared MTRMs. These MTRMs have gene ontology (GO) enrichment in resistance response, iron transport, positive growth regulation, etc. Our study predicts soybean MTRMs and miRNA-GO networks under different stresses, and provides miRNA targeting hypotheses for experimental analyses. The method can be applied to other biological processes and other plants to elucidate miRNA co-regulation mechanisms.
Affiliation(s)
- Haowu Chang
  - Key Laboratory of Symbol Computation and Knowledge Engineering, College of Computer Science and Technology, Ministry of Education, Jilin University, Jilin, China
  - Department of Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Hao Zhang
  - Key Laboratory of Symbol Computation and Knowledge Engineering, College of Computer Science and Technology, Ministry of Education, Jilin University, Jilin, China
  - Department of Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Tianyue Zhang
  - Key Laboratory of Symbol Computation and Knowledge Engineering, College of Computer Science and Technology, Ministry of Education, Jilin University, Jilin, China
- Lingtao Su
  - Department of Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
  - College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, China
- Qing-Ming Qin
  - College of Plant Sciences and Key Laboratory of Zoonosis Research, Ministry of Education, Jilin University, Jilin, China
- Guihua Li
  - College of Plant Sciences and Key Laboratory of Zoonosis Research, Ministry of Education, Jilin University, Jilin, China
- Xueqing Li
  - Key Laboratory of Symbol Computation and Knowledge Engineering, College of Computer Science and Technology, Ministry of Education, Jilin University, Jilin, China
- Li Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering, College of Computer Science and Technology, Ministry of Education, Jilin University, Jilin, China
- Tianheng Zhao
  - Key Laboratory of Symbol Computation and Knowledge Engineering, College of Computer Science and Technology, Ministry of Education, Jilin University, Jilin, China
- Enshuang Zhao
  - Key Laboratory of Symbol Computation and Knowledge Engineering, College of Computer Science and Technology, Ministry of Education, Jilin University, Jilin, China
- Hengyi Zhao
  - Key Laboratory of Symbol Computation and Knowledge Engineering, College of Computer Science and Technology, Ministry of Education, Jilin University, Jilin, China
- Yuanning Liu
  - Key Laboratory of Symbol Computation and Knowledge Engineering, College of Computer Science and Technology, Ministry of Education, Jilin University, Jilin, China
  - Department of Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Gary Stacey
  - Division of Plant Sciences and Technology, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Dong Xu
  - Department of Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
6
Prioritizing disease biomarkers using functional module based network analysis: A multilayer consensus driven scheme. Comput Biol Med 2020; 126:104023. [DOI: 10.1016/j.compbiomed.2020.104023] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/31/2020] [Revised: 09/24/2020] [Accepted: 09/26/2020] [Indexed: 12/19/2022]
7
Xie J, Ma A, Fennell A, Ma Q, Zhao J. It is time to apply biclustering: a comprehensive review of biclustering applications in biological and biomedical data. Brief Bioinform 2020; 20:1449-1464. [PMID: 29490019 DOI: 10.1093/bib/bby014] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 11/15/2017] [Revised: 01/16/2018] [Indexed: 12/12/2022]
Abstract
Biclustering is a powerful data mining technique that allows clustering of rows and columns, simultaneously, in a matrix-format data set. It was first applied to gene expression data in 2000, aiming to identify co-expressed genes under a subset of all the conditions/samples. During the past 17 years, tens of biclustering algorithms and tools have been developed to enhance the ability to make sense out of large data sets generated in the wake of high-throughput omics technologies. These algorithms and tools have been applied to a wide variety of data types, including but not limited to, genomes, transcriptomes, exomes, epigenomes, phenomes and pharmacogenomes. However, there is still a considerable gap between biclustering methodology development and comprehensive data interpretation, mainly because of the lack of knowledge for the selection of appropriate biclustering tools and further supporting computational techniques in specific studies. Here, we first deliver a brief introduction to the existing biclustering algorithms and tools in public domain, and then systematically summarize the basic applications of biclustering for biological data and more advanced applications of biclustering for biomedical data. This review will assist researchers to effectively analyze their big data and generate valuable biological knowledge and novel insights with higher efficiency.
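As an illustration of what "co-expressed genes under a subset of conditions" can mean numerically, the sketch below computes the mean squared residue of a candidate bicluster, the coherence score introduced by Cheng and Church in their 2000 application of biclustering to gene expression data. The abstract itself does not name a score, so the choice of this measure, the function name `mean_squared_residue`, and the toy matrix are assumptions made for illustration.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix matrix[rows][cols].

    A score of 0 means every selected row follows the same additive pattern
    across the selected columns; larger values mean less coherence.
    """
    rows, cols = sorted(rows), sorted(cols)
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n, m = len(sub), len(sub[0])
    row_means = [sum(r) / m for r in sub]
    col_means = [sum(sub[i][j] for i in range(n)) / n for j in range(m)]
    overall = sum(row_means) / n
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n)
        for j in range(m)
    )
    return total / (n * m)


# Toy expression matrix (genes x conditions); values are invented.
# Genes 0-2 share an additive pattern on conditions 0-2 only.
expr = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [4.0, 5.0, 6.0, 7.0],
    [8.0, 1.0, 5.0, 2.0],
]

coherent = mean_squared_residue(expr, {0, 1, 2}, {0, 1, 2})
whole = mean_squared_residue(expr, {0, 1, 2, 3}, {0, 1, 2, 3})
```

The selected submatrix scores essentially zero while the full matrix does not, which is the situation one-mode clustering over all conditions would miss.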
8
Singh S, Singh VK, Rai G. Identification of Differentially Expressed Hematopoiesis-associated Genes in Term Low Birth Weight Newborns by Systems Genomics Approach. Curr Genomics 2020; 20:469-482. [PMID: 32655286 PMCID: PMC7327969 DOI: 10.2174/1389202920666191203123025] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/18/2019] [Revised: 11/29/2019] [Accepted: 11/29/2019] [Indexed: 11/22/2022]
Abstract
Background: Low Birth Weight (LBW) (birth weight <2.5 kg) newborns are associated with a high risk of infection, morbidity and mortality during their perinatal period. Compromised innate immune responses and inefficient hematopoietic differentiation in term LBW newborns led us to evaluate the gene expression status of hematopoiesis. Materials and Methods: In this study, we compared our microarray datasets of LBW and Normal Birth Weight (NBW) newborns with two reference datasets to identify hematopoietic stem cell genes and their differential expression in the LBW newborns, by a hierarchical clustering algorithm using the gplots and RColorBrewer packages in R. Results: Comparative analysis revealed 108 differentially expressed hematopoiesis genes (DEHGs), of which 79 genes were up-regulated and 29 genes were down-regulated in LBW newborns compared to their NBW counterparts. Moreover, protein-protein interactions, functional annotation and pathway analysis demonstrated that the up-regulated genes were mainly involved in cell proliferation and differentiation, MAPK signaling and Rho GTPases signaling, and the down-regulated genes were engaged in cell proliferation and regulation, immune system regulation, hematopoietic cell lineage and the JAK-STAT pathway. The binding of down-regulated genes (LYZ and GBP1) with growth factor GM-CSF, assessed using docking and MD simulation techniques, indicated that GM-CSF has the potential to alleviate the repressed hematopoiesis in the term LBW newborns. Conclusion: Our study revealed that DEHGs belonged to erythroid and myeloid-specific lineages and may serve as potential targets for improving hematopoiesis in term LBW newborns to help build up their weak immune defense against life-threatening infections.
Affiliation(s)
- Sakshi Singh
  - Department of Molecular and Human Genetics, Institute of Science, Banaras Hindu University, Varanasi, India
  - Centre for Bioinformatics, School of Biotechnology, Institute of Science, Banaras Hindu University, Varanasi, India
- Vinay K Singh
  - Department of Molecular and Human Genetics, Institute of Science, Banaras Hindu University, Varanasi, India
  - Centre for Bioinformatics, School of Biotechnology, Institute of Science, Banaras Hindu University, Varanasi, India
- Geeta Rai
  - Department of Molecular and Human Genetics, Institute of Science, Banaras Hindu University, Varanasi, India
  - Centre for Bioinformatics, School of Biotechnology, Institute of Science, Banaras Hindu University, Varanasi, India
9
Sun M, Zhao J, Wu H, Luther K, North C, Ramakrishnan N. The Effect of Edge Bundling and Seriation on Sensemaking of Biclusters in Bipartite Graphs. IEEE Trans Vis Comput Graph 2019; 25:2983-2998. [PMID: 30059310 DOI: 10.1109/tvcg.2018.2861397] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/08/2023]
Abstract
Exploring coordinated relationships (e.g., shared relationships between two sets of entities) is an important analytics task in a variety of real-world applications, such as discovering similarly behaved genes in bioinformatics, detecting malware collusions in cyber security, and identifying product bundles in marketing analysis. Coordinated relationships can be formalized as biclusters. To support visual exploration of biclusters, visualizations based on bipartite graphs have been proposed, and edge bundling is used to show biclusters. However, edge bundling suffers from edge crossings due to possible overlaps of biclusters, and its impact on users exploring biclusters in bipartite graphs is not understood in depth. To address these issues, we propose a novel bicluster-based seriation technique that can reduce edge crossings in bipartite graph drawings, and we conducted a user experiment to study the effect of edge bundling and this proposed technique on visualizing biclusters in bipartite graphs. We found that both reduced entity visits for users exploring biclusters, and edge bundles helped them find more justified answers. Moreover, we identified four key trade-offs that inform the design of future bicluster visualizations. The study results suggest that edge bundling is critical for exploring biclusters in bipartite graphs, as it helps to reduce low-level perceptual problems and support high-level inferences.
10
Cruz A, Arrais JP, Machado P. Interactive and coordinated visualization approaches for biological data analysis. Brief Bioinform 2019; 20:1513-1523. [PMID: 29590305 DOI: 10.1093/bib/bby019] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/30/2017] [Revised: 01/24/2018] [Indexed: 12/11/2022]
Abstract
The field of computational biology has become largely dependent on data visualization tools to analyze the increasing quantities of data gathered through the use of new and growing technologies. Aside from the volume, which often results in large amounts of noise and complex relationships with no clear structure, the visualization of biological data sets is hindered by their heterogeneity, as data are obtained from different sources and contain a wide variety of attributes, including spatial and temporal information. This requires visualization approaches that are able to not only represent various data structures simultaneously but also provide exploratory methods that allow the identification of meaningful relationships that would not be perceptible through data analysis algorithms alone. In this article, we present a survey of visualization approaches applied to the analysis of biological data. We focus on graph-based visualizations and tools that use coordinated multiple views to represent high-dimensional multivariate data, in particular time series gene expression, protein-protein interaction networks and biological pathways. We then discuss how these methods can be used to help solve the current challenges surrounding the visualization of complex biological data sets.
Affiliation(s)
- António Cruz
  - Universidade de Coimbra Faculdade de Ciencias e Tecnologia, Departamento de Engenharia Informática
- Joel P Arrais
  - Universidade de Coimbra Faculdade de Ciencias e Tecnologia, Departamento de Engenharia Informática
- Penousal Machado
  - Universidade de Coimbra Faculdade de Ciencias e Tecnologia, Departamento de Engenharia Informática
11
Williams JR, Yang R, Clifford JL, Watson D, Campbell R, Getnet D, Kumar R, Hammamieh R, Jett M. Functional Heatmap: an automated and interactive pattern recognition tool to integrate time with multi-omics assays. BMC Bioinformatics 2019; 20:81. [PMID: 30770734 PMCID: PMC6377781 DOI: 10.1186/s12859-019-2657-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 10/01/2018] [Accepted: 01/28/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Life science research is moving quickly towards large-scale experimental designs that comprise multiple tissues, time points, and samples. Omic time-series experiments offer answers to three big questions: what collective patterns do most analytes follow, which analytes follow an identical pattern or synchronize across multiple cohorts, and how do biological functions evolve over time. Existing tools fall short of robustly answering and visualizing all three questions in a unified interface. RESULTS: Functional Heatmap offers time-series data visualization through a Master Panel page and a Combined page to answer each of the three time-series questions. It dissects the complex multi-omics time-series readouts into patterned clusters with associated biological functions. It allows users to identify a cascade of functional changes over a time variable. Conversely, Functional Heatmap can compare how a pattern with specific biology responds to multiple experimental conditions. All analyses are interactive, searchable, and exportable in the form of a heatmap, line chart, or text, and the results are easy to share, maintain, and reproduce on the web platform. CONCLUSIONS: Functional Heatmap is an automated and interactive tool that enables pattern recognition in time-series multi-omics assays. It significantly reduces the manual labour of pattern discovery and comparison by transferring statistical models into visual clues. The new pattern recognition feature will help researchers identify hidden trends driven by functional changes across multiple tissues and conditions in time-series omic assays.
Affiliation(s)
- Joshua R. Williams
  - Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD 21702-5010 USA
  - Integrative Systems Biology Program, US Army Center for Environmental Health Research, Fort Detrick, Frederick, MD 21702-5010 USA
- Ruoting Yang
  - Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD 21702-5010 USA
  - Integrative Systems Biology Program, US Army Center for Environmental Health Research, Fort Detrick, Frederick, MD 21702-5010 USA
- John L. Clifford
  - Integrative Systems Biology Program, US Army Center for Environmental Health Research, Fort Detrick, Frederick, MD 21702-5010 USA
- Daniel Watson
  - Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD 21702-5010 USA
- Ross Campbell
  - Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD 21702-5010 USA
  - Integrative Systems Biology Program, US Army Center for Environmental Health Research, Fort Detrick, Frederick, MD 21702-5010 USA
- Derese Getnet
  - Integrative Systems Biology Program, US Army Center for Environmental Health Research, Fort Detrick, Frederick, MD 21702-5010 USA
- Raina Kumar
  - Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD 21702-5010 USA
  - Integrative Systems Biology Program, US Army Center for Environmental Health Research, Fort Detrick, Frederick, MD 21702-5010 USA
- Rasha Hammamieh
  - Integrative Systems Biology Program, US Army Center for Environmental Health Research, Fort Detrick, Frederick, MD 21702-5010 USA
- Marti Jett
  - Integrative Systems Biology Program, US Army Center for Environmental Health Research, Fort Detrick, Frederick, MD 21702-5010 USA
12
Janani S, Ramyachitra D, Ranjani Rani R. PCD-DPPI: Protein complex detection from dynamic PPI using shuffled frog-leaping algorithm. Gene Rep 2018. [DOI: 10.1016/j.genrep.2018.06.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
13
Structural and functional dissection of differentially expressed tomato WRKY transcripts in host defense response against the vascular wilt pathogen (Fusarium oxysporum f. sp. lycopersici). PLoS One 2018; 13:e0193922. [PMID: 29709017 PMCID: PMC5927432 DOI: 10.1371/journal.pone.0193922] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 10/11/2017] [Accepted: 02/21/2018] [Indexed: 11/24/2022]
Abstract
The WRKY transcription factors have an indispensable role in plant growth, development and defense responses. The differential expression of WRKY genes following stress conditions has been well demonstrated. We investigated the temporal and tissue-specific (root and leaf tissues) differential expression of plant defense-related WRKY genes following infection with Fusarium oxysporum f. sp. lycopersici (Fol) in tomato. The genome-wide computational analysis revealed that 16 different members of the WRKY gene superfamily were involved during Fol infection in tomato, of which only three WRKYs (SolyWRKY4, SolyWRKY33, and SolyWRKY37) showed clear-cut differential gene expression. The quantitative real-time PCR (qRT-PCR) studies revealed different gene expression profile changes in tomato root and leaf tissues. In root tissues infected with Fol, an increased expression of SolyWRKY33 (2.76 fold) followed by SolyWRKY37 (1.93 fold) was found at 24 hrs, which further increased at 48 hrs (5.0 fold). In contrast, in the leaf tissues, the expression was more pronounced at an earlier stage of infection (24 hrs). However, in both cases, we found repression of the SolyWRKY4 gene, which decreased further at later time points. The biochemical defense programming against Fol pathogenesis was characterized by the highest accumulation of H2O2 (at 48 hrs) and enhanced lignification. The functional diversity across the characterized WRKYs was explored through motif scanning using the MEME suite, and the WRKY-specific gene regulation was assessed through DNA-protein docking studies. The modeled functional WRKY domain had a β-sheet-like topology with coils and turns. The DNA-protein interaction results revealed the importance of core residues (Tyr, Arg, and Lys) in a feasible WRKY-W-box DNA interaction. The protein interaction network analysis revealed that SolyWRKY33 could interact with other proteins, such as mitogen-activated protein kinase 5 (MAPK) and sigma factor binding protein 1 (SIB1), and with other WRKY members including WRKY70, WRKY1, and WRKY40, to respond to various biotic and abiotic stresses. The STRING results were further validated through the Predicted Tomato Interactome Resource (PTIR) database. The CELLO2GO web server revealed the functional gene ontology annotation and protein subcellular localization, which predicted that SolyWRKY33 is involved in amelioration of biological stress (39.3%) and other metabolic processes (39.3%). The SolyWRKY33 protein is most probably located inside the nucleus (91.3%) and has transcription factor binding activity. We conclude that the defense response following the Fol challenge was accompanied by differential expression of the SolyWRKY4(↓), SolyWRKY33(↑) and SolyWRKY37(↑) transcripts. The biochemical changes were accompanied by elicitation of H2O2 generation and accumulation and by enhanced tissue lignification.
14
Martella F, Alfò M. A finite mixture approach to joint clustering of individuals and multivariate discrete outcomes. J Stat Comput Simul 2017. [DOI: 10.1080/00949655.2017.1322593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Francesca Martella
- Dipartimento di Scienze Statistiche, Sapienza Università di Roma, Rome, Italy
| | - Marco Alfò
- Dipartimento di Scienze Statistiche, Sapienza Università di Roma, Rome, Italy
| |
|
15
|
Abstract
Mining microarray data to unearth interesting expression profile patterns for discovery of in silico biological knowledge is an emerging area of research in computational biology. A group of functionally related genes may have similar expression patterns under a set of conditions or at some time points. Biclustering is an important data mining tool that has been successfully used to analyze gene expression data for biologically significant cluster discovery. The purpose of this chapter is to introduce interesting patterns that may be observed in expression data and discuss the role of biclustering techniques in detecting interesting functional gene groups with similar expression patterns.
|
16
|
Lakizadeh A, Jalili S. BiCAMWI: A Genetic-Based Biclustering Algorithm for Detecting Dynamic Protein Complexes. PLoS One 2016; 11:e0159923. [PMID: 27462706 PMCID: PMC4963120 DOI: 10.1371/journal.pone.0159923] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/26/2016] [Accepted: 07/11/2016] [Indexed: 01/08/2023] Open
Abstract
Considering the roles of protein complexes in many biological processes in the cell, detection of protein complexes from available protein-protein interaction (PPI) networks is a key challenge in the post-genome era. Despite the high dynamicity of cellular systems and the dynamic interaction between proteins in a cell, most computational methods have focused on static networks, which cannot represent the inherent dynamicity of protein interactions. Recently, some researchers have tried to exploit the dynamicity of PPI networks by constructing a set of dynamic PPI subnetworks corresponding to each time point (column) in a gene expression dataset. However, many genes can participate in multiple biological processes, and cellular processes are not necessarily related to every sample; they might be relevant only for a subset of samples. So, it is more interesting to explore each subnetwork based on a subset of genes and conditions (i.e., biclusters) in a gene expression dataset. Here, we present a new method, called BiCAMWI, to employ dynamicity in detecting protein complexes. The preprocessing phase of the proposed method is based on a novel genetic algorithm that extracts sets of genes that are co-regulated under some conditions from the input gene expression data. Each extracted gene set is called a bicluster. In the detection phase of the proposed method, dynamic PPI subnetworks are then extracted from the input static PPI network based on the biclusters. Protein complexes are identified by applying a detection method on each dynamic PPI subnetwork and aggregating the results. Experimental results confirm that BiCAMWI effectively models the dynamicity inherent in static PPI networks and achieves significantly better results than state-of-the-art methods. So, we suggest BiCAMWI as a more reliable method for protein complex detection.
Affiliation(s)
- Amir Lakizadeh
- Computer Engineering Department, Tarbiat Modares University, Tehran, Iran
| | - Saeed Jalili
- Computer Engineering Department, Tarbiat Modares University, Tehran, Iran
| |
|
17
|
Tu X, Wang Y, Zhang M, Wu J. Using Formal Concept Analysis to Identify Negative Correlations in Gene Expression Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:380-391. [PMID: 27045834 DOI: 10.1109/tcbb.2015.2443805] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]
Abstract
Recently, many biological studies have reported that two groups of genes tend to show negatively correlated or opposite expression tendencies in many biological processes or pathways. The negative correlation between genes may imply an important biological mechanism. In this study, we proposed an FCA-based negative correlation algorithm (NCFCA) that can effectively identify opposite expression tendencies between two gene groups in gene expression data. After applying it to expression data of cell cycle-regulated genes in yeast, we found that six minichromosome maintenance (MCM) family genes showed an expression tendency opposite to that of eight core histone family genes. Furthermore, we confirmed that the negative correlation expression pattern between these two families may be conserved in the cell cycle. Finally, we discussed the reasons underlying the negative correlation of the six MCM family genes with the eight core histone family genes. Our results revealed that negative correlation is an important potential mechanism that maintains the balance of biological systems by repressing some genes while inducing others. It can thus provide new understanding of gene expression and regulation, the causes of diseases, etc.
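To make the notion of an "opposite expression tendency" between two gene groups concrete, the following is a minimal sketch that correlates the mean profiles of two groups. It is not the NCFCA algorithm (which is built on formal concept analysis); the function name `group_correlation`, the synthetic matrix and the group indices are all assumptions chosen for illustration.

```python
import numpy as np

def group_correlation(expr, group_a, group_b):
    """Pearson correlation between the mean expression profiles of two gene groups.

    expr is a genes x time-points matrix; group_a and group_b are lists of row indices.
    """
    mean_a = expr[group_a].mean(axis=0)
    mean_b = expr[group_b].mean(axis=0)
    return float(np.corrcoef(mean_a, mean_b)[0, 1])

# Synthetic data (hypothetical): three genes follow a wave, two genes follow its mirror image.
wave = np.sin(np.linspace(0.0, 2.0 * np.pi, 12))
expr = np.vstack([
    wave + 0.1,
    1.5 * wave,
    wave - 0.2,
    -wave + 0.3,
    -2.0 * wave,
])

r = group_correlation(expr, [0, 1, 2], [3, 4])
print(f"correlation between group mean profiles: {r:.3f}")
```

A value close to -1 indicates that one group rises when the other falls, which is the kind of relationship the abstract describes between the MCM and core histone families.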
|
18
|
Sun M, Mi P, North C, Ramakrishnan N. BiSet: Semantic Edge Bundling with Biclusters for Sensemaking. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2016; 22:310-319. [PMID: 26529710 DOI: 10.1109/tvcg.2015.2467813] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 06/05/2023]
Abstract
Identifying coordinated relationships is an important task in data analytics. For example, an intelligence analyst might want to discover three suspicious people who all visited the same four cities. Existing techniques that display individual relationships, such as between lists of entities, require repetitious manual selection and significant mental aggregation in cluttered visualizations to find coordinated relationships. In this paper, we present BiSet, a visual analytics technique to support interactive exploration of coordinated relationships. In BiSet, we model coordinated relationships as biclusters and algorithmically mine them from a dataset. Then, we visualize the biclusters in context as bundled edges between sets of related entities. Thus, bundles enable analysts to infer task-oriented semantic insights about potentially coordinated activities. We make bundles as first class objects and add a new layer, "in-between", to contain these bundle objects. Based on this, bundles serve to organize entities represented in lists and visually reveal their membership. Users can interact with edge bundles to organize related entities, and vice versa, for sensemaking purposes. With a usage scenario, we demonstrate how BiSet supports the exploration of coordinated relationships in text analytics.
|
19
|
Discovery of bidirectional contiguous column coherent bicluster in time-series gene expression data. INT J MACH LEARN CYB 2015. [DOI: 10.1007/s13042-015-0464-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|
20
|
Pavlopoulos GA, Malliarakis D, Papanikolaou N, Theodosiou T, Enright AJ, Iliopoulos I. Visualizing genome and systems biology: technologies, tools, implementation techniques and trends, past, present and future. Gigascience 2015; 4:38. [PMID: 26309733 PMCID: PMC4548842 DOI: 10.1186/s13742-015-0077-2] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 05/26/2015] [Accepted: 08/03/2015] [Indexed: 01/31/2023] Open
Abstract
"Α picture is worth a thousand words." This widely used adage sums up in a few words the notion that a successful visual representation of a concept should enable easy and rapid absorption of large amounts of information. Although, in general, the notion of capturing complex ideas using images is very appealing, would 1000 words be enough to describe the unknown in a research field such as the life sciences? Life sciences is one of the biggest generators of enormous datasets, mainly as a result of recent and rapid technological advances; their complexity can make these datasets incomprehensible without effective visualization methods. Here we discuss the past, present and future of genomic and systems biology visualization. We briefly comment on many visualization and analysis tools and the purposes that they serve. We focus on the latest libraries and programming languages that enable more effective, efficient and faster approaches for visualizing biological concepts, and also comment on the future human-computer interaction trends that would enable for enhancing visualization further.
Affiliation(s)
- Georgios A Pavlopoulos
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete Greece
| | | | - Nikolas Papanikolaou
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete Greece
| | - Theodosis Theodosiou
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete Greece
| | - Anton J Enright
- EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SD UK
| | - Ioannis Iliopoulos
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete Greece
| |
|
21
|
Sun M, North C, Ramakrishnan N. A Five-Level Design Framework for Bicluster Visualizations. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2014; 20:1713-1722. [PMID: 26356885 DOI: 10.1109/tvcg.2014.2346665] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/05/2023]
Abstract
Analysts often need to explore and identify coordinated relationships (e.g., four people who visited the same five cities on the same set of days) within some large datasets for sensemaking. Biclusters provide a potential solution to ease this process, because each computed bicluster bundles individual relationships into coordinated sets. By understanding such computed, structural, relations within biclusters, analysts can leverage their domain knowledge and intuition to determine the importance and relevance of the extracted relationships for making hypotheses. However, due to the lack of systematic design guidelines, it is still a challenge to design effective and usable visualizations of biclusters to enhance their perceptibility and interactivity for exploring coordinated relationships. In this paper, we present a five-level design framework for bicluster visualizations, with a survey of the state-of-the-art design considerations and applications that are related or that can be applied to bicluster visualizations. We summarize pros and cons of these design options to support user tasks at each of the five-level relationships. Finally, we discuss future research challenges for bicluster visualizations and their incorporation into visual analytics tools.
|
22
|
Gonçalves JP, Madeira SC. LateBiclustering: Efficient Heuristic Algorithm for Time-Lagged Bicluster Identification. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014; 11:801-813. [PMID: 26356854 DOI: 10.1109/tcbb.2014.2312007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 06/05/2023]
Abstract
Identifying patterns in temporal data is key to uncovering meaningful relationships in diverse domains, from stock trading to social interactions. Also of great interest are clinical and biological applications, namely monitoring patient response to treatment or characterizing activity at the molecular level. In biology, researchers seek to gain insight into gene functions and dynamics of biological processes, as well as potential perturbations of these leading to disease, through the study of patterns emerging from gene expression time series. Clustering can group genes exhibiting similar expression profiles, but focuses on global patterns denoting rather broad, unspecific responses. Biclustering reveals local patterns, which more naturally capture the intricate collaboration between biological players, particularly under a temporal setting. Despite the general biclustering formulation being NP-hard, considering specific properties of time series has led to efficient solutions for the discovery of temporally aligned patterns. Notably, the identification of biclusters with time-lagged patterns, suggestive of transcriptional cascades, remains a challenge due to the combinatorial explosion of delayed occurrences. Herein, we propose LateBiclustering, a sensible heuristic algorithm enabling a polynomial rather than exponential time solution for the problem. We show that it identifies meaningful time-lagged biclusters relevant to the response of Saccharomyces cerevisiae to heat stress.
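As an illustration of what a time-lagged pattern is, the sketch below scans candidate delays for a single pair of profiles and keeps the delay with the highest Pearson correlation. This brute-force pairwise scan is an assumption made for illustration only and is not the LateBiclustering heuristic; the function name `best_lag` and the synthetic series are hypothetical.

```python
import numpy as np

def best_lag(reference, candidate, max_lag):
    """Return (lag, correlation) for the delay at which candidate best follows reference.

    For each lag, the first len - lag points of reference are compared with
    the last len - lag points of candidate.
    """
    best = (0, -np.inf)
    for lag in range(max_lag + 1):
        head = reference[: len(reference) - lag]
        tail = candidate[lag:]
        r = np.corrcoef(head, tail)[0, 1]
        if r > best[1]:
            best = (lag, float(r))
    return best

# Synthetic profiles (hypothetical): the second gene repeats the first one two time points later.
reference = np.array([0.0, 1.0, 3.0, 7.0, 4.0, 2.0, 1.0, 0.0, 0.0, 0.0])
delayed = np.concatenate([[0.0, 0.0], reference[:-2]])

lag, r = best_lag(reference, delayed, max_lag=4)
print(lag, round(r, 3))
```

Because each gene in a bicluster may carry its own delay, the number of lag combinations grows quickly with the number of genes, which is the combinatorial explosion the abstract refers to.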
|
23
|
Sun P, Speicher NK, Röttger R, Guo J, Baumbach J. Bi-Force: large-scale bicluster editing and its application to gene expression data biclustering. Nucleic Acids Res 2014; 42:e78. [PMID: 24682815 PMCID: PMC5769343 DOI: 10.1093/nar/gku201] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022] Open
Abstract
The explosion of biological data has dramatically reformed today's biological research. The need to integrate and analyze high-dimensional biological data on a large scale is driving the development of novel bioinformatics approaches. Biclustering, also known as 'simultaneous clustering' or 'co-clustering', has been successfully utilized to discover local patterns in gene expression data and similar biomedical data types. Here, we contribute a new heuristic: 'Bi-Force'. It is based on the weighted bicluster editing model, to perform biclustering on arbitrary sets of biological entities, given any kind of pairwise similarities. We first evaluated the power of Bi-Force to solve dedicated bicluster editing problems by comparing Bi-Force with two existing algorithms in the BiCluE software package. We then followed a biclustering evaluation protocol in a recent review paper from Eren et al. (2013) (A comparative analysis of biclustering algorithms for gene expression data. Brief. Bioinform., 14:279-292.) and compared Bi-Force against eight existing tools: FABIA, QUBIC, Cheng and Church, Plaid, BiMax, Spectral, xMOTIFs and ISA. To this end, a suite of synthetic datasets as well as nine large gene expression datasets from Gene Expression Omnibus were analyzed. All resulting biclusters were subsequently investigated by Gene Ontology enrichment analysis to evaluate their biological relevance. The distinct theoretical foundation of Bi-Force (bicluster editing) is more powerful than strict biclustering. We thus outperformed existing tools with Bi-Force at least when following the evaluation protocols from Eren et al. Bi-Force is implemented in Java and integrated into the open source software package of BiCluE. The software as well as all used datasets are publicly available at http://biclue.mpi-inf.mpg.de.
Affiliation(s)
- Peng Sun
- Max Planck Institute for Informatics, Campus E1 4, Saarland University, 66123 Saarbrücken, Germany Cluster of Excellence for Multimodel Computing and Interaction, Campus E1 7, Saarland University, 66123 Saarbrücken, Germany
| | - Nora K Speicher
- Max Planck Institute for Informatics, Campus E1 4, Saarland University, 66123 Saarbrücken, Germany Cluster of Excellence for Multimodel Computing and Interaction, Campus E1 7, Saarland University, 66123 Saarbrücken, Germany
| | - Richard Röttger
- Max Planck Institute for Informatics, Campus E1 4, Saarland University, 66123 Saarbrücken, Germany Cluster of Excellence for Multimodel Computing and Interaction, Campus E1 7, Saarland University, 66123 Saarbrücken, Germany
| | - Jiong Guo
- Cluster of Excellence for Multimodel Computing and Interaction, Campus E1 7, Saarland University, 66123 Saarbrücken, Germany
| | - Jan Baumbach
- Max Planck Institute for Informatics, Campus E1 4, Saarland University, 66123 Saarbrücken, Germany Institute for Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark
| |
|
24
|
Wang YK, Print CG, Crampin EJ. Biclustering reveals breast cancer tumour subgroups with common clinical features and improves prediction of disease recurrence. BMC Genomics 2013; 14:102. [PMID: 23405961 PMCID: PMC3598775 DOI: 10.1186/1471-2164-14-102] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 06/06/2012] [Accepted: 02/05/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Many studies have revealed correlations between breast tumour phenotypes, variations in gene expression, and patient survival outcomes. The molecular heterogeneity between breast tumours revealed by these studies has allowed prediction of prognosis and has underpinned stratified therapy, where groups of patients with particular tumour types receive specific treatments. The molecular tests used to predict prognosis and stratify treatment usually utilise fixed sets of genomic biomarkers, with the same biomarker sets being used to test all patients. In this paper we suggest that instead of fixed sets of genomic biomarkers, it may be more effective to use a stratified biomarker approach, where optimal biomarker sets are automatically chosen for particular patient groups, analogous to the choice of optimal treatments for groups of similar patients in stratified therapy. We illustrate the effectiveness of a biclustering approach to select optimal gene sets for determining the prognosis of specific strata of patients, based on potentially overlapping, non-discrete molecular characteristics of tumours. RESULTS Biclustering identified tightly co-expressed gene sets in the tumours of restricted subgroups of breast cancer patients. The co-expressed genes in these biclusters were significantly enriched for particular biological annotations and gene regulatory modules associated with breast cancer biology. Tumours identified within the same bicluster were more likely to present with similar clinical features. Bicluster membership combined with clinical information could predict patient prognosis in conditional inference tree and ridge regression class prediction models. CONCLUSIONS The increasing clinical use of genomic profiling demands identification of more effective methods to segregate patients into prognostic and treatment groups. We have shown that biclustering can be used to select optimal gene sets for determining the prognosis of specific strata of patients.
Affiliation(s)
- Yi Kan Wang
- Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
| | - Cristin G Print
- Department of Molecular Medicine and Pathology, University of Auckland, Auckland, New Zealand
- New Zealand Bioinformatics Institute, University of Auckland, Auckland, New Zealand
- Maurice Wilkins Centre for Molecular Biodiscovery, University of Auckland, Auckland, New Zealand
| | - Edmund J Crampin
- Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- Maurice Wilkins Centre for Molecular Biodiscovery, University of Auckland, Auckland, New Zealand
- Department of Engineering Science, University of Auckland, Auckland, New Zealand
- Melbourne School of Engineering, University of Melbourne, Victoria, Australia
| |
|
25
|
Zhang L, Berleant D, Wang Y, Li L, Cook D, Wurtele ES. BirdsEyeView (BEV): graphical overviews of experimental data. BMC Bioinformatics 2012; 13 Suppl 15:S11. [PMID: 23046276 PMCID: PMC3439726 DOI: 10.1186/1471-2105-13-s15-s11] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Analyzing global experimental data can be tedious and time-consuming. Thus, helping biologists see results as quickly and easily as possible can facilitate biological research, and is the purpose of the software we describe. RESULTS We present BirdsEyeView, a software system for visualizing experimental transcriptomic data using different views that users can switch among and compare. BirdsEyeView graphically maps data to three views: Cellular Map (currently a plant cell), Pathway Tree with dynamic mapping, and Gene Ontology (http://www.geneontology.org) Biological Processes and Molecular Functions. By displaying color-coded values for transcript levels across different views, BirdsEyeView can assist users in developing hypotheses about their experiment results. CONCLUSIONS BirdsEyeView is a software system available as a Java Webstart package for visualizing transcriptomic data in the context of different biological views to assist biologists in investigating experimental results. BirdsEyeView can be obtained from http://metnetdb.org/MetNet_BirdsEyeView.htm.
Affiliation(s)
- Lifeng Zhang
- Department of Electrical and Computer Engineering, Iowa State University, Ames, Iowa 50011, USA
| |
|
26
|
|
27
|
Regulatory Snapshots: integrative mining of regulatory modules from expression time series and regulatory networks. PLoS One 2012; 7:e35977. [PMID: 22563474 PMCID: PMC3341384 DOI: 10.1371/journal.pone.0035977] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 11/02/2011] [Accepted: 03/24/2012] [Indexed: 12/15/2022] Open
Abstract
Explaining regulatory mechanisms is crucial to understand complex cellular responses leading to system perturbations. Some strategies reverse engineer regulatory interactions from experimental data, while others identify functional regulatory units (modules) under the assumption that biological systems yield a modular organization. Most modular studies focus on network structure and static properties, ignoring that gene regulation is largely driven by stimulus-response behavior. Expression time series are key to gain insight into dynamics, but have been insufficiently explored by current methods, which often (1) apply generic algorithms unsuited for expression analysis over time, due to inability to maintain the chronology of events or incorporate time dependency; (2) ignore local patterns, abundant in most interesting cases of transcriptional activity; (3) neglect physical binding or lack automatic association of regulators, focusing mainly on expression patterns; or (4) limit the discovery to a predefined number of modules. We propose Regulatory Snapshots, an integrative mining approach to identify regulatory modules over time by combining transcriptional control with response, while overcoming the above challenges. Temporal biclustering is first used to reveal transcriptional modules composed of genes showing coherent expression profiles over time. Personalized ranking is then applied to prioritize prominent regulators targeting the modules at each time point using a network of documented regulatory associations and the expression data. Custom graphics are finally depicted to expose the regulatory activity in a module at consecutive time points (snapshots). Regulatory Snapshots successfully unraveled modules underlying yeast response to heat shock and human epithelial-to-mesenchymal transition, based on regulations documented in the YEASTRACT and JASPAR databases, respectively, and available expression data. 
Regulatory players involved in functionally enriched processes related to these biological events were identified. Ranking scores further suggested ability to discern the primary role of a gene (target or regulator). Prototype is available at: http://kdbio.inesc-id.pt/software/regulatorysnapshots.
|
28
|
Identification and characterization of genes related to the development of breast muscles in Pekin duck. Mol Biol Rep 2012; 39:7647-55. [PMID: 22451153 DOI: 10.1007/s11033-012-1599-7] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 10/15/2011] [Accepted: 01/31/2012] [Indexed: 02/05/2023]
Abstract
Pekin duck is world-famous for its fast growth, but its breast muscle develops later and its breast muscle content is lower than in other muscular ducks. Therefore, it is very important to discover the genetic mechanism linking breast muscle development and the expression of related genes in Pekin duck. In the current study, genes related to breast muscle development were identified by suppression subtractive hybridization. A total of 403 positive clones were sequenced and 257 unigenes were obtained. The expression of 23 genes was analyzed in the breast muscle of 2-, 4-, 6- and 8-week-old Pekin ducks. The results showed that unknown clones A233, C83 and C99 showed a descending tendency as age increased; KBTBD10, HSPA8, MYL1, ZFP622, MARCH4, Nexilin, FABP4 and MUSTN1 had high expression levels at 6 weeks old; WAC, NT5C3, HSP90AA1, MRPL33, KLF6, TSNAX, CDC42EP3, HSPA4, TRAK1, NR2F2, HAUS1 and IGF1 had high expression levels at 8 weeks and showed an ascending tendency as age increased. Expression of these 23 genes was also analyzed in breast muscle, leg muscle, heart, kidney, liver, muscular stomach and sebum cutaneum in 4-8-week-old Pekin ducks, and the results showed that most of these genes had high expression in breast muscle, leg muscle and heart.
|
29
|
Zhou F, Ma Q, Li G, Xu Y. QServer: a biclustering server for prediction and assessment of co-expressed gene clusters. PLoS One 2012; 7:e32660. [PMID: 22403692 PMCID: PMC3293860 DOI: 10.1371/journal.pone.0032660] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 08/31/2011] [Accepted: 01/30/2012] [Indexed: 01/31/2023] Open
Abstract
BACKGROUND Biclustering is a powerful technique for identification of co-expressed gene groups under any (unspecified) substantial subset of given experimental conditions, which can be used for elucidation of transcriptionally co-regulated genes. RESULTS We have previously developed a biclustering algorithm, QUBIC, which can solve more general biclustering problems than previous biclustering algorithms. To fully utilize the analysis power the algorithm provides, we have developed a web server, QServer, for prediction, computational validation and analyses of co-expressed gene clusters. Specifically, the QServer has the following capabilities in addition to biclustering by QUBIC: (i) prediction and assessment of conserved cis regulatory motifs in promoter sequences of the predicted co-expressed genes; (ii) functional enrichment analyses of the predicted co-expressed gene clusters using Gene Ontology (GO) terms, and (iii) visualization capabilities in support of interactive biclustering analyses. QServer supports the biclustering and functional analysis for a wide range of organisms, including human, mouse, Arabidopsis, bacteria and archaea, whose underlying genome database will be continuously updated. CONCLUSION We believe that QServer provides an easy-to-use and highly effective platform useful for hypothesis formulation and testing related to transcription co-regulation.
Affiliation(s)
- Fengfeng Zhou
- Research Center for Biomedical Information Technology, Institute of Biomedical and Health Engineering, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, People's Republic of China
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, BioEnergy Science Center (BESC), University of Georgia, Athens, Georgia, United States of America
| | - Qin Ma
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, BioEnergy Science Center (BESC), University of Georgia, Athens, Georgia, United States of America
| | - Guojun Li
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, BioEnergy Science Center (BESC), University of Georgia, Athens, Georgia, United States of America
- School of Mathematics, Shandong University, Jinan, China
| | - Ying Xu
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, BioEnergy Science Center (BESC), University of Georgia, Athens, Georgia, United States of America
- College of Computer Science and Technology, Jilin University, Changchun, China
| |
|
30
|
Castro-Melchor M, Le H, Hu WS. Transcriptome data analysis for cell culture processes. ADVANCES IN BIOCHEMICAL ENGINEERING/BIOTECHNOLOGY 2012; 127:27-70. [PMID: 22194060 DOI: 10.1007/10_2011_116] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/31/2023]
Abstract
In the past decade, DNA microarrays have fundamentally changed the way we study complex biological systems. By measuring the expression levels of thousands of transcripts, the paradigm of studying organisms has shifted from focusing on the local phenomena of a few genes to surveying the whole genome. DNA microarrays are used in a variety of ways, from simple comparisons between two samples to more intricate time-series studies. With the large number of genes being studied, the dimensionality of the problem is inevitably high. The analysis of microarray data thus requires specific approaches. In the case of time-series microarray studies, data analysis is further complicated by the correlation between successive time points in a series. In this review, we survey the methodologies used in the analysis of static and time-series microarray data, covering data pre-processing, identification of differentially expressed genes, profile pattern recognition, pathway analysis, and network reconstruction. When available, examples of their use in mammalian cell cultures are presented.
|
31
|
Michaelson JJ, Trump S, Rudzok S, Gräbsch C, Madureira DJ, Dautel F, Mai J, Attinger S, Schirmer K, von Bergen M, Lehmann I, Beyer A. Transcriptional signatures of regulatory and toxic responses to benzo-[a]-pyrene exposure. BMC Genomics 2011; 12:502. [PMID: 21995607 PMCID: PMC3215681 DOI: 10.1186/1471-2164-12-502] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 07/19/2011] [Accepted: 10/13/2011] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Small molecule ligands often have multiple effects on the transcriptional program of a cell: they trigger a receptor specific response and additional, indirect responses ("side effects"). Distinguishing those responses is important for understanding side effects of drugs and for elucidating molecular mechanisms of toxic chemicals. RESULTS We explored this problem by exposing cells to the environmental contaminant benzo-[a]-pyrene (B[a]P). B[a]P exposure activates the aryl hydrocarbon receptor (Ahr) and causes toxic stress resulting in transcriptional changes that are not regulated through Ahr. We sought to distinguish these two types of responses based on a time course of expression changes measured after B[a]P exposure. Using Random Forest machine learning we classified 81 primary Ahr responders and 1,308 genes regulated as side effects. Subsequent weighted clustering gave further insight into the connection between expression pattern, mode of regulation, and biological function. Finally, the accuracy of the predictions was supported through extensive experimental validation. CONCLUSION Using a combination of machine learning followed by extensive experimental validation, we have further expanded the known catalog of genes regulated by the environmentally sensitive transcription factor Ahr. More broadly, this study presents a strategy for distinguishing receptor-dependent responses and side effects based on expression time courses.
Affiliation(s)
- Jacob J Michaelson
- Cellular Networks and Systems Biology, Biotechnology Center, TU Dresden, Dresden, Germany
32
Nepomuceno JA, Troncoso A, Aguilar-Ruiz JS. Biclustering of gene expression data by correlation-based scatter search. BioData Min 2011; 4:3. [PMID: 21261986 PMCID: PMC3037342 DOI: 10.1186/1756-0381-4-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Received: 06/03/2010] [Accepted: 01/24/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The analysis of data generated by microarray technology is very useful for understanding how genetic information becomes functional gene products. Biclustering algorithms can determine a group of genes which are co-expressed under a set of experimental conditions. Recently, new biclustering methods based on metaheuristics have been proposed. Most of them use the Mean Squared Residue as the merit function, but patterns that are interesting and relevant from a biological point of view, such as shifting and scaling patterns, may not be detected using this measure. However, it is important to discover this type of pattern, since genes commonly present similar behavior although their expression levels vary in different ranges or magnitudes. METHODS: Scatter Search is an evolutionary technique that is based on the evolution of a small set of solutions which are chosen according to quality and diversity criteria. This paper presents a Scatter Search with the aim of finding biclusters from gene expression data. In this algorithm the proposed fitness function is based on the linear correlation among genes to detect shifting and scaling patterns from genes, and an improvement method is included in order to select just positively correlated genes. RESULTS: The proposed algorithm has been tested with three real data sets (the Yeast Cell Cycle dataset, the human B-cells lymphoma dataset and the Yeast Stress dataset), finding a remarkable number of biclusters with shifting and scaling patterns. In addition, the performance of the proposed method and fitness function is compared to that of CC, OPSM, ISA, BiMax, xMotifs and Samba using the Gene Ontology Database.
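The contrast drawn in this abstract between the Mean Squared Residue and a correlation-based measure can be illustrated with a small numerical sketch. The code below is not the authors' fitness function or their Scatter Search; it is a minimal illustration, assuming the standard Cheng and Church definition of the Mean Squared Residue and using the mean pairwise Pearson correlation between rows as a stand-in correlation score. The function names and the toy expression profiles are hypothetical.

```python
import numpy as np


def mean_squared_residue(block):
    """Mean Squared Residue of a bicluster submatrix (rows = genes)."""
    block = np.asarray(block, dtype=float)
    row_means = block.mean(axis=1, keepdims=True)
    col_means = block.mean(axis=0, keepdims=True)
    residue = block - row_means - col_means + block.mean()
    return float((residue ** 2).mean())


def mean_pairwise_correlation(block):
    """Average Pearson correlation over all distinct pairs of rows.

    Hypothetical stand-in for a correlation-based score; it is not the
    fitness function defined in the paper.
    """
    block = np.asarray(block, dtype=float)
    corr = np.corrcoef(block)  # rows are treated as variables
    upper = np.triu_indices(block.shape[0], k=1)
    return float(corr[upper].mean())


# Toy expression profile over five conditions (made-up values).
base = np.array([1.0, 4.0, 2.0, 7.0, 3.0])

# Shifting pattern: each gene is the base profile plus a constant.
shifted = np.vstack([base, base + 5.0, base - 2.0])

# Scaling pattern: each gene is the base profile times a positive constant.
scaled = np.vstack([base, 2.0 * base, 0.5 * base])

for name, block in [("shifting", shifted), ("scaling", scaled)]:
    print(name,
          "MSR:", mean_squared_residue(block),
          "mean correlation:", mean_pairwise_correlation(block))
```

In this toy case the Mean Squared Residue vanishes for the shifting block but not for the scaling block, while the row-wise correlation is essentially 1 for both, which is the kind of pattern a correlation-based measure is meant to capture.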
Affiliation(s)
- Juan A Nepomuceno
- Dpt. Lenguajes y Sistemas Informáticos, ETSII, University of Seville, Avd. Reina Mercedes s/n, 41012, Seville, Spain
- Alicia Troncoso
- Department of Computer Science, School of Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013, Seville, Spain
- Jesús S Aguilar-Ruiz
- Department of Computer Science, School of Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013, Seville, Spain
33
Hollunder J, Friedel M, Kuiper M, Wilhelm T. DASS-GUI: a user interface for identification and analysis of significant patterns in non-sequential data. Bioinformatics 2010; 26:987-9. [PMID: 20172945 PMCID: PMC2844999 DOI: 10.1093/bioinformatics/btq071] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/21/2009] [Revised: 02/12/2010] [Accepted: 02/17/2010] [Indexed: 11/29/2022] Open
Abstract
SUMMARY: Many large 'omics' datasets have been published and many more are expected in the near future. New analysis methods are needed for best exploitation. We have developed a graphical user interface (GUI) for easy data analysis. Our discovery of all significant substructures (DASS) approach elucidates the underlying modularity, a typical feature of complex biological data. It is related to biclustering and other data mining approaches. Importantly, DASS-GUI also allows handling of multi-sets and calculation of statistical significances. DASS-GUI contains tools for further analysis of the identified patterns: analysis of the pattern hierarchy, enrichment analysis, module validation, analysis of additional numerical data, easy handling of synonymous names, clustering, filtering and merging. Different export options allow easy usage of additional tools such as Cytoscape. AVAILABILITY: Source code, pre-compiled binaries for different systems, a comprehensive tutorial, case studies and many additional datasets are freely available at http://www.ifr.ac.uk/dass/gui/. DASS-GUI is implemented in Qt.
Affiliation(s)
- Jens Hollunder
- Department of Plant Systems Biology, VIB, Department of Molecular Genetics, Ghent University, Technologiepark 927, B-9052 Gent, Belgium.
34
Gehlenborg N, O'Donoghue SI, Baliga NS, Goesmann A, Hibbs MA, Kitano H, Kohlbacher O, Neuweger H, Schneider R, Tenenbaum D, Gavin AC. Visualization of omics data for systems biology. Nat Methods 2010; 7:S56-68. [DOI: 10.1038/nmeth.1436] [Citation(s) in RCA: 474] [Impact Index Per Article: 33.9] [Indexed: 12/14/2022]
35
Zeng T, Li J. Maximization of negative correlations in time-course gene expression data for enhancing understanding of molecular pathways. Nucleic Acids Res 2009; 38:e1. [PMID: 19854949 PMCID: PMC2800212 DOI: 10.1093/nar/gkp822] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 01/13/2023] Open
Abstract
Positive correlation can be diversely instantiated as a shifting, scaling or geometric pattern, and it has been extensively explored for time-course gene expression data and pathway analysis. Recently, a trend has emerged in biological studies that focuses on the notion of negative correlations such as opposite expression patterns, complementary patterns and self-negative regulation of transcription factors (TFs). These biological ideas and primitive observations motivate us to formulate and investigate the problem of maximizing negative correlations. The objective is to discover all maximal negative correlations of statistical and biological significance from time-course gene expression data for enhancing our understanding of molecular pathways. Given a gene expression matrix, a maximal negative correlation is defined as an activation–inhibition two-way expression pattern (AIE pattern). We propose a parameter-free algorithm to enumerate the complete set of AIE patterns from a data set. This algorithm can identify significant negative correlations that cannot be identified by the traditional clustering/biclustering methods. To demonstrate the biological usefulness of AIE patterns in the analysis of molecular pathways, we conducted in-depth case studies of AIE patterns identified from Yeast cell cycle data sets. In particular, in the analysis of the Lysine biosynthesis pathway, new regulation modules and pathway components were inferred according to a significant negative correlation which is likely caused by a co-regulation of the TFs at the higher layer of the biological network. We conjecture that maximal negative correlations between genes are actually a common characteristic in molecular pathways, which can provide insights into the cell stress response study, drug response evaluation, etc.
Affiliation(s)
- Tao Zeng
- School of Computer Engineering & Bioinformatics Research Center, Nanyang Technological University, Singapore