1
Schmidlin K, Ogbunugafor CB, Geiler-Samerotte K. Environment by environment interactions (ExE) differ across genetic backgrounds (ExExG). bioRxiv 2024:2024.05.08.593194. [PMID: 38766025 PMCID: PMC11100745 DOI: 10.1101/2024.05.08.593194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
While the terms "gene-by-gene interaction" (GxG) and "gene-by-environment interaction" (GxE) are commonplace within the field of quantitative and evolutionary genetics, "environment-by-environment interaction" (ExE) is a term used less often. However, in this study, we find that environment-by-environment interactions are common and differ for different genotypes (ExExG). To reach this conclusion, we analyzed a large dataset of roughly 1,000 mutant yeast strains with varying degrees of resistance to different antifungal drugs. Many researchers endeavor to predict combinations of drugs that are more lethal than either single drug. But we show that the effectiveness of a drug combination, relative to the effectiveness of single drugs, often varies across different drug resistant mutants. Even mutants that differ by only a single nucleotide change can have dramatically different drug x drug (ExE) interactions. Studying how ExE interactions change across genotypes (ExExG) is not only important when modeling the evolution of pathogenic microbes. High throughput screens of GxG and GxE have taught us about the basic cell biology and gene regulatory networks underlying genetic interactions. ExExG has been omitted but stands to impart similar lessons about the architecture of living systems. In this study, we call attention to ExExG, measure its prevalence, introduce a new framework that in some instances better predicts its direction and magnitude, and make the case for further study of this type of genetic interaction.
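The comparison the abstract describes, a drug pair's effect relative to each single drug across different mutants, can be illustrated with a minimal sketch. It assumes a multiplicative (Bliss-style) null model, which is a common convention and not necessarily the framework the authors introduce. The function name, the mutant labels, and all growth values are hypothetical.

```python
def exe_interaction(w_a, w_b, w_ab):
    """Return observed minus expected fitness under a multiplicative null.

    Negative values mean the combination is more inhibitory than expected
    (synergy); positive values mean it is less inhibitory (antagonism).
    """
    expected = w_a * w_b  # assumed null model: the two drugs act independently
    return w_ab - expected


# Hypothetical relative growth values (no-drug growth = 1.0); not data from the study.
mutants = {
    "mutant_1": {"w_a": 0.8, "w_b": 0.5, "w_ab": 0.2},
    "mutant_2": {"w_a": 0.8, "w_b": 0.5, "w_ab": 0.6},
}

# ExE per genotype, and the spread across genotypes as a crude ExExG summary.
interactions = {name: exe_interaction(**w) for name, w in mutants.items()}
exexg_spread = max(interactions.values()) - min(interactions.values())

for name, eps in interactions.items():
    print(f"{name}: ExE = {eps:+.2f}")
print(f"ExExG spread = {exexg_spread:.2f}")
```

In this toy case the two mutants respond identically to each single drug yet show interactions of opposite sign, which is the kind of genotype dependence the abstract calls ExExG.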
Affiliation(s)
- Kara Schmidlin
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ, 85287
  - School of Life Sciences, Arizona State University, Tempe, AZ, 85287
- C. Brandon Ogbunugafor
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT, 06511
  - Santa Fe Institute, Santa Fe, NM, 87501
- Kerry Geiler-Samerotte
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ, 85287
  - School of Life Sciences, Arizona State University, Tempe, AZ, 85287
2
Zhang G, Gao Z, Yan C, Wang J, Liang W, Luo J, Luo H. KGANSynergy: knowledge graph attention network for drug synergy prediction. Brief Bioinform 2023; 24:7147878. [PMID: 37130580 DOI: 10.1093/bib/bbad167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2022] [Revised: 03/10/2023] [Accepted: 04/03/2023] [Indexed: 05/04/2023]
Abstract
Combination therapy is widely used to treat complex diseases, particularly in patients who respond poorly to monotherapy. For example, compared with the use of a single drug, drug combinations can reduce drug resistance and improve the efficacy of cancer treatment. Thus, it is vital for researchers and society to help develop effective combination therapies through clinical trials. However, high-throughput synergistic drug combination screening remains challenging and expensive in the large combinational space, where an array of compounds are used. To solve this problem, various computational approaches have been proposed to effectively identify drug combinations by utilizing drug-related biomedical information. In this study, considering the implications of various types of neighbor information of drug entities, we propose a novel end-to-end Knowledge Graph Attention Network to predict drug synergy (KGANSynergy), which utilizes neighbor information of known drugs/cell lines effectively. KGANSynergy uses knowledge graph (KG) hierarchical propagation to find multi-source neighbor nodes for drugs and cell lines. The knowledge graph attention network is designed to distinguish the importance of neighbors in a KG through a multi-attention mechanism and then aggregate the entity's neighbor node information to enrich the entity. Finally, the learned drug and cell line embeddings can be utilized to predict the synergy of drug combinations. Experiments demonstrated that our method outperformed several other competing methods, indicating that our method is effective in identifying drug combinations.
Affiliation(s)
- Ge Zhang
  - School of Computer and Information Engineering, Henan University, Jinming Street, 475004 Kaifeng, China
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Jinming Street, 475004 Kaifeng, China
- Zhijie Gao
  - School of Computer and Information Engineering, Henan University, Jinming Street, 475004 Kaifeng, China
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Jinming Street, 475004 Kaifeng, China
- Chaokun Yan
  - School of Computer and Information Engineering, Henan University, Jinming Street, 475004 Kaifeng, China
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Jinming Street, 475004 Kaifeng, China
- Jianlin Wang
  - School of Computer and Information Engineering, Henan University, Jinming Street, 475004 Kaifeng, China
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Jinming Street, 475004 Kaifeng, China
- Wenjuan Liang
  - School of Computer and Information Engineering, Henan University, Jinming Street, 475004 Kaifeng, China
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Jinming Street, 475004 Kaifeng, China
- Junwei Luo
  - College of Computer Science and Technology, Henan Polytechnic University, Shiji Street, 454003 Jiaozuo, China
- Huimin Luo
  - School of Computer and Information Engineering, Henan University, Jinming Street, 475004 Kaifeng, China
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Jinming Street, 475004 Kaifeng, China
3
Manipur I, Giordano M, Piccirillo M, Parashuraman S, Maddalena L. Community Detection in Protein-Protein Interaction Networks and Applications. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:217-237. [PMID: 34951849 DOI: 10.1109/tcbb.2021.3138142] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
The ability to identify and characterize not only the protein-protein interactions but also their internal modular organization through network analysis is fundamental for understanding the mechanisms of biological processes at the molecular level. Indeed, the detection of the network communities can enhance our understanding of the molecular basis of disease pathology, and promote drug discovery and disease treatment in personalized medicine. This work gives an overview of recent computational methods for the detection of protein complexes and functional modules in protein-protein interaction networks, also providing a focus on some of its applications. We propose a systematic reformulation of frequently adopted taxonomies for these methods, also proposing new categories to keep up with the most recent research. We review the literature of the last five years (2017-2021) and provide links to existing data and software resources. Finally, we survey recent works exploiting module identification and analysis, in the context of a variety of disease processes for biomarker identification and therapeutic target detection. Our review provides the interested reader with an up-to-date and self-contained view of the existing research, with links to state-of-the-art literature and resources, as well as hints on open issues and future research directions in complex detection and its applications.
4
Cano R, Lenz AR, Galan-Vasquez E, Ramirez-Prado JH, Perez-Rueda E. Gene Regulatory Network Inference and Gene Module Regulating Virulence in Fusarium oxysporum. Front Microbiol 2022; 13:861528. [PMID: 35722316 PMCID: PMC9201490 DOI: 10.3389/fmicb.2022.861528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Accepted: 05/09/2022] [Indexed: 11/20/2022]
Abstract
In this work, we inferred the gene regulatory network (GRN) of the fungus Fusarium oxysporum by using the regulatory networks of Aspergillus nidulans FGSC A4, Neurospora crassa OR74A, Saccharomyces cerevisiae S288c, and Fusarium graminearum PH-1 as templates for sequence comparisons. Topological properties to infer the role of transcription factors (TFs) and to identify functional modules were calculated in the GRN. From these analyses, five TFs were identified as hubs, including FOXG_04688 and FOXG_05432, which regulate 2,404 and 1,864 target genes, respectively. In addition, 16 communities were identified in the GRN, where the largest contains 1,923 genes and the smallest contains 227 genes. Finally, the genes associated with virulence were extracted from the GRN and exhaustively analyzed, and we identified a giant module with ten TFs and 273 target genes, where the most highly connected node corresponds to the transcription factor FOXG_05265, homologous to the putative bZip transcription factor CPTF1 of Claviceps purpurea, which is involved in ergotism disease that affects cereal crops and grasses. The results described in this work can be used for the study of gene regulation in this organism and open the possibility to explore putative genes associated with virulence against their host.
Affiliation(s)
- Regnier Cano
  - Centro de Investigaciones Científicas de Yucatán, Mérida, Mexico
- Alexandre Rafael Lenz
  - Departamento de Ciências Exatas e da Terra, Universidade do Estado da Bahia, Salvador, Brazil
- Edgardo Galan-Vasquez
  - Departamento de Ingeniería de Sistemas Computacionales y Automatización, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Ciudad Universitaria, Mexico, Mexico
- Ernesto Perez-Rueda
  - Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Unidad Académica Yucatán, Universidad Nacional Autónoma de México, Mérida, Mexico
5
Wang P, Moore BM, Uygun S, Lehti-Shiu MD, Barry CS, Shiu SH. Optimising the use of gene expression data to predict plant metabolic pathway memberships. The New Phytologist 2021; 231:475-489. [PMID: 33749860 DOI: 10.1111/nph.17355] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/01/2021] [Accepted: 03/13/2021] [Indexed: 06/12/2023]
Abstract
Plant metabolites from diverse pathways are important for plant survival, human nutrition and medicine. The pathway memberships of most plant enzyme genes are unknown. While co-expression is useful for assigning genes to pathways, expression correlation may exist only under specific spatiotemporal and conditional contexts. Utilising > 600 tomato (Solanum lycopersicum) expression data combinations, three strategies for predicting memberships in 85 pathways were explored. Optimal predictions for different pathways require distinct data combinations indicative of pathway functions. Naive prediction (i.e. identifying pathways with the most similarly expressed genes) is error prone. In 52 pathways, unsupervised learning performed better than supervised approaches, possibly due to limited training data availability. Using gene-to-pathway expression similarities led to prediction models that outperformed those based simply on expression levels. Using 36 experimentally validated genes, the pathway-best model prediction accuracy is 58.3%, significantly better compared with that for predicting annotated genes without experimental evidence (37.0%) or random guess (1.2%), demonstrating the importance of data quality. Our study highlights the need to extensively explore expression-based features and prediction strategies to maximise the accuracy of metabolic pathway membership assignment. The prediction framework outlined here can be applied to other species and serves as a baseline model for future comparisons.
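The naive strategy mentioned in the abstract, assigning a gene to the pathway whose members are most similarly expressed, can be sketched as follows. This is only an illustration: scoring similarity as the mean Pearson correlation to a pathway's member genes is an assumption, and the pathway names and expression vectors are hypothetical toy values, not the tomato data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pathway_similarity(gene_expr, member_exprs):
    """Assumed gene-to-pathway score: mean correlation to the pathway's genes."""
    return sum(pearson(gene_expr, m) for m in member_exprs) / len(member_exprs)


def predict_pathway(gene_expr, pathways):
    """Return the best-scoring pathway name and all pathway scores."""
    scores = {name: pathway_similarity(gene_expr, members)
              for name, members in pathways.items()}
    return max(scores, key=scores.get), scores


# Hypothetical expression profiles across four conditions.
pathways = {
    "pathway_A": [[1, 2, 3, 4], [2, 4, 6, 9]],
    "pathway_B": [[4, 3, 2, 1], [5, 5, 1, 0]],
}
query_gene = [1, 3, 5, 8]

best, scores = predict_pathway(query_gene, pathways)
print(best, scores)
```

Because the score depends entirely on which conditions are included in the vectors, the same gene can be assigned to different pathways under different data combinations, which is consistent with the abstract's remark that this naive approach is error prone.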
Affiliation(s)
- Peipei Wang
  - Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
- Bethany M Moore
  - Department of Botany, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Melissa D Lehti-Shiu
  - Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
- Cornelius S Barry
  - Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA
- Shin-Han Shiu
  - Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
  - Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI, 48824, USA
6
Tine M, Kuhl H, Teske PR, Reinhardt R. Genome-wide analysis of European sea bass provides insights into the evolution and functions of single-exon genes. Ecol Evol 2021; 11:6546-6557. [PMID: 34141239 PMCID: PMC8207432 DOI: 10.1002/ece3.7507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2020] [Revised: 01/24/2021] [Accepted: 03/12/2021] [Indexed: 11/17/2022]
Abstract
Several studies have attempted to understand the origin and evolution of single-exon genes (SEGs) in eukaryotic organisms, including fishes, but few have examined the functional and evolutionary relationships between SEGs and multiple-exon gene (MEG) paralogs, in particular the conservation of promoter regions. Given that SEGs originate via the reverse transcription of mRNA from a "parental" MEG, such comparisons may enable identifying evolutionarily-related SEG/MEG paralogs, which might fulfill equivalent physiological functions. Here, the relationship of SEG proportion with MEG count, gene density, intron count, and chromosome size was assessed for the genome of the European sea bass, Dicentrarchus labrax. Then, SEGs with an MEG parent were identified, and promoter sequences of SEG/MEG paralogs were compared, to identify highly conserved functional motifs. The results revealed a total count of 1,585 (8.3% of total genes) SEGs in the European sea bass genome, which was correlated with MEG count but not with gene density. The significant correlation of SEG content with the number of MEGs suggests that SEGs were continuously and independently generated over evolutionary time following species divergence through retrotranscription events, followed by tandem duplications. Functional annotation showed that the majority of SEGs are functional, as is evident from their expression in RNA-seq data used to support homology-based genome annotation. Differences in 5'UTR and 3'UTR lengths between SEG/MEG paralogs observed in this study may contribute to gene expression divergence between them and therefore lead to the emergence of new SEG functions. The comparison of nonsynonymous to synonymous changes (Ka/Ks) between SEG/MEG parents showed that 74 of them are under positive selection (Ka/Ks > 1; p = .0447). An additional fifteen SEGs with an MEG parent have a common promoter, which implies that they are under the influence of common regulatory networks.
Affiliation(s)
- Mbaye Tine
  - UFR des Sciences Agronomiques, de l'Aquaculture et des Technologies Alimentaires (S2ATA), Université Gaston Berger (UGB), Saint‐Louis, Senegal
  - Genome Centre at the Max‐Planck Institute for Plant Breeding Research, Köln, Germany
- Heiner Kuhl
  - Department of Ecophysiology and Aquaculture, Leibniz‐Institute of Freshwater Ecology and Inland Fisheries (IGB), Berlin, Germany
- Peter R. Teske
  - Department of Zoology, Centre for Ecological Genomics and Wildlife Conservation, University of Johannesburg, Johannesburg, South Africa
- Richard Reinhardt
  - Genome Centre at the Max‐Planck Institute for Plant Breeding Research, Köln, Germany
7
Single-cell transcriptional networks in differentiating preadipocytes suggest drivers associated with tissue heterogeneity. Nat Commun 2020; 11:2117. [PMID: 32355218 PMCID: PMC7192917 DOI: 10.1038/s41467-020-16019-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 10/08/2018] [Accepted: 04/03/2020] [Indexed: 12/14/2022]
Abstract
White adipose tissue plays an important role in physiological homeostasis and metabolic disease. Different fat depots have distinct metabolic and inflammatory profiles and are differentially associated with disease risk. It is unclear whether these differences are intrinsic to the pre-differentiated stage. Using single-cell RNA sequencing, a unique network methodology and a data integration technique, we predict metabolic phenotypes in differentiating cells. Single-cell RNA-seq profiles of human preadipocytes during adipogenesis in vitro identify at least two distinct classes of subcutaneous white adipocytes. These differences in gene expression are separate from the process of browning and beiging. Using a systems biology approach, we identify a new network of zinc-finger proteins that is expressed in one class of preadipocytes and is potentially involved in regulating adipogenesis. Our findings provide a deeper understanding of both the heterogeneity of white adipocytes and their link to normal metabolism and disease. The origin of the heterogeneity of metabolic and inflammatory profiles exhibited by white adipocytes is little understood. Here, using scRNA-seq and computational methods, the authors show that differentiating preadipocytes exhibit gene expression differences and suggest underlying regulators.
8
Pusa T, Ferrarini MG, Andrade R, Mary A, Marchetti-Spaccamela A, Stougie L, Sagot MF. MOOMIN - Mathematical explOration of 'Omics data on a MetabolIc Network. Bioinformatics 2019; 36:514-523. [PMID: 31504164 PMCID: PMC9883724 DOI: 10.1093/bioinformatics/btz584] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/28/2019] [Revised: 07/16/2019] [Accepted: 08/19/2019] [Indexed: 02/03/2023]
Abstract
MOTIVATION: Analysis of differential expression of genes is often performed to understand how the metabolic activity of an organism is impacted by a perturbation. However, because the system of metabolic regulation is complex and not all changes are directly reflected in the expression levels, interpreting these data can be difficult. RESULTS: In this work, we present a new algorithm and computational tool that uses a genome-scale metabolic reconstruction to infer metabolic changes from differential expression data. Using the framework of constraint-based analysis, our method produces a qualitative hypothesis of a change in metabolic activity. In other words, each reaction of the network is inferred to have increased, decreased, or remained unchanged in flux. In contrast to similar previous approaches, our method does not require a biological objective function and does not assign on/off activity states to genes. An implementation is provided and it is available online. We apply the method to three published datasets to show that it successfully accomplishes its two main goals: confirming or rejecting metabolic changes suggested by differentially expressed genes based on how well they fit in as parts of a coordinated metabolic change, as well as inferring changes in reactions whose genes did not undergo differential expression. AVAILABILITY AND IMPLEMENTATION: github.com/htpusa/moomin. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Taneli Pusa
  - To whom correspondence should be addressed.
- Mariana Galvão Ferrarini
  - Laboratoire de Biométrie et Biologie Évolutive, UMR 5558, CNRS, Université de Lyon, Université Lyon 1, Villeurbanne 69622, France
  - Univ Lyon, INSA-Lyon, INRA, BF2i, UMR0203, F-69621, Villeurbanne 69622, France
- Ricardo Andrade
  - INRIA Grenoble Rhône-Alpes, Montbonnot-Saint-Martin 38334, France
  - Laboratoire de Biométrie et Biologie Évolutive, UMR 5558, CNRS, Université de Lyon, Université Lyon 1, Villeurbanne 69622, France
- Arnaud Mary
  - INRIA Grenoble Rhône-Alpes, Montbonnot-Saint-Martin 38334, France
  - Laboratoire de Biométrie et Biologie Évolutive, UMR 5558, CNRS, Université de Lyon, Université Lyon 1, Villeurbanne 69622, France
9
de Campos LM, Cano A, Castellano JG, Moral S. Combining gene expression data and prior knowledge for inferring gene regulatory networks via Bayesian networks using structural restrictions. Stat Appl Genet Mol Biol 2019; 18:sagmb-2018-0042. [PMID: 31042646 DOI: 10.1515/sagmb-2018-0042] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]
Abstract
Gene Regulatory Networks (GRNs) are known as the most adequate instrument to provide a clear insight and understanding of cellular systems. One of the most successful techniques to reconstruct GRNs using gene expression data is Bayesian networks (BNs), which have proven to be an ideal approach for heterogeneous data integration in the learning process. Nevertheless, the incorporation of prior knowledge has been achieved by using prior beliefs or by using networks as a starting point in the search process. In this work, the utilization of different kinds of structural restrictions within algorithms for learning BNs from gene expression data is considered. These restrictions codify prior knowledge in such a way that a BN should satisfy them. Therefore, one aim of this work is to provide a detailed review of the use of prior knowledge and gene expression data to infer GRNs from BNs, but the major purpose of this paper is to investigate whether structural learning algorithms for BNs from expression data can achieve better outcomes by exploiting this prior knowledge through structural restrictions. In the experimental study, it is shown that this new way of incorporating prior knowledge leads to better reverse-engineered networks.
Affiliation(s)
- Luis M de Campos
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Andrés Cano
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Javier G Castellano
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Serafín Moral
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
10
Liu W, Rajapakse JC. Fusing gene expressions and transitive protein-protein interactions for inference of gene regulatory networks. BMC Systems Biology 2019; 13:37. [PMID: 30953534 PMCID: PMC6449891 DOI: 10.1186/s12918-019-0695-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/02/2022]
Abstract
Background: Systematic fusion of multiple data sources for Gene Regulatory Networks (GRN) inference remains a key challenge in systems biology. We incorporate information from protein-protein interaction networks (PPIN) into the process of GRN inference from gene expression (GE) data. However, existing PPIN remain sparse and transitive protein interactions can help predict missing protein interactions. We therefore propose a systematic probabilistic framework on fusing GE data and transitive protein interaction data to coherently build GRN. Results: We use a Gaussian Mixture Model (GMM) to soft-cluster GE data, allowing overlapping cluster memberships. Next, a heuristic method is proposed to extend sparse PPIN by incorporating transitive linkages. We then propose a novel way to score extended protein interactions by combining topological properties of PPIN and correlations of GE. Following this, GE data and extended PPIN are fused using a Gaussian Hidden Markov Model (GHMM) in order to identify gene regulatory pathways and refine interaction scores that are then used to constrain the GRN structure. We employ a Bayesian Gaussian Mixture (BGM) model to refine the GRN derived from GE data by using the structural priors derived from GHMM. Experiments on real yeast regulatory networks demonstrate both the feasibility of the extended PPIN in predicting transitive protein interactions and its effectiveness in improving the coverage and accuracy of the proposed method of fusing PPIN and GE to build GRN. Conclusion: The GE and PPIN fusion model outperforms both the state-of-the-art single data source models (CLR, GENIE3, TIGRESS) as well as existing fusion models under various constraints.
Affiliation(s)
- Wenting Liu
  - School of Public Health and Management, Hubei University of Medicine, Shiyan, Hubei, China
  - Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, USA
- Jagath C Rajapakse
  - School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
11
Wang ZT, Tan CC, Tan L, Yu JT. Systems biology and gene networks in Alzheimer’s disease. Neurosci Biobehav Rev 2019; 96:31-44. [DOI: 10.1016/j.neubiorev.2018.11.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/05/2017] [Revised: 11/18/2018] [Accepted: 11/18/2018] [Indexed: 12/25/2022]
12
Franks AM, Markowetz F, Airoldi EM. Refining cellular pathway models using an ensemble of heterogeneous data sources. Ann Appl Stat 2018; 12:1361-1384. [PMID: 36506698 PMCID: PMC9733905 DOI: 10.1214/16-aoas915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Abstract
Improving current models and hypotheses of cellular pathways is one of the major challenges of systems biology and functional genomics. There is a need for methods to build on established expert knowledge and reconcile it with results of new high-throughput studies. Moreover, the available sources of data are heterogeneous, and the data need to be integrated in different ways depending on which part of the pathway they are most informative for. In this paper, we introduce a compartment specific strategy to integrate edge, node and path data for refining a given network hypothesis. To carry out inference, we use a local-move Gibbs sampler for updating the pathway hypothesis from a compendium of heterogeneous data sources, and a new network regression idea for integrating protein attributes. We demonstrate the utility of this approach in a case study of the pheromone response MAPK pathway in the yeast S. cerevisiae.
Affiliation(s)
- Alexander M Franks
  - Department of Statistics and Applied Probability, University of California, Santa Barbara, South Hall, Santa Barbara, California 93106, USA
- Florian Markowetz
  - Cancer Research UK, Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, United Kingdom
- Edoardo M Airoldi
  - Fox School of Business, Department of Statistical Science, Temple University, Center for Data Science, 1810 Liacouras Walk, Philadelphia, Pennsylvania 19122, USA
13
Kikkawa A. Random Matrix Analysis for Gene Interaction Networks in Cancer Cells. Sci Rep 2018; 8:10607. [PMID: 30006574 PMCID: PMC6045654 DOI: 10.1038/s41598-018-28954-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 09/22/2017] [Accepted: 07/03/2018] [Indexed: 01/12/2023]
Abstract
Investigations of topological uniqueness of gene interaction networks in cancer cells are essential for understanding the disease. Although cancer is considered to originate from the topological alteration of a huge molecular interaction network in cellular systems, the theoretical study to investigate such complex networks is still insufficient. It is necessary to predict the behavior of a huge complex interaction network from the behavior of a finite size network. Based on the random matrix theory, we study the distribution of the nearest neighbor level spacings P(s) of interaction matrices of gene networks in human cancer cells. The interaction matrices are computed using the Cancer Network Galaxy (TCNG) database which is a repository of gene interactions inferred by a Bayesian network model. 256 NCBI GEO entries regarding gene expressions in human cancer cells have been used for the inference. We observe the Wigner distribution of P(s) when the gene networks are dense networks that have more than ~38,000 edges. In the opposite case, when the networks have smaller numbers of edges, the distribution P(s) becomes the Poisson distribution. We investigate relevance of P(s) both to the sparseness of the networks and to edge frequency factor which is the reliance (likelihood) of the inferred gene interactions.
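For reference, the two limiting forms of the nearest-neighbor level spacing distribution named in the abstract are usually written as below, with spacings $s$ normalized to unit mean. These are the standard textbook expressions (the Wigner form being the surmise for real symmetric random matrices); the abstract itself does not state which normalization the study uses, so the unit-mean convention is an assumption here.

```latex
% Poisson distribution: uncorrelated levels, no level repulsion
P_{\mathrm{Poisson}}(s) = e^{-s}

% Wigner distribution (Wigner surmise): correlated levels, P(0) = 0
P_{\mathrm{Wigner}}(s) = \frac{\pi s}{2} \exp\!\left(-\frac{\pi s^{2}}{4}\right)
```

The practical distinction is the behavior near $s = 0$: the Poisson form is largest there, while the Wigner form vanishes, so closely spaced eigenvalues are suppressed in the dense-network regime the abstract describes.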
Affiliation(s)
- Ayumi Kikkawa
  - Mathematical and Theoretical Physics Unit, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Kunigami-gun, Okinawa, 904-0495, Japan
14
Zhang Z, Song J, Tang J, Xu X, Guo F. Detecting complexes from edge-weighted PPI networks via genes expression analysis. BMC Systems Biology 2018; 12:40. [PMID: 29745859 PMCID: PMC5998908 DOI: 10.1186/s12918-018-0565-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 01/29/2023]
Abstract
BACKGROUND: Identifying complexes from PPI networks has become a key problem to elucidate protein functions and identify signal and biological processes in a cell. Proteins that bind as complexes play important roles in life activities. Accurate determination of complexes in PPI networks is crucial for understanding principles of cellular organization. RESULTS: We propose a novel method to identify complexes on PPI networks, based on different co-expression information. First, we use the Markov Cluster Algorithm with an edge-weighting scheme to calculate complexes on PPI networks. Then, we propose some significant features, such as graph information and gene expression analysis, to filter and modify complexes predicted by the Markov Cluster Algorithm. To evaluate our method, we test on two experimental yeast PPI networks. CONCLUSIONS: On the DIP network, our method has Precision and F-Measure values of 0.6004 and 0.5528. On the MIPS network, our method has F-Measure and Sn values of 0.3774 and 0.3453. Compared with existing methods, our method improves the Precision value by at least 0.1752, the F-Measure value by at least 0.0448, and the Sn value by at least 0.0771. Experiments show that our method achieves better results than some state-of-the-art methods for identifying complexes on PPI networks, with the prediction quality improved in terms of evaluation criteria.
Affiliation(s)
- Zehua Zhang
- School of Computer Science and Technology, Tianjin University, Tianjin, People’s Republic of China
- Tianjin University Institute of Computational Biology, Tianjin, People’s Republic of China
| | - Jian Song
- School of Computer Science and Technology, Tianjin University, Tianjin, People’s Republic of China
- Tianjin University Institute of Computational Biology, Tianjin, People’s Republic of China
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, People’s Republic of China
| | - Jijun Tang
- School of Computer Science and Technology, Tianjin University, Tianjin, People’s Republic of China
- Tianjin University Institute of Computational Biology, Tianjin, People’s Republic of China
- Department of Computer Science and Engineering, University of South Carolina, Columbia, USA
| | - Xinying Xu
- School of Information Engineering, Taiyuan University of Technology, Taiyuan, People’s Republic of China
| | - Fei Guo
- School of Computer Science and Technology, Tianjin University, Tianjin, People’s Republic of China
- Tianjin University Institute of Computational Biology, Tianjin, People’s Republic of China
| |
|
15
|
Mall R, Cerulo L, Garofano L, Frattini V, Kunji K, Bensmail H, Sabedot TS, Noushmehr H, Lasorella A, Iavarone A, Ceccarelli M. RGBM: regularized gradient boosting machines for identification of the transcriptional regulators of discrete glioma subtypes. Nucleic Acids Res 2018; 46:e39. [PMID: 29361062 PMCID: PMC6283452 DOI: 10.1093/nar/gky015] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 12/03/2017] [Accepted: 01/06/2018] [Indexed: 01/05/2023] Open
Abstract
We propose a generic framework for gene regulatory network (GRN) inference approached as a feature selection problem. GRNs obtained using Machine Learning techniques are often dense, whereas real GRNs are rather sparse. We use a Tikhonov regularization inspired optimal L-curve criterion that utilizes the edge weight distribution for a given target gene to determine the optimal set of TFs associated with it. Our proposed framework allows the incorporation of a mechanistic active binding network based on cis-regulatory motif analysis. We evaluate our regularization framework in conjunction with two non-linear ML techniques, namely gradient boosting machines (GBM) and random-forests (GENIE), resulting in regularized feature selection based methods called RGBM and RGENIE, respectively. RGBM has been used to identify the main transcription factors that are causally involved as master regulators of the gene expression signature activated in the FGFR3-TACC3-positive glioblastoma. Here, we illustrate that RGBM identifies the main regulators of the molecular subtypes of brain tumors. Our analysis reveals the identity and corresponding biological activities of the master regulators characterizing the difference between G-CIMP-high and G-CIMP-low subtypes and between PA-like and LGm6-GBM, thus providing a clue to the yet undetermined nature of the transcriptional events among these subtypes.
Affiliation(s)
- Raghvendra Mall
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| | - Luigi Cerulo
- Department of Science and Technology, University of Sannio, Benevento, Italy
- BIOGEM Istituto di Ricerche Genetiche “G. Salvatore”, Ariano Irpino, Italy
| | - Luciano Garofano
- Department of Science and Technology, University of Sannio, Benevento, Italy
- BIOGEM Istituto di Ricerche Genetiche “G. Salvatore”, Ariano Irpino, Italy
| | - Veronique Frattini
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY 10032, USA
| | - Khalid Kunji
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| | - Halima Bensmail
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| | - Thais S Sabedot
- Department of Neurosurgery, Brain Tumor Center, Henry Ford Health System, Detroit, MI, USA
- Department of Genetics (CISBi/NAP), Department of Surgery and Anatomy, Ribeirão Preto Medical School, University of Sao Paulo, Monte Alegre, Ribeirao Preto, Brazil
| | - Houtan Noushmehr
- Department of Neurosurgery, Brain Tumor Center, Henry Ford Health System, Detroit, MI, USA
- Department of Genetics (CISBi/NAP), Department of Surgery and Anatomy, Ribeirão Preto Medical School, University of Sao Paulo, Monte Alegre, Ribeirao Preto, Brazil
| | - Anna Lasorella
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY 10032, USA
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, New York 10032, USA
- Department of Pediatrics, Columbia University Medical Center, New York, New York 10032, USA
| | - Antonio Iavarone
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY 10032, USA
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, New York 10032, USA
- Department of Neurology, Columbia University Medical Center, New York, New York 10032, USA
| | - Michele Ceccarelli
- Department of Science and Technology, University of Sannio, Benevento, Italy
- BIOGEM Istituto di Ricerche Genetiche “G. Salvatore”, Ariano Irpino, Italy
| |
|
16
|
Liang Y, Kelemen A. Bayesian state space models for dynamic genetic network construction across multiple tissues. Stat Appl Genet Mol Biol 2017; 15:273-90. [PMID: 27343475 DOI: 10.1515/sagmb-2014-0055] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
Construction of gene-gene interaction networks and potential pathways is a challenging and important problem in genomic research for complex diseases, and estimating the dynamic changes of the temporal correlations and non-stationarity is key to this process. In this paper, we develop dynamic state space models with hierarchical Bayesian settings to tackle this challenge for inferring the dynamic profiles and genetic networks associated with disease treatments. We treat both the stochastic transition matrix and the observation matrix as time-variant and include temporal correlation structures in the covariance matrix estimations in the multivariate Bayesian state space models. The unevenly spaced short time courses with unseen time points are treated as hidden state variables. Hierarchical Bayesian approaches with various prior and hyper-prior models, together with Markov chain Monte Carlo and Gibbs sampling algorithms, are used to estimate the model parameters and the hidden state variables. We apply the proposed hierarchical Bayesian state space models to Affymetrix time course data sets from multiple tissues (liver, skeletal muscle, and kidney) following corticosteroid (CS) drug administration. Both simulation and real data analysis results show that the genomic changes over time and gene-gene interactions in response to CS treatment can be well captured by the proposed models. The proposed dynamic hierarchical Bayesian state space modeling approaches could be extended and applied to other large-scale genomic data, such as next generation sequencing (NGS) data combined with real-time and time-varying electronic health records (EHR), for more comprehensive and robust systematic and network-based analysis. The goal is to transform big biomedical data into predictions and diagnostics for precision medicine and personalized healthcare, with better decision making and patient outcomes.
|
17
|
Choi H, Gim J, Won S, Kim YJ, Kwon S, Park C. Network analysis for count data with excess zeros. BMC Genet 2017; 18:93. [PMID: 29110633 PMCID: PMC5674822 DOI: 10.1186/s12863-017-0561-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/25/2017] [Accepted: 10/25/2017] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Undirected graphical models, or Markov random fields, have been a popular class of models for representing conditional dependence relationships between nodes. In particular, Markov networks help us to understand complex interactions between genes in the biological processes of a cell. Local Poisson models seem promising for modeling positive as well as negative dependencies in count data. Furthermore, when zero counts are more frequent than expected, excess zeros should be considered in the model. METHODS We present a penalized Poisson graphical model for zero-inflated count data and derive an expectation-maximization (EM) algorithm built on coordinate descent. Our method is shown to be effective through simulated and real data analysis. RESULTS Results from the simulated data indicate that our method outperforms the local Poisson graphical model in the presence of excess zeros. In an application to RNA sequencing data, we also investigate the gender effect by comparing the networks estimated for different genders. Our method may help in identifying biological pathways linked to sex hormone regulation and thus in understanding the underlying mechanisms of gender differences. CONCLUSIONS We have presented a penalized version of zero-inflated spatial Poisson regression and derived an efficient EM algorithm built on coordinate descent. We discuss possible improvements of our method as well as potential research directions associated with our findings from the RNA sequencing data.
Affiliation(s)
- Hosik Choi
- Department of Applied Statistics, Kyonggi University, Suwon, 16227 Korea
| | - Jungsoo Gim
- Institute of Health and Environment, Seoul National University, Seoul, 08826 Korea
| | - Sungho Won
- Graduate School of Public Health, Seoul National University, Seoul, 08826 Korea
| | - You Jin Kim
- Department of Nutritional Science and Food Management, Ewha Womans University, Seoul, 03760 Korea
| | - Sunghoon Kwon
- Department of Applied Statistics, Konkuk University, Seoul, 05029 Korea
| | - Changyi Park
- Department of Statistics, University of Seoul, Seoul, 02504 Korea
| |
|
18
|
Huang B, Zhang L, Du Y, Xu F, Li L, Zhang G. Characterization of the Mollusc RIG-I/MAVS Pathway Reveals an Archaic Antiviral Signalling Framework in Invertebrates. Sci Rep 2017; 7:8217. [PMID: 28811654 PMCID: PMC5557890 DOI: 10.1038/s41598-017-08566-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 04/06/2017] [Accepted: 07/11/2017] [Indexed: 12/19/2022] Open
Abstract
Although the mitochondrial antiviral signalling protein (MAVS)-dependent RIG-I-like receptor (RLR) signalling pathway in the cytosol plays an indispensable role in the antiviral immunity of the host, surprisingly little is known about it in invertebrates. Here we characterized the major members of the RLR pathway and investigated their signal transduction in molluscs. We show that genes involved in the RLR pathway were significantly induced during virus challenge, including CgRIG-I-1, CgMAVS, CgTRAF6 (TNF receptor-associated factor 6), and CgIRFs (interferon regulatory factors). Similar to human RIG-I, oyster RIG-I-1 could bind poly(I:C) directly in vitro and interact with oyster MAVS via its caspase activation and recruitment domains. We also show that transmembrane domain-dependent self-association of CgMAVS may be crucial for its signalling and that CgMAVS can recruit the downstream signalling molecule, TRAF6, which can subsequently activate the NF-κB signal pathway. Moreover, oyster IRFs appeared to function downstream of CgMAVS and were able to activate the interferon β promoter and interferon stimulated response elements in mammalian cells. These results establish invertebrate MAVS-dependent RLR signalling for the first time and should be helpful for deciphering the antiviral mechanisms of invertebrates and understanding the development of the vertebrate RLR network.
Affiliation(s)
- Baoyu Huang
- Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China.,Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266071, China.,National & Local Joint Engineering Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
| | - Linlin Zhang
- Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China.,Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266071, China.,National & Local Joint Engineering Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
| | - Yishuai Du
- Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China.,Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266071, China.,National & Local Joint Engineering Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
| | - Fei Xu
- Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China.,Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266071, China.,National & Local Joint Engineering Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
| | - Li Li
- Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China.,Laboratory for Marine Fisheries and Aquaculture, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266071, China.,National & Local Joint Engineering Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
| | - Guofan Zhang
- Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China.,Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266071, China.,National & Local Joint Engineering Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
| |
|
19
|
Bryan K, McGivney BA, Farries G, McGettigan PA, McGivney CL, Gough KF, MacHugh DE, Katz LM, Hill EW. Equine skeletal muscle adaptations to exercise and training: evidence of differential regulation of autophagosomal and mitochondrial components. BMC Genomics 2017; 18:595. [PMID: 28793853 PMCID: PMC5551008 DOI: 10.1186/s12864-017-4007-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 02/01/2017] [Accepted: 08/02/2017] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND A single bout of exercise induces changes in gene expression in skeletal muscle. Regular exercise results in an adaptive response involving changes in muscle architecture and biochemistry, and is an effective way to manage and prevent common human diseases such as obesity, cardiovascular disorders and type II diabetes. However, the biomolecular mechanisms underlying such responses still need to be fully elucidated. Here we performed a transcriptome-wide analysis of skeletal muscle tissue in a large cohort of untrained Thoroughbred horses (n = 51) before and after a bout of high-intensity exercise and again after an extended period of training. We hypothesized that regular high-intensity exercise training primes the transcriptome for the demands of high-intensity exercise. RESULTS An extensive set of genes was observed to be significantly differentially regulated in response to a single bout of high-intensity exercise in the untrained cohort (3241 genes) and following multiple bouts of high-intensity exercise training over a six-month period (3405 genes). Approximately one-third of these genes (1025) and several biological processes related to energy metabolism were common to both the exercise and training responses. We then developed a novel network-based computational analysis pipeline to test the hypothesis that these transcriptional changes also influence the contextual molecular interactome and its dynamics in response to exercise and training. The contextual network analysis identified several important hub genes, including the autophagosomal-related gene GABARAPL1, and dynamic functional modules, including those enriched for mitochondrial respiratory chain complexes I and V, that were differentially regulated and had their putative interactions 're-wired' in the exercise and/or training responses. 
CONCLUSION Here we have generated, for the first time, a comprehensive set of genes that are differentially expressed in Thoroughbred skeletal muscle in response to both exercise and training. These data indicate that consecutive bouts of high-intensity exercise result in a priming of the skeletal muscle transcriptome for the demands of the next exercise bout. Furthermore, this may also lead to an extensive 're-wiring' of the molecular interactome in both exercise and training that includes key genes and functional modules related to autophagy and the mitochondrion.
Affiliation(s)
- Kenneth Bryan
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, D04 V1W8 Ireland
| | - Beatrice A. McGivney
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, D04 V1W8 Ireland
| | - Gabriella Farries
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, D04 V1W8 Ireland
| | - Paul A. McGettigan
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, D04 V1W8 Ireland
| | - Charlotte L. McGivney
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, D04 V1W8 Ireland
| | - Katie F. Gough
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, D04 V1W8 Ireland
| | - David E. MacHugh
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, D04 V1W8 Ireland
- UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Ireland
| | - Lisa M. Katz
- UCD School of Veterinary Medicine, University College Dublin, Belfield, D04 V1W8 Ireland
| | - Emmeline W. Hill
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, D04 V1W8 Ireland
| |
|
20
|
Wang J, Wang Q, Lu D, Zhou F, Wang D, Feng R, Wang K, Molday R, Xie J, Wen T. A biosystems approach to identify the molecular signaling mechanisms of TMEM30A during tumor migration. PLoS One 2017. [PMID: 28640862 PMCID: PMC5481017 DOI: 10.1371/journal.pone.0179900] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022] Open
Abstract
Understanding the molecular mechanisms underlying cell migration, which plays an important role in tumor growth and progression, is critical for the development of novel tumor therapeutics. Overexpression of transmembrane protein 30A (TMEM30A) has been shown to initiate tumor cell migration; however, the molecular mechanisms through which this takes place have not yet been reported. We therefore propose integrating computational and experimental approaches. We first predict potential signaling networks regulated by TMEM30A using a) computational biology methods, b) our previous mass spectrometry results of the TMEM30A complex in mouse tissue, and c) a number of migration-related genes manually collected from the literature. We then perform molecular biology experiments, including the in vitro scratch assay and real-time quantitative polymerase chain reaction (qPCR), to validate the reliability of the predicted network. The results verify that the genes identified in the computational signaling network are indeed regulated by TMEM30A during cell migration, indicating the effectiveness of our proposed method and shedding light on the regulatory mechanisms underlying tumor migration, which facilitates the understanding of the molecular basis of tumor invasion.
Affiliation(s)
- Jiao Wang
- Laboratory of Molecular Neural Biology, School of Life Sciences, Shanghai University, Shanghai, China
| | - Qian Wang
- Laboratory of Molecular Neural Biology, School of Life Sciences, Shanghai University, Shanghai, China
| | - Dongfang Lu
- School of Computer Engineering and Science, Shanghai University, Shanghai, China
| | - Fangfang Zhou
- Laboratory of Molecular Neural Biology, School of Life Sciences, Shanghai University, Shanghai, China
| | - Dong Wang
- Laboratory of Molecular Neural Biology, School of Life Sciences, Shanghai University, Shanghai, China
| | - Ruili Feng
- Laboratory of Molecular Neural Biology, School of Life Sciences, Shanghai University, Shanghai, China
| | - Kai Wang
- Shanghai Key Laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institute of Biological Science, Chinese Academy of Sciences, Shanghai, China
| | - Robert Molday
- Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, Canada
| | - Jiang Xie
- School of Computer Engineering and Science, Shanghai University, Shanghai, China
- * E-mail: (JX); (TQW)
| | - Tieqiao Wen
- Laboratory of Molecular Neural Biology, School of Life Sciences, Shanghai University, Shanghai, China
- * E-mail: (JX); (TQW)
| |
|
21
|
Ancherbak S, Kuruoglu EE, Vingron M. Time-Dependent Gene Network Modelling by Sequential Monte Carlo. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:1183-1193. [PMID: 26540693 DOI: 10.1109/tcbb.2015.2496301] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
Most existing methods used for gene regulatory network modeling are dedicated to inference of steady state networks, which are prevalent over all time instants. However, gene interactions evolve over time. Information about the gene interactions in different stages of the life cycle of a cell or an organism is of high importance for biology. In the statistical graphical models literature, one can find a number of methods for studying steady-state network structures, while the study of time-varying networks is rather recent. A sequential Monte Carlo method, namely particle filtering (PF), provides a powerful tool for dynamic time series analysis. In this work, the PF technique is proposed for dynamic network inference, and its potential for tracking time-varying gene expression data is demonstrated. The data used for validation are synthetic time series data available from the DREAM4 challenge, generated from known network topologies and obtained from transcriptional regulatory networks of S. cerevisiae. We model the gene interactions over the course of time with multivariate linear regressions where the parameters of the regressive process change over time.
|
22
|
Cai Y, Pan L, Miao J, Liu T. Identification of interacting proteins with aryl hydrocarbon receptor in scallop Chlamys farreri by yeast two hybrid screening. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2016; 133:381-389. [PMID: 27497785 DOI: 10.1016/j.ecoenv.2016.07.013] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/25/2016] [Revised: 07/08/2016] [Accepted: 07/11/2016] [Indexed: 06/06/2023]
Abstract
The aryl hydrocarbon receptor (AhR) belongs to the basic helix-loop-helix (bHLH) Per-Arnt-Sim (PAS) family of transcription factors. AhR has been known primarily for its role in the regulation of several drug and xenobiotic metabolizing enzymes, as well as the mediation of the toxicity of certain xenobiotics, including 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). Although the AhR is well studied as a mediator of the toxicity of certain xenobiotics in marine bivalves, its normal physiological function remains unknown. In order to explore the function of the AhR, the bait protein expression plasmid pGBKT7-CfAhR and a cDNA library of gill from Chlamys farreri were constructed. Using the yeast two-hybrid system, after multiple screening with the high screening rate medium, rotary verification, sequencing and bioinformatics analysis, the interactions of the CfAhR with receptor for activated protein kinase C 1 (RACK1), thyroid peroxidase-like protein (TPO), Toll-like receptor 4 (TLR4), androglobin-like, store-operated Ca2+ entry (SocE), ADP/ATP carrier protein, cytochrome b, thioesterase, actin, ferritin subunit 1, poly-ubiquitin, short-chain collagen C4-like and one hypothetical protein in gill cells were identified. This study suggests that the CfAhR plays fundamental roles in immune system homeostasis, oxidative stress response, and the growth and development of C. farreri. The elucidation of these protein interactions is of much importance both for understanding the normal physiological function of AhR and for identifying potential targets for further research on protein function in AhR interactions.
Affiliation(s)
- Yuefeng Cai
- Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, Qingdao 266003, PR China
| | - Luqing Pan
- Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, Qingdao 266003, PR China.
| | - Jingjing Miao
- Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, Qingdao 266003, PR China
| | - Tong Liu
- Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, Qingdao 266003, PR China
| |
|
23
|
Henriques R, Madeira SC. BicNET: Flexible module discovery in large-scale biological networks using biclustering. Algorithms Mol Biol 2016; 11:14. [PMID: 27213009 PMCID: PMC4875761 DOI: 10.1186/s13015-016-0074-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 12/11/2015] [Accepted: 04/22/2016] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Despite the recognized importance of module discovery in biological networks to enhance our understanding of complex biological systems, existing methods generally suffer from two major drawbacks. First, there is a focus on modules where biological entities are strongly connected, leading to the discovery of trivial/well-known modules and to the inaccurate exclusion of biological entities with subtler yet relevant roles. Second, there is a generalized intolerance towards different forms of noise, including uncertainty associated with less-studied biological entities (in the context of literature-driven networks) and experimental noise (in the context of data-driven networks). Although state-of-the-art biclustering algorithms are able to discover modules with varying coherency and robustness to noise, their application for the discovery of non-dense modules in biological networks has been poorly explored and it is further challenged by efficiency bottlenecks. METHODS This work proposes Biclustering NETworks (BicNET), a biclustering algorithm to discover non-trivial yet coherent modules in weighted biological networks with heightened efficiency. Three major contributions are provided. First, we motivate the relevance of discovering network modules given by constant, symmetric, plaid and order-preserving biclustering models. Second, we propose an algorithm to discover these modules and to robustly handle noisy and missing interactions. Finally, we provide new searches to tackle time and memory bottlenecks by effectively exploring the inherent structural sparsity of network data. RESULTS Results in synthetic network data confirm the soundness, efficiency and superiority of BicNET. The application of BicNET on protein interaction and gene interaction networks from yeast, E. coli and Human reveals new modules with heightened biological significance. 
CONCLUSIONS BicNET is, to our knowledge, the first method enabling the efficient unsupervised analysis of large-scale network data for the discovery of coherent modules with parameterizable homogeneity.
Affiliation(s)
- Rui Henriques
- INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
| | - Sara C. Madeira
- INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
| |
|
24
|
Kume K, Ishida K, Ikeda M, Takemoto K, Shimura T, Young L, Nishizuka SS. Systematic Protein Level Regulation via Degradation Machinery Induced by Genotoxic Drugs. J Proteome Res 2016; 15:205-15. [PMID: 26625007 DOI: 10.1021/acs.jproteome.5b00759] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/30/2023]
Abstract
In this study we monitored protein dynamics in response to cisplatin, 5-fluorouracil, and irinotecan with different concentrations and administration modes using "reverse-phase" protein arrays (RPPAs) in order to gain comprehensive insight into the protein dynamics induced by genotoxic drugs. Among 666 protein time-courses, 38% exhibited an increasing trend, 32% exhibited a steady decrease, and 30% fluctuated within 24 h after drug exposure. We analyzed almost 12,000 time-course pairs of protein levels based on the geometrical similarity by correlation distance (dCor). Twenty-two percent of the pairs showed dCor > 0.8, which indicates that each protein of the pair had similar dynamics. These trends were disrupted by a proteasome inhibitor, MG132, suggesting that the protein degradation system was activated in response to the drugs. Among the pairs with high dCor, the average dCor of pairs with apoptosis-related protein was significantly higher than those without, indicating that regulation of protein levels was induced by the drugs. These results suggest that the levels of numerous functionally distinct proteins may be regulated by common degradation machinery induced by genotoxic drugs.
Affiliation(s)
- Kohei Kume
- Medical Innovation for Advanced Science and Technology program (MIAST), Iwate Medical University, Morioka, Iwate 020-8505, Japan.,Institute for Biomedical Sciences, Iwate Medical University, Yahaba, Iwate 020-8505, Japan
| | - Kazuhiro Takemoto
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan
| | - Tsutomu Shimura
- Department of Environmental Health, National Institute of Public Health, Wako-shi, Saitama 351-097, Japan
| | - Lynn Young
- National Institutes of Health (NIH) Library, Division of Library Services, Office of Research Services, National Institutes of Health, Bethesda, Maryland 20892, United States
| | - Satoshi S Nishizuka
- Medical Innovation for Advanced Science and Technology program (MIAST), Iwate Medical University, Morioka, Iwate 020-8505, Japan.,Institute for Biomedical Sciences, Iwate Medical University, Yahaba, Iwate 020-8505, Japan.,Department of Surgery, Iwate Medical University School of Dentistry, Morioka, Iwate 020-8505, Japan
| |
Collapse
|
25
|
Huang B, Zhang L, Du Y, Li L, Tang X, Zhang G. Molecular characterization and functional analysis of tumor necrosis factor receptor-associated factor 2 in the Pacific oyster. Fish & Shellfish Immunology 2016; 48:12-9. [PMID: 26621757 DOI: 10.1016/j.fsi.2015.11.027] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/31/2015] [Revised: 11/02/2015] [Accepted: 11/22/2015] [Indexed: 05/11/2023]
Abstract
Tumor necrosis factor receptor (TNFR)-associated factors (TRAFs) are a family of crucial adaptors, playing vital roles in mediating signal transduction in immune signaling pathways, including RIG-I-like receptor (RLR) signaling pathway. In the present study, a new TRAF family member (CgTRAF2) was identified in the Pacific oyster, Crassostrea gigas. Comparison and phylogenetic analysis revealed that CgTRAF2 could be a new member of the invertebrate TRAF2 family. Quantitative real-time PCR revealed that CgTRAF2 mRNA was highly expressed in the digestive gland, gills, and hemocytes, and it was significantly up-regulated after Vibrio alginolyticus and ostreid herpesvirus 1 (OsHV-1) challenge. The CgTRAF2 mRNA expression profile in different developmental stages of oyster larvae suggested that CgTRAF2 could function in early larval development. CgTRAF2 mRNA expression pattern, after the silence of CgMAVS (Mitochondrial Antiviral Signaling) -like, indicated that CgTRAF2 might function downstream of CgMAVS-like. Moreover, the subcellular localization analysis revealed that CgTRAF2 was localized in cytoplasm, and it may play predominately important roles in signal transduction. Collectively, these results demonstrated that CgTRAF2 might play important roles in the innate immunity and larval development of the Pacific oyster.
Affiliation(s)
- Baoyu Huang
- National & Local Joint Engineering Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
- Linlin Zhang
- National & Local Joint Engineering Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
- Yishuai Du
- National & Local Joint Engineering Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
- Li Li
- National & Local Joint Engineering Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China.
- Xueying Tang
- National & Local Joint Engineering Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Guofan Zhang
- National & Local Joint Engineering Laboratory of Ecological Mariculture, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China.
26
Abstract
Motivation: Markov networks are undirected graphical models that are widely used to infer relations between genes from experimental data. Their state-of-the-art inference procedures assume the data arise from a Gaussian distribution. High-throughput omics data, such as that from next generation sequencing, often violates this assumption. Furthermore, when collected data arise from multiple related but otherwise nonidentical distributions, their underlying networks are likely to have common features. New principled statistical approaches are needed that can deal with different data distributions and jointly consider collections of datasets. Results: We present FuseNet, a Markov network formulation that infers networks from a collection of nonidentically distributed datasets. Our approach is computationally efficient and general: given any number of distributions from an exponential family, FuseNet represents model parameters through shared latent factors that define neighborhoods of network nodes. In a simulation study, we demonstrate good predictive performance of FuseNet in comparison to several popular graphical models. We show its effectiveness in an application to breast cancer RNA-sequencing and somatic mutation data, a novel application of graphical models. Fusion of datasets offers substantial gains relative to inference of separate networks for each dataset. Our results demonstrate that network inference methods for non-Gaussian data can help in accurate modeling of the data generated by emergent high-throughput technologies. Availability and implementation: Source code is at https://github.com/marinkaz/fusenet. Contact: blaz.zupan@fri.uni-lj.si Supplementary information: Supplementary information is available at Bioinformatics online.
Affiliation(s)
- Marinka Žitnik
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Blaž Zupan
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
27
An F, Zhang Z, Xia M. Functional analysis of the nasopharyngeal carcinoma primary tumor-associated gene interaction network. Mol Med Rep 2015; 12:4975-80. [PMID: 26238040 PMCID: PMC4581807 DOI: 10.3892/mmr.2015.4090] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 10/16/2014] [Accepted: 06/22/2015] [Indexed: 01/18/2023] Open
Abstract
The aim of the present study was to investigate the molecular mechanism of nasopharyngeal carcinoma (NPC) primary tumor development through the identification of key genes using bioinformatics approaches. Using the GSE53819 microarray dataset, acquired from the Gene Expression Omnibus database, differentially expressed genes (DEGs) were screened out between NPC primary tumor and control samples, followed by hierarchical clustering analysis. The Search Tool for the Retrieval of Interacting Genes database was utilized to build a protein-protein interaction network to identify key node proteins. In total, 1,067 DEGs, including 326 upregulated genes and 741 downregulated genes, were identified between the NPC and control samples. The results of the hierarchical clustering analysis demonstrated that 95% of the DEGs were sample-specific. Furthermore, PDZ binding kinase (PBK), centromere protein F (CENPF), actin-binding protein anillin (ANLN), exonuclease 1 (EXO1) and chromosome 15 open reading frame 42 (C15ORF42) were included in the obtained network module, which was closely associated with the cell cycle and nucleic acid metabolic process GO functions. The results of the present study revealed that EXO1, CENPF, ANLN, PBK and C15ORF42 may be involved in the mechanism of NPC via modulating the cell cycle and nucleic acid metabolic processes, and may serve as molecular biomarkers for the diagnosis of this disease.
Affiliation(s)
- Fengwei An
- Department of Otorhinolaryngology, Jinan Military General Hospital, Jinan, Shandong 250031, P.R. China
- Zhiqiang Zhang
- Department of Gastroenterology and Hepatology, People's Hospital of Huangdao, Qingdao, Shandong 266400, P.R. China
- Ming Xia
- Department of Otorhinolaryngology, The Second Hospital of Shandong University, Jinan, Shandong 250031, P.R. China
28
Lin GN, Corominas R, Lemmens I, Yang X, Tavernier J, Hill DE, Vidal M, Sebat J, Iakoucheva LM. Spatiotemporal 16p11.2 protein network implicates cortical late mid-fetal brain development and KCTD13-Cul3-RhoA pathway in psychiatric diseases. Neuron 2015; 85:742-54. [PMID: 25695269 DOI: 10.1016/j.neuron.2015.01.010] [Citation(s) in RCA: 114] [Impact Index Per Article: 12.7] [Received: 06/28/2014] [Revised: 08/17/2014] [Accepted: 01/14/2015] [Indexed: 12/19/2022]
Abstract
The psychiatric disorders autism and schizophrenia have a strong genetic component, and copy number variants (CNVs) are firmly implicated. Recurrent deletions and duplications of chromosome 16p11.2 confer a high risk for both diseases, but the pathways disrupted by this CNV are poorly defined. Here we investigate the dynamics of the 16p11.2 network by integrating physical interactions of 16p11.2 proteins with spatiotemporal gene expression from the developing human brain. We observe profound changes in protein interaction networks throughout different stages of brain development and/or in different brain regions. We identify the late mid-fetal period of cortical development as most critical for establishing the connectivity of 16p11.2 proteins with their co-expressed partners. Furthermore, our results suggest that the regulation of the KCTD13-Cul3-RhoA pathway in layer 4 of the inner cortical plate is crucial for controlling brain size and connectivity and that its dysregulation by de novo mutations may be a potential determinant of 16p11.2 CNV deletion and duplication phenotypes.
Affiliation(s)
- Guan Ning Lin
- Department of Psychiatry, University of California San Diego, La Jolla, CA 92093, USA
- Roser Corominas
- Department of Psychiatry, University of California San Diego, La Jolla, CA 92093, USA
- Irma Lemmens
- Department of Medical Protein Research, VIB, and Department of Biochemistry, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
- Xinping Yang
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, and Department of Genetics, Harvard Medical School, Boston, MA 02215, USA
- Jan Tavernier
- Department of Medical Protein Research, VIB, and Department of Biochemistry, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
- David E Hill
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, and Department of Genetics, Harvard Medical School, Boston, MA 02215, USA
- Marc Vidal
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, and Department of Genetics, Harvard Medical School, Boston, MA 02215, USA
- Jonathan Sebat
- Department of Psychiatry, University of California San Diego, La Jolla, CA 92093, USA; Beyster Center for Genomics of Psychiatric Diseases, University of California San Diego, La Jolla, CA 92093, USA
- Lilia M Iakoucheva
- Department of Psychiatry, University of California San Diego, La Jolla, CA 92093, USA.
29
Mitra R, Müller P, Qiu P, Ji Y. Bayesian hierarchical models for protein networks in single-cell mass cytometry. Cancer Inform 2014; 13:79-89. [PMID: 25574129 PMCID: PMC4266200 DOI: 10.4137/cin.s13984] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/13/2014] [Revised: 07/31/2014] [Accepted: 08/01/2014] [Indexed: 11/05/2022] Open
Abstract
We propose a class of hierarchical models to investigate the protein functional network of cellular markers. We consider a novel data set from single-cell proteomics. The data are generated from single-cell mass cytometry experiments, in which protein expression is measured within an individual cell for multiple markers. Tens of thousands of cells are measured serving as biological replicates. Applying the Bayesian models, we report protein functional networks under different experimental conditions and the differences between the networks, ie, differential networks. We also present the differential network in a novel fashion that allows direct observation of the links between the experimental agent and its putative targeted proteins based on posterior inference. Our method serves as a powerful tool for studying molecular interactions at cellular level.
Affiliation(s)
- Riten Mitra
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, USA
- Peter Müller
- Department of Mathematics, The University of Texas at Austin, Austin, TX, USA
- Peng Qiu
- Department of Biomedical Engineering, Emory University and Georgia Tech University, Atlanta, GA, USA
- Yuan Ji
- Center for Biomedical Research Informatics, NorthShore University HealthSystem, Evanston, IL, USA
- Department of Health Studies, The University of Chicago, Chicago, IL, USA
30
Lotan A, Fenckova M, Bralten J, Alttoa A, Dixson L, Williams RW, van der Voet M. Neuroinformatic analyses of common and distinct genetic components associated with major neuropsychiatric disorders. Front Neurosci 2014; 8:331. [PMID: 25414627 PMCID: PMC4222236 DOI: 10.3389/fnins.2014.00331] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Received: 08/06/2014] [Accepted: 10/01/2014] [Indexed: 12/11/2022] Open
Abstract
Major neuropsychiatric disorders are highly heritable, with mounting evidence suggesting that these disorders share overlapping sets of molecular and cellular underpinnings. In the current article we systematically test the degree of genetic commonality across six major neuropsychiatric disorders: attention deficit hyperactivity disorder (ADHD), anxiety disorders (Anx), autistic spectrum disorders (ASD), bipolar disorder (BD), major depressive disorder (MDD), and schizophrenia (SCZ). We curated a well-vetted list of genes based on large-scale human genetic studies based on the NHGRI catalog of published genome-wide association studies (GWAS). A total of 180 genes were accepted into the analysis on the basis of low but liberal GWAS p-values (<10^-5). 22% of genes overlapped two or more disorders. The most widely shared subset of genes, common to five of six disorders, included ANK3, AS3MT, CACNA1C, CACNB2, CNNM2, CSMD1, DPCR1, ITIH3, NT5C2, PPP1R11, SYNE1, TCF4, TENM4, TRIM26, and ZNRD1. Using a suite of neuroinformatic resources, we showed that many of the shared genes are implicated in the postsynaptic density (PSD), expressed in immune tissues and co-expressed in developing human brain. Using a translational cross-species approach, we detected two distinct genetic components that were both shared by each of the six disorders; the 1st component is involved in CNS development, neural projections and synaptic transmission, while the 2nd is implicated in various cytoplasmic organelles and cellular processes. Combined, these genetic components account for 20-30% of the genetic load. The remaining risk is conferred by distinct, disorder-specific variants. Our systematic comparative analysis of shared and unique genetic factors highlights key gene sets and molecular processes that may ultimately translate into improved diagnosis and treatment of these debilitating disorders.
Affiliation(s)
- Amit Lotan
- Department of Adult Psychiatry and the Biological Psychiatry Laboratory, Hadassah-Hebrew University Medical Center Jerusalem, Israel
- Michaela Fenckova
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center Nijmegen, Netherlands
- Janita Bralten
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center Nijmegen, Netherlands; Department of Cognitive Neuroscience, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center Nijmegen, Netherlands
- Aet Alttoa
- Department of Psychiatry, Psychotherapy and Psychosomatics, Psychiatric Neurobiology Program, University of Würzburg Würzburg, Germany
- Luanna Dixson
- Department of Psychiatry and Psychotherapy, Medical Faculty Mannheim, Central Institute of Mental Health, University of Heidelberg Mannheim, Germany
- Robert W Williams
- Department of Genetics, Genomics and Informatics, Center for Integrative and Translational Genomics, University of Tennessee Health Science Center Memphis, TN, USA
- Monique van der Voet
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center Nijmegen, Netherlands
31
Ding YD, Chang JW, Guo J, Chen D, Li S, Xu Q, Deng XX, Cheng YJ, Chen LL. Prediction and functional analysis of the sweet orange protein-protein interaction network. BMC Plant Biology 2014; 14:213. [PMID: 25091279 PMCID: PMC4236729 DOI: 10.1186/s12870-014-0213-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 07/16/2013] [Accepted: 07/24/2014] [Indexed: 05/04/2023]
Abstract
BACKGROUND Sweet orange (Citrus sinensis) is one of the most important fruits world-wide. Because it is a woody plant with a long growth cycle, genetic studies of sweet orange are lagging behind those of other species. RESULTS In this analysis, we employed ortholog identification and domain combination methods to predict the protein-protein interaction (PPI) network for sweet orange. The K-nearest neighbors (KNN) classification method was used to verify and filter the network. The final predicted PPI network, CitrusNet, contained 8,195 proteins with 124,491 interactions. The quality of CitrusNet was evaluated using gene ontology (GO) and Mapman annotations, which confirmed the reliability of the network. In addition, we calculated the expression difference of interacting genes (EDI) in CitrusNet using RNA-seq data from four sweet orange tissues, and also analyzed the EDI distribution and variation in different sub-networks. CONCLUSIONS Gene expression in CitrusNet has significant modular features. Target of rapamycin (TOR) protein served as the central node of the hormone-signaling sub-network. All evidence supported the idea that TOR can integrate various hormone signals and affect plant growth. CitrusNet provides valuable resources for the study of biological functions in sweet orange.
Affiliation(s)
- Yu-Duan Ding
- Key Laboratory of Horticultural Plant Biology of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
- Ji-Wei Chang
- Agricultural Bioinformatics Key laboratory of Hubei Province, College of Information, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
- Jing Guo
- Agricultural Bioinformatics Key laboratory of Hubei Province, College of Information, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
- DiJun Chen
- Agricultural Bioinformatics Key laboratory of Hubei Province, College of Information, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
- Sen Li
- Agricultural Bioinformatics Key laboratory of Hubei Province, College of Information, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
- Qiang Xu
- Key Laboratory of Horticultural Plant Biology of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
- Xiu-Xin Deng
- Key Laboratory of Horticultural Plant Biology of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
- Yun-Jiang Cheng
- Key Laboratory of Horticultural Plant Biology of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
- Ling-Ling Chen
- Agricultural Bioinformatics Key laboratory of Hubei Province, College of Information, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
32
Alternative splicing and immune response of Crassostrea gigas tumor necrosis factor receptor-associated factor 3. Mol Biol Rep 2014; 41:6481-91. [DOI: 10.1007/s11033-014-3531-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 04/07/2013] [Accepted: 06/19/2014] [Indexed: 11/26/2022]
33
Wang X, Qian H, Zhang S. Discovery of significant pathways in breast cancer metastasis via module extraction and comparison. IET Syst Biol 2014; 8:47-55. [PMID: 25014225 PMCID: PMC8687293 DOI: 10.1049/iet-syb.2013.0041] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 10/14/2013] [Revised: 12/03/2013] [Accepted: 12/30/2013] [Indexed: 09/29/2023] Open
Abstract
Discovering significant pathways rather than single genes or small gene sets involved in metastasis is becoming more and more important in the study of breast cancer. Many researches have shed light on this problem. However, most of the existing works are relying on some priori biological information, which may bring bias to the models. The authors propose a new method that detects metastasis-related pathways by identifying and comparing modules in metastasis and non-metastasis gene co-expression networks. The gene co-expression networks are built by Pearson correlation coefficients, and then the modules inferred in these two networks are compared. In metastasis and non-metastasis networks, 36 and 41 significant modules are identified. Also, 27.8% (metastasis) and 29.3% (non-metastasis) of the modules are enriched significantly for one or several pathways with p-value <0.05. Many breast cancer genes including RB1, CCND1 and TP53 are included in these identified pathways. Five significant pathways are discovered only in metastasis network: glycolysis pathway, cell adhesion molecules, focal adhesion, stathmin and breast cancer resistance to antimicrotubule agents, and cytosolic DNA-sensing pathway. The first three pathways have been proved to be closely associated with metastasis. The rest two can be taken as a guide for future research in breast cancer metastasis.
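The network-construction step described in this abstract can be illustrated with a minimal sketch: compute Pearson correlation coefficients between gene expression profiles, keep gene pairs whose absolute correlation passes a cut-off, and compare the two resulting networks. The 0.8 threshold, the gene labels, and the expression values are assumptions for illustration only, and the comparison here is at the level of edges, whereas the paper compares inferred modules.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation
    return sxy / sqrt(sxx * syy)


def coexpression_edges(expr, threshold=0.8):
    """Gene pairs whose |Pearson r| reaches the (assumed) threshold."""
    genes = sorted(expr)
    edges = set()
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                edges.add((g, h))
    return edges


# Hypothetical expression profiles (gene -> values across samples).
metastasis = {"A": [1, 2, 3, 4], "B": [4, 1, 3, 2], "C": [2, 4, 6, 9]}
non_metastasis = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, -1, 1, -1]}

met_edges = coexpression_edges(metastasis)
non_edges = coexpression_edges(non_metastasis)
only_met = met_edges - non_edges  # co-expression seen only in metastasis
print(sorted(only_met))
```

In a fuller analysis, module detection and pathway enrichment would follow on each network before the comparison; those steps are outside this sketch.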
Affiliation(s)
- Xiaochen Wang
- School of Mathematical Sciences, Fudan University, Shanghai 200433, People's Republic of China
- Huajie Qian
- School of Mathematical Sciences, Fudan University, Shanghai 200433, People's Republic of China
- Shuqin Zhang
- Center for Computational Systems Biology, School of Mathematical Sciences, Fudan University Shanghai, Shanghai 200433, People's Republic of China.
34
Poon A, Goldowitz D. Identification of genetic loci that modulate cell proliferation in the adult rostral migratory stream using the expanded panel of BXD mice. BMC Genomics 2014; 15:206. [PMID: 24640950 PMCID: PMC4004255 DOI: 10.1186/1471-2164-15-206] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/10/2013] [Accepted: 03/10/2014] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND Adult neurogenesis, which is the continual production of new neurons in the mature brain, demonstrates the strikingly plastic nature of the nervous system. Adult neural stem cells and their neural precursors, collectively referred to as neural progenitor cells (NPCs), are present in the subgranular zone (SGZ) of the dentate gyrus, the subventricular zone (SVZ), and rostral migratory stream (RMS). In order to harness the potential of NPCs to treat neurodegenerative diseases and brain injuries, it will be important to understand the molecules that regulate NPCs in the adult brain. The genetic basis underlying NPC proliferation is still not fully understood. From our previous quantitative trait locus (QTL) analysis, we had success in using a relatively small reference population of recombinant inbred strains of mice (AXBXA) to identify a genetic region that is significantly correlated with NPC proliferation in the RMS. RESULTS In this study, we expanded our initial QTL mapping of RMS proliferation to a far richer genetic resource, the BXD RI mouse strains. A 3-fold difference in the number of proliferative, bromodeoxyuridine (BrdU)-labeled cells was quantified in the adult RMS of 61 BXD RI strains. RMS cell proliferation is highly dependent on the genetic background of the mice with an estimated heritability of 0.58. Genome-wide mapping revealed a significant QTL on chromosome (Chr) 6 and a suggestive QTL on Chr 11 regulating the number of NPCs in the RMS. Composite interval analysis further revealed secondary QTLs on Chr 14 and Chr 18. The loci regulating RMS cell proliferation did not overlap with the suggestive loci modulating cell proliferation in the SGZ. These mapped loci serve as starting points to identify genes important for this process. A subset of candidate genes in this region is associated with cell proliferation and neurogenesis. Interconnectivity of these candidate genes was demonstrated using pathway and transcriptional covariance analyses. 
CONCLUSIONS Differences in RMS cell proliferation across the BXD RI strains identifies genetic loci that serve to provide insights into the interplay of underlying genes that may be important for regulating NPC proliferation in the adult mouse brain.
Affiliation(s)
- Daniel Goldowitz
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada.
35
Integrative approaches for finding modular structure in biological networks. Nat Rev Genet 2013; 14:719-32. [PMID: 24045689 DOI: 10.1038/nrg3552] [Citation(s) in RCA: 343] [Impact Index Per Article: 31.2] [Indexed: 12/16/2022]
Abstract
A central goal of systems biology is to elucidate the structural and functional architecture of the cell. To this end, large and complex networks of molecular interactions are being rapidly generated for humans and model organisms. A recent focus of bioinformatics research has been to integrate these networks with each other and with diverse molecular profiles to identify sets of molecules and interactions that participate in a common biological function - that is, 'modules'. Here, we classify such integrative approaches into four broad categories, describe their bioinformatic principles and review their applications.
36
Sławek J, Arodź T. ENNET: inferring large gene regulatory networks from expression data using gradient boosting. BMC Systems Biology 2013; 7:106. [PMID: 24148309 PMCID: PMC4015806 DOI: 10.1186/1752-0509-7-106] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 06/24/2013] [Accepted: 10/17/2013] [Indexed: 01/19/2023]
Abstract
BACKGROUND The regulation of gene expression by transcription factors is a key determinant of cellular phenotypes. Deciphering genome-wide networks that capture which transcription factors regulate which genes is one of the major efforts towards understanding and accurate modeling of living systems. However, reverse-engineering the network from gene expression profiles remains a challenge, because the data are noisy, high dimensional and sparse, and the regulation is often obscured by indirect connections. RESULTS We introduce a gene regulatory network inference algorithm ENNET, which reverse-engineers networks of transcriptional regulation from a variety of expression profiles with a superior accuracy compared to the state-of-the-art methods. The proposed method relies on the boosting of regression stumps combined with a relative variable importance measure for the initial scoring of transcription factors with respect to each gene. Then, we propose a technique for using a distribution of the initial scores and information about knockouts to refine the predictions. We evaluated the proposed method on the DREAM3, DREAM4 and DREAM5 data sets and achieved higher accuracy than the winners of those competitions and other established methods. CONCLUSIONS Superior accuracy achieved on the three different benchmark data sets shows that ENNET is a top contender in the task of network inference. It is a versatile method that uses information about which gene was knocked-out in which experiment if it is available, but remains the top performer even without such information. ENNET is available for download from https://github.com/slawekj/ennet under the GNU GPLv3 license.
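The core idea named in this abstract, boosting regression stumps and scoring each transcription factor by a relative variable importance, can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the ENNET implementation: the function names, the squared-error loss, the learning rate, the number of rounds, and the expression values are all hypothetical, and the score-refinement step that uses knockout information is omitted.

```python
def best_stump(X, residual):
    """Best single-split stump: (gain, feature, threshold, left, right)."""
    n = len(residual)
    total = sum(residual)
    best = (0.0, None, None, 0.0, 0.0)
    for f in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][f])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residual[order[k - 1]]
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            # Reduction in squared error achieved by this split.
            gain = (left_sum ** 2 / k + right_sum ** 2 / (n - k)
                    - total ** 2 / n)
            if gain > best[0]:
                best = (gain, f, (lo + hi) / 2,
                        left_sum / k, right_sum / (n - k))
    return best


def stump_importances(X, y, n_rounds=50, learning_rate=0.1):
    """Relative importance of each regulator for one target gene.

    X: rows are samples, columns are candidate regulators (assumed layout).
    """
    n = len(y)
    pred = [sum(y) / n] * n
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        gain, f, thr, left, right = best_stump(X, residual)
        if f is None:
            break  # no split improves the fit any further
        importance[f] += gain
        for i in range(n):
            pred[i] += learning_rate * (left if X[i][f] <= thr else right)
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance


# Hypothetical data: the target follows regulator 0 and ignores regulator 1.
tf0 = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
tf1 = [0.5, 0.1, 0.9, 0.3, 0.2, 0.8, 0.4, 0.6]
X = [list(row) for row in zip(tf0, tf1)]
target = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0]
scores = stump_importances(X, target)
print(scores)
```

Repeating this for every target gene would give one importance score per regulator-target pair, which is the kind of initial scoring the abstract describes before refinement.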
Affiliation(s)
- Janusz Sławek
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
- Tomasz Arodź
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
37
Zhang J, Zhang S, Wang Y, Zhang XS. Identification of mutated core cancer modules by integrating somatic mutation, copy number variation, and gene expression data. BMC Systems Biology 2013; 7 Suppl 2:S4. [PMID: 24565034 PMCID: PMC3851989 DOI: 10.1186/1752-0509-7-s2-s4] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 12/30/2022]
Abstract
MOTIVATION: Understanding the molecular mechanisms underlying cancer is an important step toward the effective diagnosis and treatment of cancer patients. With the huge volume of data from large-scale cancer genomics projects, an open challenge is to distinguish driver mutations, pathways, and gene sets (or core modules) that contribute to cancer formation and progression from random passengers, which accumulate in somatic cells but do not contribute to tumorigenesis. Due to mutational heterogeneity, current analyses are often restricted to testing known pathways and functional modules for enrichment of somatic mutations. Therefore, discovery of new pathways and functional modules is a pressing need. RESULTS: In this study, we propose a novel method to identify Mutated Core Modules in Cancer (iMCMC) without any prior information other than cancer genomic data from patients with tumors. This is a network-based approach in which three kinds of data are integrated: somatic mutations, copy number variations (CNVs), and gene expression. First, the first two datasets are merged to obtain a mutation matrix, from which a weighted mutation network is constructed in which the vertex weight corresponds to gene coverage and the edge weight corresponds to the mutual exclusivity between gene pairs. Similarly, a weighted expression network is generated from the expression matrix, in which the vertex and edge weights correspond to the influence of a gene mutation on other genes and the Pearson correlation of gene mutation-correlated expression, respectively. Then an integrative network is obtained by combining these two networks, and the most coherent subnetworks are identified using an optimization model. Finally, we obtain the core modules for tumors by filtering with significance and exclusivity tests.
We applied iMCMC to The Cancer Genome Atlas (TCGA) glioblastoma multiforme (GBM) and ovarian carcinoma data and identified several mutated core modules, some of which are involved in known pathways. Most of the implicated genes are oncogenes or tumor suppressors previously reported to be related to carcinogenesis. As a comparison, we also ran iMCMC on two of the three kinds of data at a time, i.e., first on datasets combining somatic mutations with CNVs, and second on datasets combining somatic mutations with gene expression. The results indicate that gene expression or CNV data indeed provide useful additional information beyond the original data for the identification of core modules in cancer. CONCLUSIONS: This study demonstrates the utility of iMCMC, which integrates multiple data sources to identify mutated core modules in cancer. In addition to presenting a generally applicable methodology, our findings provide several candidate pathways or core modules recurrently perturbed in GBM or ovarian carcinoma for further study.
|
38
|
Parfett C, Williams A, Zheng J, Zhou G. Gene batteries and synexpression groups applied in a multivariate statistical approach to dose–response analysis of toxicogenomic data. Regul Toxicol Pharmacol 2013; 67:63-74. [DOI: 10.1016/j.yrtph.2013.06.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/02/2013] [Accepted: 06/26/2013] [Indexed: 12/28/2022]
|
39
|
Tang X, Feng Q, Wang J, He Y, Pan Y. Clustering based on multiple biological information: approach for predicting protein complexes. IET Syst Biol 2013; 7:223-30. [PMID: 24067423 PMCID: PMC8687320 DOI: 10.1049/iet-syb.2012.0052] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 11/23/2012] [Revised: 03/23/2013] [Accepted: 04/15/2013] [Indexed: 04/05/2024]
Abstract
Protein complexes are a cornerstone of many biological processes. Protein-protein interaction (PPI) data enable a number of computational methods for predicting protein complexes. However, the insufficiency of the PPI data significantly lowers the accuracy of computational methods. In the current work, the authors develop a novel method named clustering based on multiple biological information (CMBI) to discover protein complexes via the integration of multiple biological resources, including gene expression profiles, essential protein information and PPI data. First, CMBI defines the functional similarity of each pair of interacting proteins based on the edge-clustering coefficient and the Pearson correlation coefficient. Second, CMBI selects essential proteins as seeds to build the protein complexes. A redundancy-filtering procedure is performed to eliminate redundant complexes. In addition to the essential proteins, CMBI also uses other proteins as seeds to expand protein complexes. To check the performance of CMBI, the authors compare the complexes discovered by CMBI with those found by other techniques by matching the predicted complexes against the reference complexes. The authors subsequently use GO::TermFinder to analyse the complexes predicted by the various methods. Finally, the effect of parameters T and R is investigated. The results from GO functional enrichment and matching analyses show that CMBI performs significantly better than the state-of-the-art methods.
Affiliation(s)
- Xiwei Tang
- School of Information Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- School of Information Science and Engineering, Hunan First Normal University, Changsha 410205, People's Republic of China
- Qilong Feng
- School of Information Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- Yiming He
- School of Information Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- Yi Pan
- School of Information Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- Department of Computer Science, Georgia State University, Atlanta, GA 30302-4110, USA
|
40
|
Identification of interconnected markers for T-cell acute lymphoblastic leukemia. BIOMED RESEARCH INTERNATIONAL 2013; 2013:210253. [PMID: 23956970 PMCID: PMC3727179 DOI: 10.1155/2013/210253] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/29/2013] [Accepted: 06/04/2013] [Indexed: 12/11/2022]
Abstract
T-cell acute lymphoblastic leukemia (T-ALL) is a complex disease, resulting from proliferation of differentially arrested immature T cells. The molecular mechanisms and the genes involved in the generation of T-ALL remain largely undefined. In this study, we propose a set of genes to differentiate individuals with T-ALL from the nonleukemia/healthy ones and genes that are not differential themselves but interconnected with highly differentially expressed ones. We provide new suggestions for pathways involved in the cause of T-ALL and show that network-based classification techniques produce fewer genes with more meaningful and successful results than expression-based approaches. We have identified 19 significant subnetworks, containing 102 genes. The classification/prediction accuracies of subnetworks are considerably high, as high as 98%. Subnetworks contain 6 nondifferentially expressed genes, which could potentially participate in pathogenesis of T-ALL. Although these genes are not differential, they may serve as biomarkers if their loss/gain of function contributes to generation of T-ALL via SNPs. We conclude that transcription factors, zinc-ion-binding proteins, and tyrosine kinases are the important protein families to trigger T-ALL. These potential disease-causing genes in our subnetworks may serve as biomarkers, alternative to the traditional ones used for the diagnosis of T-ALL, and help understand the pathogenesis of the disease.
|
41
|
Predicting protein-protein interactions in the post synaptic density. Mol Cell Neurosci 2013; 56:128-39. [PMID: 23628905 DOI: 10.1016/j.mcn.2013.04.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/25/2012] [Revised: 04/09/2013] [Accepted: 04/19/2013] [Indexed: 12/27/2022]
Abstract
The post synaptic density (PSD) is a specialization of the cytoskeleton at the synaptic junction, composed of hundreds of different proteins. Characterizing the protein components of the PSD and their interactions can help elucidate the mechanism of long-term changes in synaptic plasticity, which underlie learning and memory. Unfortunately, our knowledge of the proteome and interactome of the PSD is still partial and noisy. In this study we describe a computational framework to improve the reconstruction of the PSD network. The approach is based on learning the characteristics of PSD protein interactions from a set of trusted interactions, expanding this set with data collected from large-scale repositories, and then predicting novel interactions with proteins that are suspected to reside in the PSD. Using this method we obtained thirty predicted interactions, more than half of which have supporting evidence in the literature. We discuss in detail two of these new interactions, Lrrtm1 with PSD-95 and Src with Capg. The first may take part in a mechanism underlying glutamatergic dysfunction in schizophrenia. The second suggests an alternative mechanism for regulating dendritic spine maturation.
|
42
|
Haider S, Pal R. Integrated analysis of transcriptomic and proteomic data. Curr Genomics 2013; 14:91-110. [PMID: 24082820 PMCID: PMC3637682 DOI: 10.2174/1389202911314020003] [Citation(s) in RCA: 258] [Impact Index Per Article: 23.5] [Received: 09/10/2012] [Revised: 01/09/2013] [Accepted: 01/22/2013] [Indexed: 12/14/2022]
Abstract
Until recently, understanding the regulatory behavior of cells has been pursued through independent analysis of the transcriptome or the proteome. Based on the central dogma, it was generally assumed that there exists a direct correspondence between mRNA transcripts and the resulting protein expression. However, recent studies have shown that the correlation between mRNA and protein expression can be low due to various factors, such as different half-lives and post-transcriptional machinery. Thus, a joint analysis of transcriptomic and proteomic data can provide useful insights that may not be deciphered from individual analysis of mRNA or protein expression. This article reviews the existing major approaches for joint analysis of transcriptomic and proteomic data. We categorize the different approaches into eight main categories based on the initial algorithm and final analysis goal. We further present analogies with other domains and discuss the existing research problems in this area.
Affiliation(s)
- Ranadip Pal
- Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA
|
43
|
Abstract
Modern experimental strategies often generate genome-scale measurements of human tissues or cell lines in various physiological states. Investigators often use these datasets individually to help elucidate molecular mechanisms of human diseases. Here we discuss approaches that effectively weight and integrate hundreds of heterogeneous datasets into gene-gene networks that focus on a specific process or disease. Diverse and systematic genome-scale measurements provide such approaches with both a great deal of power and a number of challenges. We discuss some of these challenges as well as methods to address them. We also raise important considerations for the assessment and evaluation of such approaches. When carefully applied, these integrative data-driven methods can make novel high-quality predictions that can transform our understanding of the molecular basis of human disease.
Affiliation(s)
- Casey S. Greene
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Olga G. Troyanskaya
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
|
44
|
Aluru M, Zola J, Nettleton D, Aluru S. Reverse engineering and analysis of large genome-scale gene networks. Nucleic Acids Res 2012; 41:e24. [PMID: 23042249 PMCID: PMC3592423 DOI: 10.1093/nar/gks904] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Indexed: 11/12/2022]
Abstract
Reverse engineering the whole-genome networks of complex multicellular organisms remains a challenge. While simpler models easily scale to large numbers of genes and gene expression datasets, more accurate models are compute-intensive, limiting their scale of applicability. To enable fast and accurate reconstruction of large networks, we developed Tool for Inferring Network of Genes (TINGe), a parallel mutual information (MI)-based program. The novel features of our approach include: (i) a B-spline-based formulation for linear-time computation of MI, (ii) a novel algorithm for direct permutation testing and (iii) parallel algorithms to reduce run-time and facilitate construction of large networks. We assess the quality of our method by comparison with ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) and GeneNet, and demonstrate its unique capability by reverse engineering the whole-genome network of Arabidopsis thaliana from 3137 Affymetrix ATH1 GeneChips in just 9 min on a 1024-core cluster. We further report on the development of a new software tool, Gene Network Analyzer (GeNA), for extracting context-specific subnetworks from a given set of seed genes. Using TINGe and GeNA, we analyzed 241 Arabidopsis AraCyc 8.0 pathways, and the results are made available through the web.
Affiliation(s)
- Maneesha Aluru
- Department of Genetics, Iowa State University, Ames, IA 50011, USA.
|
45
|
Hua M, Pei J. Clustering in applications with multiple data sources—A mutual subspace clustering approach. Neurocomputing 2012. [DOI: 10.1016/j.neucom.2011.08.032] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 10/28/2022]
|
46
|
Protein complex prediction based on maximum matching with domain-domain interaction. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2012; 1824:1418-24. [PMID: 22771297 DOI: 10.1016/j.bbapap.2012.06.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/01/2012] [Revised: 06/13/2012] [Accepted: 06/18/2012] [Indexed: 11/22/2022]
Abstract
With the development of high-throughput methods for identifying protein-protein interactions, large-scale interaction networks are available. Computational methods that analyze these networks to detect functional modules such as protein complexes are becoming more important. However, most of the existing methods only make use of the protein-protein interaction networks without considering the structural limitations on how proteins bind together. In this paper, we design a new protein complex prediction method by extending the idea of using domain-domain interaction (DDI) information. We formulate the problem as a maximum matching problem (which can be solved in polynomial time) instead of using the binary integer linear programming approach (which can be NP-hard in the worst case). We also add a step to predict domain-domain interactions, which first searches the database Pfam using the hidden Markov model and then predicts the domain-domain interactions based on the databases DOMINE and InterDom, which contain confirmed DDIs. By adding the domain-domain interaction prediction step, we have more edges in the DDI graph, and the recall value is increased significantly (at least doubled) compared with the method of Ozawa et al. (2010) [1], while the average precision value is slightly better. We also combine our method with three other existing methods, namely COACH, MCL and MCODE. Experiments show that the precision of the combined method is improved. This article is part of a Special Issue entitled: Computational Methods for Protein Interaction and Structural Prediction.
|
47
|
Nascimento M, Sáfadi T, Fonseca e Silva F, Nascimento ACC. Bayesian model-based clustering of temporal gene expression using autoregressive panel data approach. Bioinformatics 2012; 28:2004-7. [PMID: 22668790 DOI: 10.1093/bioinformatics/bts322] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
MOTIVATION: In a microarray time series analysis, due to the large number of genes evaluated, the first step toward understanding the complex time network is the clustering of genes that share similar expression patterns over time. To date, the proposed methods have not simultaneously addressed the temporal autocorrelation of gene expression and model-based clustering. We present a Bayesian method that jointly considers the fit of autoregressive panel data models and hierarchical gene clustering. RESULTS: The proposed methodology was able to cluster genes that share similar expression over time, which was determined jointly by the estimates of the autoregression parameters, by the average level of expression and by the quality of the fitted model. AVAILABILITY AND IMPLEMENTATION: The R code for implementation of the proposed clustering method and for the simulation study, as well as the real and simulated datasets, are freely accessible on the Web at http://www.det.ufv.br/~moyses/links.php. CONTACT: moysesnascim@ufv.br.
Affiliation(s)
- Moysés Nascimento
- Departamento de Estatística, Universidade Federal de Viçosa, Viçosa, Minas Gerais 36570-000, Brasil.
|
48
|
Yeh CY, Yeh HY, Arias CR, Soo VW. Pathway detection from protein interaction networks and gene expression data using color-coding methods and A∗ search algorithms. ScientificWorldJournal 2012; 2012:315797. [PMID: 22577352 PMCID: PMC3346698 DOI: 10.1100/2012/315797] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/09/2011] [Accepted: 10/17/2011] [Indexed: 12/02/2022]
Abstract
Given the wide availability of protein interaction networks and supporting microarray data, identifying biologically significant linear paths in the search for potential pathways is a challenging problem. We propose a color-coding method based on the characteristics of biological network topology and apply heuristic search to speed up the color-coding method. In the experiments, we tested our methods on two datasets: yeast and human prostate cancer networks, each with gene expression data. Comparisons of our method with other existing methods on known yeast MAPK pathways, in terms of precision and recall, show that our method finds the maximum number of proteins and performs comparably well. In addition, our method is more efficient than previous ones and detects paths of length 10 within 40 seconds on a 1.73 GHz Intel CPU with 1 GB of main memory running the Windows operating system.
Affiliation(s)
- Cheng-Yu Yeh
- Department of Computer Science, National Tsing Hua University, Hsinchu 300, Taiwan
- Institute of Information Systems and Applications, National Tsing Hua University, Hsinchu 300, Taiwan
- Hsiang-Yuan Yeh
- Department of Computer Science, National Tsing Hua University, Hsinchu 300, Taiwan
- Institute of Information Systems and Applications, National Tsing Hua University, Hsinchu 300, Taiwan
- Carlos Roberto Arias
- Department of Computer Science, National Tsing Hua University, Hsinchu 300, Taiwan
- Institute of Information Systems and Applications, National Tsing Hua University, Hsinchu 300, Taiwan
- Von-Wun Soo
- Department of Computer Science, National Tsing Hua University, Hsinchu 300, Taiwan
- Institute of Information Systems and Applications, National Tsing Hua University, Hsinchu 300, Taiwan
|
49
|
|
50
|
Bertini JR, Zhao L, Motta R, Lopes ADA. A nonparametric classification method based on K-associated graphs. Inf Sci (N Y) 2011. [DOI: 10.1016/j.ins.2011.07.043] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Indexed: 11/26/2022]
|