1
Aromolaran OT, Isewon I, Adedeji E, Oswald M, Adebiyi E, Koenig R, Oyelade J. Heuristic-enabled active machine learning: A case study of predicting essential developmental stage and immune response genes in Drosophila melanogaster. PLoS One 2023; 18:e0288023. [PMID: 37556452 PMCID: PMC10411809 DOI: 10.1371/journal.pone.0288023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 06/18/2023] [Indexed: 08/11/2023]
Abstract
Computational prediction of absolute essential genes using machine learning has gained wide attention in recent years. However, essential genes are mostly conditional and not absolute. Experimental techniques provide a reliable approach to identifying conditionally essential genes; however, experimental methods are laborious and time- and resource-consuming, hence computational techniques have been used to complement them. Computational techniques such as supervised machine learning or flux balance analysis are severely limited by the unavailability of the data required for training the model or simulating the conditions for gene essentiality. This study developed a heuristic-enabled active machine learning method based on a light gradient boosting model to predict essential immune response and embryonic developmental genes in Drosophila melanogaster. We proposed a new sampling selection technique and introduced a heuristic function which replaces the human component in traditional active learning models. The heuristic function dynamically selects the unlabelled samples to improve the performance of the classifier in the next iteration. When tested on four benchmark datasets, the proposed model showed superior performance compared to traditional active learning models (random sampling and uncertainty sampling). Applying the model to identify conditionally essential genes, we identified four novel essential immune response genes and 48 novel genes that are essential in the embryonic developmental condition. We performed functional enrichment analysis of the predicted genes to elucidate their biological processes, and the results support our predictions. Immune response and embryonic development related processes were significantly enriched in the essential immune response and embryonic developmental genes, respectively. 
Finally, we propose the predicted essential genes for future experimental studies and use of the developed tool accessible at http://heal.covenantuniversity.edu.ng for conditional essentiality predictions.
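The abstract names uncertainty sampling as one of the traditional baselines. As a minimal sketch of that baseline only (not the authors' heuristic function, whose details the abstract does not give), the selection step for a binary classifier can be written as picking the unlabelled samples whose predicted probability lies closest to 0.5. The function name and the probability values below are hypothetical.

```python
def select_most_uncertain(probabilities, batch_size):
    """Return indices of the unlabelled samples whose predicted
    positive-class probability is closest to 0.5."""
    # Distance from 0.5 is small when the classifier is unsure.
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:batch_size]


# Hypothetical classifier outputs for five unlabelled genes.
pool_probabilities = [0.95, 0.52, 0.10, 0.45, 0.70]
query_indices = select_most_uncertain(pool_probabilities, batch_size=2)
print(query_indices)
```

In a traditional active learning loop the selected samples would be passed to a human annotator; the study replaces that step with a heuristic function.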
Affiliation(s)
- Olufemi Tony Aromolaran
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Itunu Isewon
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Eunice Adedeji
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
  - Department of Biochemistry, Covenant University, Ota, Ogun State, Nigeria
- Marcus Oswald
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum, Jena, Germany
  - Institute of Infectious Diseases and Infection Control, Jena University Hospital, Am Klinikum, Jena, Germany
- Ezekiel Adebiyi
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Rainer Koenig
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum, Jena, Germany
  - Institute of Infectious Diseases and Infection Control, Jena University Hospital, Am Klinikum, Jena, Germany
- Jelili Oyelade
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
2
Mancuso CA, Bills PS, Krum D, Newsted J, Liu R, Krishnan A. GenePlexus: a web-server for gene discovery using network-based machine learning. Nucleic Acids Res 2022; 50:W358-W366. [PMID: 35580053 PMCID: PMC9252732 DOI: 10.1093/nar/gkac335] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/08/2022] [Revised: 04/13/2022] [Accepted: 04/30/2022] [Indexed: 11/28/2022]
Abstract
Biomedical researchers take advantage of high-throughput, high-coverage technologies to routinely generate sets of genes of interest across a wide range of biological conditions. Although these technologies have directly shed light on the molecular underpinnings of various biological processes and diseases, the list of genes from any individual experiment is often noisy and incomplete. Additionally, interpreting these lists of genes can be challenging in terms of how they are related to each other and to other genes in the genome. In this work, we present GenePlexus (https://www.geneplexus.net/), a web-server that allows a researcher to utilize a powerful, network-based machine learning method to gain insights into their gene set of interest and additional functionally similar genes. Once a user uploads their own set of human genes and chooses between a number of different human network representations, GenePlexus provides predictions of how associated every gene in the network is to the input set. The web-server also provides interpretability through network visualization and comparison to other machine learning models trained on thousands of known process/pathway and disease gene sets. GenePlexus is free and open to all users without the need for registration.
Affiliation(s)
- Christopher A Mancuso
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Patrick S Bills
  - Data Management and Analytics, IT Services, Michigan State University, East Lansing, MI 48824, USA
- Douglas Krum
  - Data Management and Analytics, IT Services, Michigan State University, East Lansing, MI 48824, USA
- Jacob Newsted
  - Data Management and Analytics, IT Services, Michigan State University, East Lansing, MI 48824, USA
- Renming Liu
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Arjun Krishnan
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
3
Law JN, Akers K, Tasnina N, Santina CMD, Deutsch S, Kshirsagar M, Klein-Seetharaman J, Crovella M, Rajagopalan P, Kasif S, Murali TM. Interpretable network propagation with application to expanding the repertoire of human proteins that interact with SARS-CoV-2. Gigascience 2021; 10:giab082. [PMID: 34966926 PMCID: PMC8716363 DOI: 10.1093/gigascience/giab082] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/30/2021] [Revised: 09/21/2021] [Accepted: 11/28/2021] [Indexed: 01/02/2023]
Abstract
BACKGROUND Network propagation has been widely used for nearly 20 years to predict gene functions and phenotypes. Despite the popularity of this approach, little attention has been paid to the question of provenance tracing in this context, e.g., determining how much any experimental observation in the input contributes to the score of every prediction. RESULTS We design a network propagation framework with 2 novel components and apply it to predict human proteins that directly or indirectly interact with SARS-CoV-2 proteins. First, we trace the provenance of each prediction to its experimentally validated sources, which in our case are human proteins experimentally determined to interact with viral proteins. Second, we design a technique that helps to reduce the manual adjustment of parameters by users. We find that for every top-ranking prediction, the highest contribution to its score arises from a direct neighbor in a human protein-protein interaction network. We further analyze these results to develop functional insights on SARS-CoV-2 that expand on known biology such as the connection between endoplasmic reticulum stress, HSPA5, and anti-clotting agents. CONCLUSIONS We examine how our provenance-tracing method can be generalized to a broad class of network-based algorithms. We provide a useful resource for the SARS-CoV-2 community that implicates many previously undocumented proteins with putative functional relationships to viral infection. This resource includes potential drugs that can be opportunistically repositioned to target these proteins. We also discuss how our overall framework can be extended to other, newly emerging viruses.
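The provenance idea in this abstract can be illustrated with a generic random walk with restart, which is an assumption here and not necessarily the authors' formulation. Because that update is linear in the seed vector, the score of any node equals the sum of the scores obtained by propagating from each seed alone, so each seed's contribution can be read off directly. The toy graph, node names, `alpha` value, and function names are all hypothetical.

```python
def propagate(graph, seeds, alpha=0.5, iterations=50):
    """Random walk with restart: s <- alpha * W s + (1 - alpha) * p,
    where W averages over a node's neighbours and p marks the seeds."""
    restart = {node: (1.0 if node in seeds else 0.0) for node in graph}
    scores = dict(restart)
    for _ in range(iterations):
        scores = {
            node: alpha * sum(scores[nb] for nb in nbrs) / len(nbrs)
            + (1 - alpha) * restart[node]
            for node, nbrs in graph.items()
        }
    return scores


def trace_provenance(graph, seeds, **kwargs):
    """Propagate from each seed separately to get per-seed contributions."""
    return {seed: propagate(graph, {seed}, **kwargs) for seed in seeds}


# Hypothetical undirected toy network; every node has at least one neighbour.
graph = {
    "seed1": ["x"],
    "seed2": ["x", "y"],
    "x": ["seed1", "seed2", "y"],
    "y": ["seed2", "x"],
}
seeds = {"seed1", "seed2"}

total = propagate(graph, seeds)
contributions = trace_provenance(graph, seeds)
for node in ("x", "y"):
    parts = {seed: round(contributions[seed][node], 4) for seed in sorted(seeds)}
    print(node, round(total[node], 4), parts)
```

Ranking the per-seed contributions for a predicted node shows which experimentally observed source drives its score.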
Affiliation(s)
- Jeffrey N Law
  - Interdisciplinary Ph.D. Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA 24061, USA
- Kyle Akers
  - Interdisciplinary Ph.D. Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA 24061, USA
- Nure Tasnina
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Shay Deutsch
  - Department of Mathematics, University of California, Los Angeles, CA 90095, USA
- Mark Crovella
  - Department of Computer Science, Boston University, Boston, MA 02215, USA
- Simon Kasif
  - Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
4
Aromolaran O, Beder T, Adedeji E, Ajamma Y, Oyelade J, Adebiyi E, Koenig R. Predicting host dependency factors of pathogens in Drosophila melanogaster using machine learning. Comput Struct Biotechnol J 2021; 19:4581-4592. [PMID: 34471501 PMCID: PMC8385402 DOI: 10.1016/j.csbj.2021.08.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/08/2021] [Revised: 08/06/2021] [Accepted: 08/06/2021] [Indexed: 11/25/2022]
Abstract
Pathogens causing infections, particularly when invading host cells, require the host cell machinery for efficient regeneration and proliferation during infection. Their life cycle depends on host proteins, and these Host Dependency Factors (HDF) may serve as therapeutic targets. Several screening studies have produced large lists of potential HDF, however with only marginal overlap. To bring consistency to the data of these experimental studies, we developed a machine learning pipeline. As a case study, we used publicly available lists of experimentally derived HDF from twelve different screening studies based on gene perturbation in Drosophila melanogaster cells or in vivo upon bacterial or protozoan infection. A total of 50,334 gene features were generated from diverse categories including their functional annotations, topology attributes in protein interaction networks, nucleotide and protein sequence features, homology properties and subcellular localization. Cross-validation revealed an excellent prediction performance. All feature categories contributed to the model. Predicted and experimentally derived HDF showed good consistency when investigating their common cellular processes and function. Cellular processes and molecular function of these genes were highly enriched in membrane trafficking, particularly in the trans-Golgi network, cell cycle and the Rab GTPase binding family. Using our machine learning approach, we show that HDF in organisms can be predicted with high accuracy, evidencing their common investigated characteristics. We elucidated cellular processes which are utilized by invading pathogens during infection. Finally, we provide a list of 208 novel HDF proposed for future experimental studies.
Affiliation(s)
- Olufemi Aromolaran
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Thomas Beder
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
  - Institute of Infectious Diseases and Infection Control, Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
- Eunice Adedeji
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
  - Department of Biochemistry, Covenant University, Ota, Ogun State, Nigeria
- Yvonne Ajamma
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Jelili Oyelade
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Ezekiel Adebiyi
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Rainer Koenig
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
  - Institute of Infectious Diseases and Infection Control, Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
5
Law JN, Kale SD, Murali TM. Accurate and efficient gene function prediction using a multi-bacterial network. Bioinformatics 2021; 37:800-806. [PMID: 33063084 DOI: 10.1093/bioinformatics/btaa885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2020] [Revised: 09/23/2020] [Accepted: 09/30/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION Nearly 40% of the genes in sequenced genomes have no experimentally or computationally derived functional annotations. To fill this gap, we seek to develop methods for network-based gene function prediction that can integrate heterogeneous data for multiple species with experimentally based functional annotations and systematically transfer them to newly sequenced organisms on a genome-wide scale. However, the large sizes of such networks pose a challenge for the scalability of current methods. RESULTS We develop a label propagation algorithm called FastSinkSource. By formally bounding its rate of progress, we decrease the running time by a factor of 100 without sacrificing accuracy. We systematically evaluate many approaches to construct multi-species bacterial networks and apply FastSinkSource and other state-of-the-art methods to these networks. We find that the most accurate and efficient approach is to pre-compute annotation scores for species with experimental annotations, and then to transfer them to other organisms. In this manner, FastSinkSource runs in under 3 min for 200 bacterial species. AVAILABILITY AND IMPLEMENTATION An implementation of our framework and all data used in this research are available at https://github.com/Murali-group/multi-species-GOA-prediction. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
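For readers unfamiliar with label propagation, the following is a minimal sketch of a basic sink-source style iteration, in which annotated positives are held at 1, annotated negatives at 0, and each unlabelled node repeatedly takes the weighted average of its neighbours' scores. It is an illustrative assumption, not FastSinkSource itself: the paper's bound on the rate of progress, which gives the reported speed-up, is not reproduced, and the toy network and names are hypothetical.

```python
def sink_source(graph, positives, negatives, iterations=100):
    """Propagate labels over a weighted graph given as
    {node: {neighbour: weight}}; labelled nodes keep fixed scores."""
    scores = {node: 0.0 for node in graph}
    for node in positives:
        scores[node] = 1.0
    fixed = set(positives) | set(negatives)
    for _ in range(iterations):
        updated = dict(scores)
        for node, nbrs in graph.items():
            if node in fixed:
                continue  # sources and sinks never change
            total_weight = sum(nbrs.values())
            updated[node] = (
                sum(w * scores[nb] for nb, w in nbrs.items()) / total_weight
            )
        scores = updated
    return scores


# Hypothetical chain: pos - a - b - neg, all edge weights equal to 1.
chain = {
    "pos": {"a": 1.0},
    "a": {"pos": 1.0, "b": 1.0},
    "b": {"a": 1.0, "neg": 1.0},
    "neg": {"b": 1.0},
}
result = sink_source(chain, positives={"pos"}, negatives={"neg"})
print(round(result["a"], 3), round(result["b"], 3))
```

On this chain the scores settle at 2/3 for the node next to the positive and 1/3 for the node next to the negative, reflecting distance to each label.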
Affiliation(s)
- Jeffrey N Law
  - Genetics, Bioinformatics and Computational Biology Ph.D. Program, Blacksburg, VA 24061, USA
- Shiv D Kale
  - Fralin Life Sciences Institute, Blacksburg, VA 24061, USA
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
6
Aromolaran O, Aromolaran D, Isewon I, Oyelade J. Machine learning approach to gene essentiality prediction: a review. Brief Bioinform 2021; 22:6219158. [PMID: 33842944 DOI: 10.1093/bib/bbab128] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 01/18/2021] [Revised: 03/04/2021] [Accepted: 03/17/2021] [Indexed: 12/17/2022]
Abstract
Essential genes are critical for the growth and survival of any organism. The machine learning approach complements the experimental methods to minimize the resources required for essentiality assays. Previous studies revealed the need to discover relevant features that significantly classify essential genes, improve on the generalizability of prediction models across organisms, and construct a robust gold standard as the class label for the training data to enhance prediction. Findings also show that a significant limitation of the machine learning approach is predicting conditionally essential genes. The essentiality status of a gene can change due to a specific condition of the organism. This review examines various methods applied to the essential gene prediction task, their strengths, limitations and the factors responsible for effective computational prediction of essential genes. We discussed categories of features and how they contribute to the classification performance of essentiality prediction models. Five categories of features, namely, gene sequence, protein sequence, network topology, homology and gene ontology-based features, were generated for Caenorhabditis elegans to perform a comparative analysis of their essentiality prediction capacity. The gene ontology-based feature category outperformed other categories of features, mainly due to its high correlation with the genes' biological functions. However, the topology feature category provided the highest discriminatory power, making it more suitable for essentiality prediction. The major limiting factor of machine learning in predicting conditional gene essentiality is the unavailability of labeled data for the conditions of interest that can train a classifier. Therefore, cooperative machine learning could further exploit models that can perform well in conditional essentiality predictions. 
SHORT ABSTRACT Identification of essential genes is imperative because it provides an understanding of the core structure and function, accelerating drug targets' discovery, among other functions. Recent studies have applied machine learning to complement the experimental identification of essential genes. However, several factors are limiting the performance of machine learning approaches. This review aims to present the standard procedure and resources available for predicting essential genes in organisms, and also highlight the factors responsible for the current limitation in using machine learning for conditional gene essentiality prediction. The choice of features and ML technique was identified as an important factor to predict essential genes effectively.
Affiliation(s)
- Olufemi Aromolaran
  - Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Damilare Aromolaran
  - Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Itunuoluwa Isewon
  - Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Jelili Oyelade
  - Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
7
Abstract
Human immunodeficiency virus type 1 (HIV-1) depends on a class of host proteins called host dependency factors (HDFs) to facilitate its infection. So far experimental efforts have detected a certain number of HDFs, but the gene inventory of HIV-1 HDFs remains incomplete. Here, we implemented an existing network-based gene discovery strategy to predict HIV-1 HDFs. First, an encoding scheme based on a publicly available human tissue-specific gene functional network (GIANT; http://giant.princeton.edu/) was designed to convert each human gene into a 25,825-dimensional feature vector. Then, a random forest-based predictive model was trained on a data set containing 868 known HDFs and 1,736 non-HDFs. Through 5-fold cross-validation, an independent test, and comparison with one existing method, the proposed prediction method consistently revealed accurate and competitive performance. The highlight of our method should be ascribed to the introduction of the GIANT encoding scheme, which contains rich information regarding gene interactions. By merging known HDFs and genome-wide HDF prediction results, network analysis was conducted to catch the common patterns of HDFs in the context of the GIANT network. 
Interestingly, HDFs reveal significantly lower betweenness than HIV-1-interacting human proteins (i.e., HIV targets). In the meantime, the functional roles of HDFs were also examined by mapping all the HDF candidates into human protein complexes. Especially, we observed the frequent co-occurrence of HDFs and HIV targets at the protein complex level. Collectively, we hope the proposed prediction method not only can accelerate the HDF identification and antiviral drug target discovery, but also can provide some mechanistic insights into human-virus relationships. IMPORTANCE Identification of HIV-1 HDFs remains a crucial step to understand the complicated relationships between human and HIV-1. To complement the experimental identification of HDFs, we have implemented an existing network-based gene discovery strategy to predict HDFs from the human genome. The core idea of the proposed method is that the rich information deposited in host gene functional networks can be effectively utilized to infer the potential HDFs. We hope the proposed prediction method could further guide hypothesis-driven experimental efforts to interrogate human–HIV-1 relationships and provide new hints for the development of antiviral drugs to combat HIV-1 infection.
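The abstract does not spell out how a gene becomes a feature vector. One plausible reading, assumed here for illustration, is that each gene is represented by its row of edge weights to every gene in the functional network, so a network with N genes yields N-dimensional vectors. The sketch below shows that encoding on a three-gene toy network; the gene identifiers, weights, and function name are hypothetical.

```python
def encode_genes(edges, genes):
    """Encode each gene as its vector of edge weights to all genes,
    in the order given by `genes`; missing edges are encoded as 0.0."""
    index = {gene: i for i, gene in enumerate(genes)}
    vectors = {gene: [0.0] * len(genes) for gene in genes}
    for (a, b), weight in edges.items():
        # The network is treated as undirected, so fill both directions.
        vectors[a][index[b]] = weight
        vectors[b][index[a]] = weight
    return vectors


# Hypothetical functional network with three genes.
genes = ["G1", "G2", "G3"]
edges = {("G1", "G2"): 0.8, ("G1", "G3"): 0.1, ("G2", "G3"): 0.5}

features = encode_genes(edges, genes)
for gene in genes:
    print(gene, features[gene])
```

Vectors of this kind could then be passed to any standard classifier, such as the random forest the abstract mentions.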
8
Liu R, Mancuso CA, Yannakopoulos A, Johnson KA, Krishnan A. Supervised learning is an accurate method for network-based gene classification. Bioinformatics 2020; 36:3457-3465. [PMID: 32129827 PMCID: PMC7267831 DOI: 10.1093/bioinformatics/btaa150] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/26/2019] [Revised: 12/01/2019] [Accepted: 02/27/2020] [Indexed: 12/22/2022]
Abstract
BACKGROUND Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine-learning technique across fields, supervised learning has been applied only in a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem. RESULTS In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene’s full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation’s appeal for naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows. AVAILABILITY AND IMPLEMENTATION The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available. 
CONTACT arjun@msu.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Renming Liu
  - Department of Computational Mathematics, Science and Engineering
- Kayla A Johnson
  - Department of Computational Mathematics, Science and Engineering
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Arjun Krishnan
  - Department of Computational Mathematics, Science and Engineering
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
  - To whom correspondence should be addressed.
9
How HIV-1 Gag Manipulates Its Host Cell Proteins: A Focus on Interactors of the Nucleocapsid Domain. Viruses 2020; 12:v12080888. [PMID: 32823718 PMCID: PMC7471995 DOI: 10.3390/v12080888] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/13/2020] [Revised: 08/06/2020] [Accepted: 08/10/2020] [Indexed: 12/27/2022]
Abstract
The human immunodeficiency virus (HIV-1) polyprotein Gag (Group-specific antigen) plays a central role in controlling the late phase of the viral lifecycle. Considered to be only a scaffolding protein for a long time, the structural protein Gag plays determinate and specific roles in HIV-1 replication. Indeed, via its different domains, Gag orchestrates the specific encapsidation of the genomic RNA, drives the formation of the viral particle by its auto-assembly (multimerization), binds multiple viral proteins, and interacts with a large number of cellular proteins that are needed for its functions from its translation location to the plasma membrane, where newly formed virions are released. Here, we review the interactions between HIV-1 Gag and 66 cellular proteins. Notably, we describe the techniques used to evidence these interactions, the different domains of Gag involved, and the implications of these interactions in the HIV-1 replication cycle. In the final part, we focus on the interactions involving the highly conserved nucleocapsid (NC) domain of Gag and detail the functions of the NC interactants along the viral lifecycle.
10
Ivanov S, Lagunin A, Filimonov D, Tarasova O. Network-Based Analysis of OMICs Data to Understand the HIV-Host Interaction. Front Microbiol 2020; 11:1314. [PMID: 32625189 PMCID: PMC7311653 DOI: 10.3389/fmicb.2020.01314] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/09/2020] [Accepted: 05/25/2020] [Indexed: 12/22/2022]
Abstract
The interaction of human immunodeficiency virus with human cells is responsible for all stages of the viral life cycle, from the infection of CD4+ cells to reverse transcription, integration, and the assembly of new viral particles. To date, a large amount of OMICs data as well as information from functional genomics screenings regarding the HIV–host interaction has been accumulated in the literature and in public databases. We processed databases containing HIV–host interactions and found 2910 HIV-1-human protein-protein interactions, mostly related to viral group M subtype B, 137 interactions between human and HIV-1 coding and non-coding RNAs, essential for viral lifecycle and cell defense mechanisms, 232 transcriptomics, 27 proteomics, and 34 epigenomics HIV-related experiments. Numerous studies regarding network-based analysis of corresponding OMICs data have been published in recent years. We overview various types of molecular networks, which can be created using OMICs data, including HIV–human protein–protein interaction networks, co-expression networks, gene regulatory and signaling networks, and approaches for the analysis of their topology and dynamics. The network-based analysis can be used to determine the critical pathways and key proteins involved in the HIV life cycle, cellular and immune responses to infection, viral escape from host defense mechanisms, and mechanisms mediating different susceptibility of humans to infection. The proteins and pathways identified in these studies represent a basis for developing new anti-HIV therapeutic strategies such as new drugs preventing infection of CD4+ cells and viral replication, effective vaccines, “shock and kill” and “block and lock” approaches to cure latent infection.
Affiliation(s)
- Sergey Ivanov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
  - Department of Bioinformatics, Pirogov Russian National Research Medical University, Moscow, Russia
- Alexey Lagunin
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
  - Department of Bioinformatics, Pirogov Russian National Research Medical University, Moscow, Russia
- Dmitry Filimonov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
- Olga Tarasova
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
11
Liu R, Mancuso CA, Yannakopoulos A, Johnson KA, Krishnan A. Supervised learning is an accurate method for network-based gene classification. Bioinformatics 2020; 36:3457-3465. [PMID: 32129827 DOI: 10.1101/721423] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/26/2019] [Revised: 12/01/2019] [Accepted: 02/27/2020] [Indexed: 05/26/2023]
Abstract
BACKGROUND Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine-learning technique across fields, supervised learning has been applied only in a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem. RESULTS In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene's full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation's appeal for naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows. AVAILABILITY AND IMPLEMENTATION The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available. CONTACT arjun@msu.edu.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
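For readers unfamiliar with the label propagation baseline mentioned in this abstract, the following is a minimal sketch of one common variant (a random walk with restart from the known positive genes). It is an illustration only and is not taken from the authors' code: the toy network, the gene names `g1`..`g5`, the function name `propagate_labels`, and the restart weight `alpha=0.85` are all assumptions.

```python
def propagate_labels(adj, seeds, alpha=0.85, iters=50):
    """Toy label propagation (random walk with restart) on an undirected graph.

    adj   : dict mapping each gene to the set of its neighbours
            (assumed symmetric, every gene has at least one neighbour)
    seeds : set of genes labelled positive
    alpha : hypothetical weight given to scores arriving from neighbours
    """
    restart = {g: (1.0 if g in seeds else 0.0) for g in adj}
    score = dict(restart)
    for _ in range(iters):
        # Each gene keeps a share of its seed label and receives the
        # degree-normalised scores of its neighbours.
        score = {
            g: (1 - alpha) * restart[g]
            + alpha * sum(score[n] / len(adj[n]) for n in adj[g])
            for g in adj
        }
    return score


# Hypothetical five-gene chain g1 - g2 - g3 - g4 - g5 with one known positive.
toy_network = {
    "g1": {"g2"},
    "g2": {"g1", "g3"},
    "g3": {"g2", "g4"},
    "g4": {"g3", "g5"},
    "g5": {"g4"},
}
scores = propagate_labels(toy_network, seeds={"g1"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this sketch, genes are prioritised purely by how much seed score reaches them through the network. The supervised alternative described in the abstract would instead train a classifier on each gene's row of the adjacency matrix.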
Affiliation(s)
- Renming Liu
- Department of Computational Mathematics, Science and Engineering
| | | | | | - Kayla A Johnson
- Department of Computational Mathematics, Science and Engineering
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
| | - Arjun Krishnan
- Department of Computational Mathematics, Science and Engineering
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
| |
|
12
|
Kiblawi S, Chasman D, Henning A, Park E, Poon H, Gould M, Ahlquist P, Craven M. Augmenting subnetwork inference with information extracted from the scientific literature. PLoS Comput Biol 2019; 15:e1006758. [PMID: 31246951 PMCID: PMC6619809 DOI: 10.1371/journal.pcbi.1006758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2018] [Revised: 07/10/2019] [Accepted: 01/04/2019] [Indexed: 11/20/2022] Open
Abstract
Many biological studies involve either (i) manipulating some aspect of a cell or its environment and then simultaneously measuring the effect on thousands of genes, or (ii) systematically manipulating each gene and then measuring the effect on some response of interest. A common challenge that arises in these studies is to explain how genes identified as relevant in the given experiment are organized into a subnetwork that accounts for the response of interest. The task of inferring a subnetwork is typically dependent on the information available in publicly available, structured databases, which suffer from incompleteness. However, a wealth of potentially relevant information resides in the scientific literature, such as information about genes associated with certain concepts of interest, as well as interactions that occur among various biological entities. We contend that by exploiting this information, we can improve the explanatory power and accuracy of subnetwork inference in multiple applications. Here we propose and investigate several ways in which information extracted from the scientific literature can be used to augment subnetwork inference. We show that we can use literature-extracted information to (i) augment the set of entities identified as being relevant in a subnetwork inference task, (ii) augment the set of interactions used in the process, and (iii) support targeted browsing of a large inferred subnetwork by identifying entities and interactions that are closely related to concepts of interest. We use this approach to uncover the pathways involved in interactions between a virus and a host cell, and the pathways that are regulated by a transcription factor associated with breast cancer. Our experimental results demonstrate that these approaches can provide more accurate and more interpretable subnetworks. 
Integer program code, background network data, and pathfinding code are available at https://github.com/Craven-Biostat-Lab/subnetwork_inference.
There is a multitude of publicly available databases that contain information about biological entities (i.e., genes, proteins, and other small molecules) as well as information about how these entities interact together. However, these databases are often incomplete. There is a wealth of information present in the text of the scientific literature that is not yet available in these databases. Using tools that mine the scientific literature we are able to extract some of this potentially relevant information. In this work we show how we can use publicly available databases in conjunction with the information extracted from the scientific literature to infer the networks that are involved in specific biological processes, such as viral replication and cancer tumor growth.
Affiliation(s)
- Sid Kiblawi
- Department of Computer Sciences, University of Wisconsin, Madison, WI, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
| | - Deborah Chasman
- Wisconsin Institute for Discovery, University of Wisconsin, Madison, WI, USA
| | - Amanda Henning
- Department of Oncology, University of Wisconsin, Madison, WI, USA
| | - Eunju Park
- Institute for Molecular Virology, University of Wisconsin, Madison, WI, USA
- Morgridge Institute for Research, Madison, WI, USA
| | | | - Michael Gould
- Department of Oncology, University of Wisconsin, Madison, WI, USA
| | - Paul Ahlquist
- Institute for Molecular Virology, University of Wisconsin, Madison, WI, USA
- Morgridge Institute for Research, Madison, WI, USA
- Howard Hughes Medical Institute, University of Wisconsin, Madison, WI, USA
| | - Mark Craven
- Department of Computer Sciences, University of Wisconsin, Madison, WI, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
| |
|
13
|
Bern M, King A, Applewhite DA, Ritz A. Network-based prediction of polygenic disease genes involved in cell motility. BMC Bioinformatics 2019; 20:313. [PMID: 31216978 PMCID: PMC6584515 DOI: 10.1186/s12859-019-2834-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/11/2023] Open
Abstract
Background Schizophrenia and autism are examples of polygenic diseases caused by a multitude of genetic variants, many of which are still poorly understood. Recently, both diseases have been associated with disrupted neuron motility and migration patterns, suggesting that aberrant cell motility is a phenotype for these neurological diseases. Results We formulate the Polygenic Disease Phenotype Problem which seeks to identify candidate disease genes that may be associated with a phenotype such as cell motility. We present a machine learning approach to solve this problem for schizophrenia and autism genes within a brain-specific functional interaction network. Our method outperforms peer semi-supervised learning approaches, achieving better cross-validation accuracy across different sets of gold-standard positives. We identify top candidates for both schizophrenia and autism, and select six genes labeled as schizophrenia positives that are predicted to be associated with cell motility for follow-up experiments. Conclusions Candidate genes predicted by our method suggest testable hypotheses about these genes’ role in cell motility regulation, offering a framework for generating predictions for experimental validation. Electronic supplementary material The online version of this article (10.1186/s12859-019-2834-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Miriam Bern
- Biology Department, Reed College, Portland, OR, USA
| | | | | | - Anna Ritz
- Biology Department, Reed College, Portland, OR, USA.
| |
|
14
|
Ackerman EE, Alcorn JF, Hase T, Shoemaker JE. A dual controllability analysis of influenza virus-host protein-protein interaction networks for antiviral drug target discovery. BMC Bioinformatics 2019; 20:297. [PMID: 31159726 PMCID: PMC6545738 DOI: 10.1186/s12859-019-2917-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/16/2019] [Accepted: 05/28/2019] [Indexed: 01/25/2023] Open
Abstract
Background Host factors of influenza virus replication are often found in key topological positions within protein-protein interaction networks. This work explores how protein states can be manipulated through controllability analysis: the determination of the minimum manipulation needed to drive the cell system to any desired state. Here, we complete a two-part controllability analysis of two protein networks: a host network representing the healthy cell state and an influenza A virus-host network representing the infected cell state. In this context, controllability analyses aim to identify key regulating host factors of the infected cell’s progression. This knowledge can be utilized in further biological analysis to understand disease dynamics and isolate proteins for study as drug target candidates. Results Both topological and controllability analyses provide evidence of wide-reaching network effects stemming from the addition of viral-host protein interactions. Virus interacting and driver host proteins are significant both topologically and in controllability, therefore playing important roles in cell behavior during infection. Functional analysis finds overlap of results with previous siRNA studies of host factors involved in influenza replication, NF-kB pathway and infection relevance, and roles as interferon regulating genes. 24 proteins are identified as holding regulatory roles specific to the infected cell by measures of topology, controllability, and functional role. These proteins are recommended for further study as potential antiviral drug targets. Conclusions Seasonal outbreaks of influenza A virus are a major cause of illness and death around the world each year with a constant threat of pandemic infection. This research aims to increase the efficiency of antiviral drug target discovery using existing protein-protein interaction data and network analysis methods. 
These results are beneficial to future studies of influenza virus, both experimental and computational, and provide evidence that the combination of topology and controllability analyses may be valuable for future efforts in drug target discovery. Electronic supplementary material The online version of this article (10.1186/s12859-019-2917-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Emily E Ackerman
- Department of Chemical and Petroleum Engineering, University of Pittsburgh, Pittsburgh, PA, USA
| | - John F Alcorn
- Division of Pulmonary Medicine, Allergy, and Immunology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, Pittsburgh, PA, USA
| | - Takeshi Hase
- The Systems Biology Institute, Saisei Ikedayama Bldg. 5-10-25 Higashi Gotanda, Shinagawa, Tokyo, 141-0022, Japan
- Medical Data Sciences Office, Tokyo Medical and Dental University, M&D Tower 20F, 1-5-45 Yushima, Bunkyo, Tokyo, 113-8510, Japan
| | - Jason E Shoemaker
- Department of Chemical and Petroleum Engineering, University of Pittsburgh, Pittsburgh, PA, USA
- The McGowan Institute for Regenerative Medicine (MIRM), University of Pittsburgh, Pittsburgh, PA, USA
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| |
|
15
|
Viruses.STRING: A Virus-Host Protein-Protein Interaction Database. Viruses 2018; 10:v10100519. [PMID: 30249048 PMCID: PMC6213343 DOI: 10.3390/v10100519] [Citation(s) in RCA: 99] [Impact Index Per Article: 16.5] [Received: 09/03/2018] [Revised: 09/19/2018] [Accepted: 09/20/2018] [Indexed: 12/21/2022] Open
Abstract
As viruses continue to pose risks to global health, having a better understanding of virus–host protein–protein interactions aids in the development of treatments and vaccines. Here, we introduce Viruses.STRING, a protein–protein interaction database specifically catering to virus–virus and virus–host interactions. This database combines evidence from experimental and text-mining channels to provide combined probabilities for interactions between viral and host proteins. The database contains 177,425 interactions between 239 viruses and 319 hosts. The database is publicly available at viruses.string-db.org, and the interaction data can also be accessed through the latest version of the Cytoscape STRING app.
|
16
|
Goodacre N, Devkota P, Bae E, Wuchty S, Uetz P. Protein-protein interactions of human viruses. Semin Cell Dev Biol 2018; 99:31-39. [PMID: 30031213 PMCID: PMC7102568 DOI: 10.1016/j.semcdb.2018.07.018] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 09/08/2017] [Revised: 04/02/2018] [Accepted: 07/17/2018] [Indexed: 12/16/2022]
Abstract
Viruses infect their human hosts by a series of interactions between viral and host proteins, indicating that detailed knowledge of such virus-host interaction interfaces is critical for our understanding of viral infection mechanisms, disease etiology and the development of new drugs. In this review, we primarily survey human host-virus interaction data that are available from public databases following the standardized PSI-MS format. Notably, available host-virus protein interaction information is strongly biased toward a small number of virus families including herpesviridae, papillomaviridae, orthomyxoviridae and retroviridae. While we explore the reliability and relevance of these protein interactions, we also survey the current knowledge about viruses' functional and topological targets. Furthermore, we assess emerging frontiers of host-virus protein interaction research, focusing on protein interaction interfaces of hosts that are infected by different viruses and viruses that infect multiple hosts. Finally, we cover the current status of research that investigates the relationships of virus-targeted host proteins to other comorbidities as well as the influence of host-virus protein interactions on human metabolism.
Affiliation(s)
- Norman Goodacre
- Division of Viral Products, Office of Vaccines Research and Review, Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, USA
| | - Prajwal Devkota
- Dept. of Computer Science, Univ. of Miami, Coral Gables, FL, 33146, USA
| | - Eunhae Bae
- Division of Viral Products, Office of Vaccines Research and Review, Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, USA
| | - Stefan Wuchty
- Dept. of Computer Science, Univ. of Miami, Coral Gables, FL, 33146, USA; Center for Computational Science, Univ. of Miami, Coral Gables, FL, 33146, USA; Dept. of Biology, Univ. of Miami, Coral Gables, FL, 33146, USA; Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL, 33136, USA.
| | - Peter Uetz
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA, 23284, USA.
| |
|
17
|
Sertznig H, Hillebrand F, Erkelenz S, Schaal H, Widera M. Behind the scenes of HIV-1 replication: Alternative splicing as the dependency factor on the quiet. Virology 2018; 516:176-188. [PMID: 29407375 DOI: 10.1016/j.virol.2018.01.011] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 11/29/2017] [Revised: 01/10/2018] [Accepted: 01/11/2018] [Indexed: 01/31/2023]
Abstract
Alternative splicing plays a key role in the HIV-1 life cycle and is essential to maintain an equilibrium of mRNAs that encode viral proteins and polyprotein-isoforms. In particular, since all early HIV-1 proteins are expressed from spliced intronless and late enzymatic and structural proteins from intron containing, i.e. splicing repressed viral mRNAs, cellular splicing factors and splicing regulatory proteins are crucial for the replication capacity. In this review, we will describe the complex network of cis-acting splicing regulatory elements (SREs), which are mainly localized in the neighbourhoods of all HIV-1 splice sites and warrant the proper ratio of individual transcript isoforms. Since SREs represent binding sites for trans-acting cellular splicing factors interacting with the cellular spliceosomal apparatus we will review the current knowledge of interactions between viral RNA and cellular proteins as well as their impact on viral replication. Finally, we will discuss potential therapeutic approaches targeting HIV-1 alternative splicing.
Affiliation(s)
- Helene Sertznig
- Institute for Virology, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
| | - Frank Hillebrand
- Institute of Virology, Heinrich Heine University, University Hospital, Düsseldorf, Germany
| | - Steffen Erkelenz
- Institute for Genetics, Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Germany
| | - Heiner Schaal
- Institute of Virology, Heinrich Heine University, University Hospital, Düsseldorf, Germany
| | - Marek Widera
- Institute for Virology, University Hospital Essen, University of Duisburg-Essen, Essen, Germany.
| |
|
18
|
Devkota P, Danzi MC, Wuchty S. Beyond degree and betweenness centrality: Alternative topological measures to predict viral targets. PLoS One 2018; 13:e0197595. [PMID: 29795705 PMCID: PMC5967884 DOI: 10.1371/journal.pone.0197595] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/28/2017] [Accepted: 05/04/2018] [Indexed: 11/18/2022] Open
Abstract
The availability of large-scale screens of host-virus interaction interfaces enabled the topological analysis of viral protein targets of the host. In particular, host proteins that bind viral proteins are generally hubs and proteins with high betweenness centrality. Recently, other topological measures were introduced that a virus may tap to infect a host cell. Utilizing experimentally determined sets of human protein targets from Herpes, Hepatitis, HIV and Influenza, we pooled molecular interactions between proteins from different pathway databases. Apart from a protein's degree and betweenness centrality, we considered a protein's pathway participation, ability to topologically control a network and protein PageRank index. In particular, we found that proteins with increasing values of such measures tend to accumulate viral targets and distinguish viral targets from non-targets. Furthermore, all such topological measures strongly correlate with the occurrence of a given protein in different pathways. Building a random forest classifier that is based on such topological measures, we found that protein PageRank index had the highest impact on the classification of viral (non-)targets while proteins' ability to topologically control an interaction network played the least important role.
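Two of the topological measures named in this abstract, degree and the PageRank index, can be illustrated on a toy undirected interaction network. The sketch below is not the authors' pipeline: the protein names, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for illustration, and the random forest step is omitted.

```python
def degree(adj):
    """Number of interaction partners of each protein."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank on an undirected graph.

    Assumes adj is symmetric and every node has at least one neighbour,
    so no special handling of dangling nodes is needed.
    """
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iters):
        rank = {
            node: (1 - damping) / n
            + damping * sum(rank[nb] / len(adj[nb]) for nb in adj[node])
            for node in adj
        }
    return rank


# Hypothetical network: one hub protein and four lower-degree proteins.
toy_ppi = {
    "HUB": {"A", "B", "C", "D"},
    "A": {"HUB", "B"},
    "B": {"HUB", "A"},
    "C": {"HUB"},
    "D": {"HUB"},
}
deg = degree(toy_ppi)
pr = pagerank(toy_ppi)
top_protein = max(pr, key=pr.get)
```

In a setting like the one the abstract describes, per-protein values such as `deg` and `pr` would form the feature columns passed to a classifier that separates viral targets from non-targets.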
Affiliation(s)
- Prajwal Devkota
- Dept. of Computer Science, Univ. of Miami, Coral Gables, FL, United States of America
| | - Matt C. Danzi
- The Miami Project to Cure Paralysis, Miller School of Medicine, University of Miami, Miami, FL, United States of America
- Center for Computational Science, Univ. of Miami, Coral Gables, FL, United States of America
| | - Stefan Wuchty
- Dept. of Computer Science, Univ. of Miami, Coral Gables, FL, United States of America
- Center for Computational Science, Univ. of Miami, Coral Gables, FL, United States of America
- Dept. of Biology, Univ. of Miami, Coral Gables, FL, United States of America
- Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL, United States of America
| |
|
19
|
Abstract
Since cell regulation and protein expression can be dramatically altered upon infection by viruses, studying the mechanisms by which viruses infect cells and the regulatory networks they disrupt is essential to understanding viral pathogenicity. This line of study can also lead to discoveries about the workings of host cells themselves. Computational methods are rapidly being developed to investigate viral-host interactions, and here we highlight recent methods and the insights that they have revealed so far, with a particular focus on methods that integrate different types of data. We also review the challenges of working with viruses compared with traditional cellular biology, and the limitations of current experimental and informatics methods.
|
20
|
Rioualen C, Da Costa Q, Chetrit B, Charafe-Jauffret E, Ginestier C, Bidaut G. HTS-Net: An integrated regulome-interactome approach for establishing network regulation models in high-throughput screenings. PLoS One 2017; 12:e0185400. [PMID: 28949986 PMCID: PMC5614607 DOI: 10.1371/journal.pone.0185400] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 05/09/2017] [Accepted: 09/12/2017] [Indexed: 12/28/2022] Open
Abstract
High-throughput RNAi screenings (HTS) allow quantifying the impact of the deletion of each gene in any particular function, from virus-host interactions to cell differentiation. However, there has been less development for functional analysis tools dedicated to RNAi analyses. HTS-Net, a network-based analysis program, was developed to identify gene regulatory modules impacted in high-throughput screenings, by integrating transcription factors-target genes interaction data (regulome) and protein-protein interaction networks (interactome) on top of screening z-scores. HTS-Net produces exhaustive HTML reports for results navigation and exploration. HTS-Net is a new pipeline for RNA interference screening analyses that proves better performance than simple gene rankings by z-scores, by re-prioritizing genes and replacing them in their biological context, as shown by the three studies that we reanalyzed. Formatted input data for the three studied datasets, source code and web site for testing the system are available from the companion web site at http://htsnet.marseille.inserm.fr/. We also compared our program with existing algorithms (CARD and hotnet2).
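The "simple gene rankings by z-scores" that this abstract uses as its point of comparison can be sketched as follows. This shows only the baseline ranking, not HTS-Net itself; the readout values and gene names are made up, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def z_scores(readouts):
    """Standardise per-gene screening readouts to z-scores.

    readouts : dict mapping gene name to a numeric screening measurement
    """
    mu = mean(readouts.values())
    sigma = pstdev(readouts.values())  # assumption: population standard deviation
    return {gene: (value - mu) / sigma for gene, value in readouts.items()}


# Hypothetical readouts from a screen; g4 is the obvious outlier.
screen = {"g1": 10.0, "g2": 12.0, "g3": 11.0, "g4": 30.0}
z = z_scores(screen)

# Baseline prioritisation: rank genes by absolute z-score, strongest first.
ranked = sorted(z, key=lambda gene: abs(z[gene]), reverse=True)
```

A network-based method such as the one described would then re-prioritise this list using regulatory and protein-protein interaction context rather than the z-scores alone.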
Affiliation(s)
- Claire Rioualen
- Aix-Marseille Univ, Marseille, France
- Inserm, U1068, Centre de Recherche en Cancérologie de Marseille, Marseille, France
- Institut Paoli-Calmettes, Marseille, France
- CNRS, UMR7258, Centre de Recherche en Cancérologie de Marseille, Marseille, France
| | - Quentin Da Costa
- Aix-Marseille Univ, Marseille, France
- Inserm, U1068, Centre de Recherche en Cancérologie de Marseille, Marseille, France
- Institut Paoli-Calmettes, Marseille, France
- CNRS, UMR7258, Centre de Recherche en Cancérologie de Marseille, Marseille, France
| | - Bernard Chetrit
- Aix-Marseille Univ, Marseille, France
- Inserm, U1068, Centre de Recherche en Cancérologie de Marseille, Marseille, France
- Institut Paoli-Calmettes, Marseille, France
- CNRS, UMR7258, Centre de Recherche en Cancérologie de Marseille, Marseille, France
| | - Emmanuelle Charafe-Jauffret
- Aix-Marseille Univ, Marseille, France
- Inserm, U1068, Centre de Recherche en Cancérologie de Marseille, Marseille, France
- Institut Paoli-Calmettes, Marseille, France
- CNRS, UMR7258, Centre de Recherche en Cancérologie de Marseille, Marseille, France
| | - Christophe Ginestier
- Aix-Marseille Univ, Marseille, France
- Inserm, U1068, Centre de Recherche en Cancérologie de Marseille, Marseille, France
- Institut Paoli-Calmettes, Marseille, France
- CNRS, UMR7258, Centre de Recherche en Cancérologie de Marseille, Marseille, France
| | - Ghislain Bidaut
- Aix-Marseille Univ, Marseille, France
- Inserm, U1068, Centre de Recherche en Cancérologie de Marseille, Marseille, France
- Institut Paoli-Calmettes, Marseille, France
- CNRS, UMR7258, Centre de Recherche en Cancérologie de Marseille, Marseille, France
| |
|
21
|
Durmuş S, Ülgen KÖ. Comparative interactomics for virus-human protein-protein interactions: DNA viruses versus RNA viruses. FEBS Open Bio 2017; 7:96-107. [PMID: 28097092 PMCID: PMC5221455 DOI: 10.1002/2211-5463.12167] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 08/25/2016] [Revised: 11/06/2016] [Accepted: 11/16/2016] [Indexed: 01/01/2023] Open
Abstract
Viruses are obligatory intracellular pathogens and completely depend on their hosts for survival and reproduction. The strategies adopted by viruses to exploit host cell processes and to evade host immune systems during infections may differ largely with the type of the viral genetic material. An improved understanding of these viral infection mechanisms is only possible through a better understanding of the pathogen-host interactions (PHIs) that enable viruses to enter into the host cells and manipulate the cellular mechanisms to their own advantage. Experimentally-verified protein-protein interaction (PPI) data of pathogen-host systems only became available at large scale within the last decade. In this study, we comparatively analyzed the current PHI networks belonging to DNA and RNA viruses and their human host, to get insights into the infection strategies used by these viral groups. We investigated the functional properties of human proteins in the PHI networks, to observe and compare the attack strategies of DNA and RNA viruses. We observed that DNA viruses are able to attack both human cellular and metabolic processes simultaneously during infections. On the other hand, RNA viruses preferentially interact with human proteins functioning in specific cellular processes as well as in intracellular transport and localization within the cell. Observing virus-targeted human proteins, we propose heterogeneous nuclear ribonucleoproteins and transporter proteins as potential antiviral therapeutic targets. The observed common and specific infection mechanisms in terms of viral strategies to attack human proteins may provide crucial information for further design of broad and specific next-generation antiviral therapeutics.
Affiliation(s)
- Saliha Durmuş
- Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University, Kocaeli, Turkey
| | - Kutlu Ö. Ülgen
- Department of Chemical Engineering, Boğaziçi University, İstanbul, Turkey
| |
|
22
|
Wani SA, Sahu AR, Saxena S, Hussain S, Pandey A, Kanchan S, Sahoo AP, Mishra B, Tiwari AK, Mishra BP, Gandham RK, Singh RK. Systems biology approach: Panacea for unravelling host-virus interactions and dynamics of vaccine induced immune response. Gene Reports 2016; 5:23-29. [PMID: 32289096 PMCID: PMC7104209 DOI: 10.1016/j.genrep.2016.08.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/04/2016] [Revised: 06/24/2016] [Accepted: 08/01/2016] [Indexed: 12/18/2022]
Abstract
Systems biology is an interdisciplinary research field in life sciences, which involves a comprehensive and quantitative analysis of the interactions between all of the components of biological systems over time. For the past 50 years the discipline of virology has overly focused on the pathogen itself. However, we now know that the host response is equally or more important in defining the eventual pathological outcome of infection. Systems biology has in recent years been increasingly recognised for its importance to infectious disease research. Host-virus interactions can be better understood by taking into account the dynamical molecular networks that constitute a biological system. To decipher the pathobiological mechanisms of any disease requires a deep knowledge of how multiple and concurrent signal-transduction pathways operate and are deregulated. Hence the intricacies of signalling pathways can be dissected only by system level approaches.
- Deciphering the host virus interactions through system biology approach reviewed
- High throughput techniques to understand the host pathogen interactions examined
- Shift from virus-centric perspective to spectrum of virus-host interactions
- Modeling of host-virus cross talk
Affiliation(s)
- Sajad Ahmad Wani
- Division of Veterinary Biotechnology, ICAR-Indian Veterinary Research Institute, Izatnagar 243122, India
| | - Amit Ranjan Sahu
- Division of Veterinary Biotechnology, ICAR-Indian Veterinary Research Institute, Izatnagar 243122, India
| | - Shikha Saxena
- Division of Veterinary Biotechnology, ICAR-Indian Veterinary Research Institute, Izatnagar 243122, India
| | - Shahid Hussain
- Division of Veterinary Biotechnology, ICAR-Indian Veterinary Research Institute, Izatnagar 243122, India
| | - Aruna Pandey
- Division of Veterinary Biotechnology, ICAR-Indian Veterinary Research Institute, Izatnagar 243122, India
| | - Sonam Kanchan
- Division of Veterinary Biotechnology, ICAR-Indian Veterinary Research Institute, Izatnagar 243122, India
| | - Aditya Prasad Sahoo
- Division of Veterinary Biotechnology, ICAR-Indian Veterinary Research Institute, Izatnagar 243122, India
| | - Bina Mishra
- Division of Biological Products, ICAR-Indian Veterinary Research Institute, Izatnagar 243122, India
| | - Ashok Kumar Tiwari
- Division of Biological Standardization, ICAR-Indian Veterinary Research Institute, Izatnagar 243122, India
| | - Bishnu Prasad Mishra
- Division of Veterinary Biotechnology, ICAR-Indian Veterinary Research Institute, Izatnagar 243122, India
| | - Ravi Kumar Gandham
- Division of Veterinary Biotechnology, ICAR-Indian Veterinary Research Institute, Izatnagar 243122, India
| | - Raj Kumar Singh
- Division of Veterinary Biotechnology, ICAR-Indian Veterinary Research Institute, Izatnagar 243122, India
| |
Collapse
|
23
|
The clinical applications of genome editing in HIV. Blood 2016; 127:2546-52. [PMID: 27053530 DOI: 10.1182/blood-2016-01-678144] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 01/11/2016] [Accepted: 02/09/2016] [Indexed: 12/13/2022]
Abstract
HIV/AIDS has long been at the forefront of the development of gene- and cell-based therapies. Although conventional gene therapy approaches typically involve the addition of anti-HIV genes to cells using semirandomly integrating viral vectors, newer genome editing technologies based on engineered nucleases are now allowing more precise genetic manipulations. The possible outcomes of genome editing include gene disruption, which has been most notably applied to the CCR5 coreceptor gene, or the introduction of small mutations or larger whole gene cassette insertions at a targeted locus. Disruption of CCR5 using zinc finger nucleases was the first-in-human application of genome editing and remains the most clinically advanced platform, with 7 completed or ongoing clinical trials in T cells and hematopoietic stem/progenitor cells (HSPCs). Here we review the laboratory and clinical findings of CCR5 editing in T cells and HSPCs for HIV therapy and summarize other promising genome editing approaches for future clinical development. In particular, recent advances in the delivery of genome editing reagents and the demonstration of highly efficient homology-directed editing in both T cells and HSPCs are expected to spur the development of even more sophisticated applications of this technology for HIV therapy.
|
24
|
Dohrmann J, Puchin J, Singh R. Global multiple protein-protein interaction network alignment by combining pairwise network alignments. BMC Bioinformatics 2015; 16 Suppl 13:S11. [PMID: 26423128 PMCID: PMC4597059 DOI: 10.1186/1471-2105-16-s13-s11] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 01/27/2023]
Abstract
BACKGROUND: A wealth of protein interaction data has become available in recent years, creating an urgent need for powerful analysis techniques. In this context, the problem of finding biologically meaningful correspondences between different protein-protein interaction networks (PPIN) is of particular interest. The PPIN of a species can be compared with that of other species through the process of PPIN alignment. Such an alignment can provide insight into basic problems like species evolution and network component function determination, as well as translational problems such as target identification and elucidation of mechanisms of disease spread. Furthermore, multiple PPINs can be aligned simultaneously, expanding the analytical implications of the result. While there are several pairwise network alignment algorithms, few methods are capable of multiple network alignment (MNA).
RESULTS: We propose SMAL, an MNA algorithm based on the philosophy of scaffold-based alignment. SMAL is capable of converting results from any global pairwise alignment algorithm into an MNA in linear time. Using this method, we have built multiple network alignments based on combining pairwise alignments from a number of publicly available (pairwise) network aligners. We tested SMAL using PPINs of eight species derived from the IntAct repository and employed a number of measures to evaluate performance. Additionally, as part of our experimental investigations, we compared the effectiveness of SMAL while aligning up to eight input PPINs, and examined the effect of scaffold network choice on the alignments.
CONCLUSIONS: A key advantage of SMAL lies in its ability to create MNAs through the use of pairwise network aligners for which native MNA implementations do not exist. Experiments indicate that the performance of SMAL was comparable to that of the native MNA implementation of established methods such as IsoRankN and SMETANA. However, in terms of computational time, SMAL was significantly faster. SMAL was also able to retain many important characteristics of the native pairwise alignments, such as the number of aligned nodes and edges, as well as the functional and homologene similarity of aligned nodes. The speed, flexibility, and ability to retain prior correspondences as new networks are aligned make SMAL a compelling choice for alignment of multiple large networks.
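The scaffold idea described in this abstract can be illustrated with a minimal Python sketch. It is not SMAL's implementation: it assumes each pairwise alignment is already available as a dictionary from a scaffold-network protein to its aligned protein in one other network, and the function name `merge_on_scaffold` and the toy protein identifiers are hypothetical.

```python
from collections import defaultdict

def merge_on_scaffold(pairwise):
    """Combine scaffold-to-species pairwise alignments into multi-network clusters.

    pairwise: dict mapping species name -> {scaffold_node: aligned_node}.
    Returns {scaffold_node: {species: aligned_node}}. Each aligned pair is
    visited once, so the cost is linear in the size of the input.
    """
    clusters = defaultdict(dict)
    for species, alignment in pairwise.items():
        for scaffold_node, node in alignment.items():
            clusters[scaffold_node][species] = node
    return dict(clusters)

# Toy input (hypothetical identifiers); one network acts as the scaffold.
pairwise = {
    "mouse": {"TP53": "Trp53", "CDK1": "Cdk1"},
    "yeast": {"CDK1": "CDC28"},
}
mna = merge_on_scaffold(pairwise)
print(mna["CDK1"])
```

Keying every cluster on the scaffold node is what lets new networks be added later without disturbing correspondences that were already recorded.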
|
25
|
Durmuş S, Çakır T, Özgür A, Guthke R. A review on computational systems biology of pathogen-host interactions. Front Microbiol 2015; 6:235. [PMID: 25914674 PMCID: PMC4391036 DOI: 10.3389/fmicb.2015.00235] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 12/01/2014] [Accepted: 03/10/2015] [Indexed: 12/27/2022]
Abstract
Pathogens manipulate the cellular mechanisms of host organisms via pathogen-host interactions (PHIs) in order to take advantage of the capabilities of host cells, leading to infections. The crucial role of these interspecies molecular interactions in initiating and sustaining infections necessitates a thorough understanding of the corresponding mechanisms. Unlike the traditional approach of considering the host or pathogen separately, a systems-level approach, considering the PHI system as a whole, is indispensable to elucidate the mechanisms of infection. Following the technological advances in the post-genomic era, PHI data have been produced on a large scale within the last decade. Systems biology-based methods for the inference and analysis of PHI regulatory, metabolic, and protein-protein networks to shed light on infection mechanisms are in increasing demand thanks to the availability of omics data. The knowledge derived from the PHIs may largely contribute to the identification of new and more efficient therapeutics to prevent or cure infections. There are recent efforts for the detailed documentation of these experimentally verified PHI data through Web-based databases. Despite these advances in data archiving, there are still large amounts of PHI data in the biomedical literature yet to be discovered, and novel text mining methods are in development to unearth such hidden data. Here, we review a collection of recent studies on computational systems biology of PHIs with a special focus on the methods for the inference and analysis of PHI networks, covering also the Web-based databases and text-mining efforts to unravel the data hidden in the literature.
Affiliation(s)
- Saliha Durmuş: Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University, Kocaeli, Turkey
- Tunahan Çakır: Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University, Kocaeli, Turkey
- Arzucan Özgür: Department of Computer Engineering, Boǧaziçi University, Istanbul, Turkey
- Reinhard Guthke: Leibniz Institute for Natural Product Research and Infection Biology – Hans-Knoell-Institute, Jena, Germany
|
26
|
Amberkar SS, Kaderali L. An integrative approach for a network based meta-analysis of viral RNAi screens. Algorithms Mol Biol 2015; 10:6. [PMID: 25691914 PMCID: PMC4331137 DOI: 10.1186/s13015-015-0035-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/24/2014] [Accepted: 01/27/2015] [Indexed: 02/08/2023]
Abstract
BACKGROUND: Big data is becoming ubiquitous in biology, and poses significant challenges in data analysis and interpretation. RNAi screening has become a workhorse of functional genomics, and has been applied, for example, to identify host factors involved in infection for a panel of different viruses. However, the analysis of data resulting from such screens is difficult, with often low overlap between hit lists, even when comparing screens targeting the same virus. This makes it a major challenge to select interesting candidates for further detailed, mechanistic experimental characterization.
RESULTS: To address this problem we propose an integrative bioinformatics pipeline that allows for a network based meta-analysis of viral high-throughput RNAi screens. Initially, we collate a human protein interaction network from various public repositories, which is then subjected to unsupervised clustering to determine functional modules. Modules that are significantly enriched with host dependency factors (HDFs) and/or host restriction factors (HRFs) are then filtered based on network topology and semantic similarity measures. Modules passing all these criteria are finally interpreted for their biological significance using enrichment analysis, and interesting candidate genes can be selected from the modules.
CONCLUSIONS: We apply our approach to seven screens targeting three different viruses, and compare results with other published meta-analyses of viral RNAi screens. We recover key hit genes, and identify additional candidates from the screens. While we demonstrate the application of the approach using viral RNAi data, the method is generally applicable to identify underlying mechanisms from hit lists derived from high-throughput experimental data, and to select a small number of most promising genes for further mechanistic studies.
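The module-enrichment step mentioned above is commonly scored with a one-sided hypergeometric test. The sketch below shows that generic test only; the abstract does not state which statistic the authors used, so treating it as hypergeometric is an assumption, and the function name and the toy numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail probability P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: proteins in the network, K: screen hits among them,
    n: proteins in the module, k: screen hits inside the module.
    """
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Toy numbers (assumed): 1000 proteins, 50 hits, a 20-protein module with 6 hits.
p_value = hypergeom_enrichment_p(1000, 50, 20, 6)
print(f"enrichment p-value: {p_value:.3g}")
```

In a real pipeline the p-values for all modules would also need a multiple-testing correction before any module is called enriched.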
|
27
|
Bandyopadhyay S, Ray S, Mukhopadhyay A, Maulik U. A review of in silico approaches for analysis and prediction of HIV-1-human protein-protein interactions. Brief Bioinform 2014; 16:830-51. [PMID: 25479794 DOI: 10.1093/bib/bbu041] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 06/22/2014] [Indexed: 12/19/2022]
Abstract
The computational or in silico approaches for analysing the HIV-1-human protein-protein interaction (PPI) network, predicting different host cellular factors and PPIs and discovering several pathways are gaining popularity in the field of HIV research. Although there exist quite a few studies in this regard, no previous effort has been made to review these works in a comprehensive manner. Here we review the computational approaches that are devoted to the analysis and prediction of HIV-1-human PPIs. We have broadly categorized these studies into two fields: computational analysis of HIV-1-human PPI network and prediction of novel PPIs. We have also presented a comparative assessment of these studies and proposed some methodologies for discussing the implication of their results. We have also reviewed different computational techniques for predicting HIV-1-human PPIs and provided a comparative study of their applicability. We believe that our effort will provide helpful insights to the HIV research community.
|
28
|
Emig-Agius D, Olivieri K, Pache L, Shih HL, Pustovalova O, Bessarabova M, Young JAT, Chanda SK, Ideker T. An integrated map of HIV-human protein complexes that facilitate viral infection. PLoS One 2014; 9:e96687. [PMID: 24817247 PMCID: PMC4016004 DOI: 10.1371/journal.pone.0096687] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 11/27/2013] [Accepted: 04/11/2014] [Indexed: 12/03/2022]
Abstract
Recent proteomic and genetic studies have aimed to identify a complete network of interactions between HIV and human proteins and genes. This HIV-human interaction network provides invaluable information as to how HIV exploits the host machinery and can be used as a starting point for further functional analyses. We integrated this network with complementary datasets of protein function and interaction to nominate human protein complexes with likely roles in viral infection. Based on our approach we identified a global map of 40 HIV-human protein complexes with putative roles in HIV infection, some of which are involved in DNA replication and repair, transcription, translation, and cytoskeletal regulation. Targeted RNAi screens were used to validate several proteins and complexes for functional impact on viral infection. Thus, our HIV-human protein complex map provides a significant resource of potential HIV-host interactions for further study.
Affiliation(s)
- Dorothea Emig-Agius: Departments of Medicine and Bioengineering, University of California at San Diego, La Jolla, California, United States of America; IP&Science, Thomson Reuters Scientific Inc., Carlsbad, California, United States of America
- Kevin Olivieri: Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- Lars Pache: Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- Hsin Ling Shih: Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- Olga Pustovalova: IP&Science, Thomson Reuters Scientific Inc., Carlsbad, California, United States of America
- Marina Bessarabova: IP&Science, Thomson Reuters Scientific Inc., Carlsbad, California, United States of America
- John A. T. Young: The Salk Institute for Biological Studies, La Jolla, California, United States of America
- Sumit K. Chanda: Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- Trey Ideker: Departments of Medicine and Bioengineering, University of California at San Diego, La Jolla, California, United States of America
|
29
|
Chasman D, Gancarz B, Hao L, Ferris M, Ahlquist P, Craven M. Inferring host gene subnetworks involved in viral replication. PLoS Comput Biol 2014; 10:e1003626. [PMID: 24874113 PMCID: PMC4038467 DOI: 10.1371/journal.pcbi.1003626] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/15/2013] [Accepted: 02/06/2014] [Indexed: 12/16/2022]
Abstract
Systematic, genome-wide loss-of-function experiments can be used to identify host factors that directly or indirectly facilitate or inhibit the replication of a virus in a host cell. We present an approach that combines an integer linear program and a diffusion kernel method to infer the pathways through which those host factors modulate viral replication. The inputs to the method are a set of viral phenotypes observed in single-host-gene mutants and a background network consisting of a variety of host intracellular interactions. The output is an ensemble of subnetworks that provides a consistent explanation for the measured phenotypes, predicts which unassayed host factors modulate the virus, and predicts which host factors are the most direct interfaces with the virus. We infer host-virus interaction subnetworks using data from experiments screening the yeast genome for genes modulating the replication of two RNA viruses. Because a gold-standard network is unavailable, we assess the predicted subnetworks using both computational and qualitative analyses. We conduct a cross-validation experiment in which we predict whether held-aside test genes have an effect on viral replication. Our approach is able to make high-confidence predictions more accurately than several baselines, and about as well as the best baseline, which does not infer mechanistic pathways. We also examine two kinds of predictions made by our method: which host factors are nearest to a direct interaction with a viral component, and which unassayed host genes are likely to be involved in viral replication. Multiple predictions are supported by recent independent experimental data, or are components or functional partners of confirmed relevant complexes or pathways. Integer program code, background network data, and inferred host-virus subnetworks are available at http://www.biostat.wisc.edu/~craven/chasman_host_virus/.
Affiliation(s)
- Deborah Chasman: Department of Computer Sciences, University of Wisconsin–Madison, Madison, Wisconsin, United States of America; Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Brandi Gancarz: Luminex Corporation, Madison, Wisconsin, United States of America; Institute for Molecular Virology, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Linhui Hao: Institute for Molecular Virology, University of Wisconsin–Madison, Madison, Wisconsin, United States of America; Howard Hughes Medical Institute, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Michael Ferris: Department of Computer Sciences, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Paul Ahlquist: Institute for Molecular Virology, University of Wisconsin–Madison, Madison, Wisconsin, United States of America; Howard Hughes Medical Institute, University of Wisconsin–Madison, Madison, Wisconsin, United States of America; Morgridge Institute for Research, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Mark Craven: Department of Computer Sciences, University of Wisconsin–Madison, Madison, Wisconsin, United States of America; Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
|
30
|
Poirel CL, Rodrigues RR, Chen KC, Tyson JJ, Murali TM. Top-down network analysis to drive bottom-up modeling of physiological processes. J Comput Biol 2013; 20:409-18. [PMID: 23641868 DOI: 10.1089/cmb.2012.0274] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
Top-down analyses in systems biology can automatically find correlations among genes and proteins in large-scale datasets. However, it is often difficult to design experiments from these results. In contrast, bottom-up approaches painstakingly craft detailed models that can be simulated computationally to suggest wet lab experiments. However, developing the models is a manual process that can take many years. These approaches have largely been developed independently. We present LINKER, an efficient and automated data-driven method that can analyze molecular interactomes to propose extensions to models that can be simulated. LINKER combines teleporting random walks and k-shortest path computations to discover connections from a source protein to a set of proteins collectively involved in a particular cellular process. We evaluate the efficacy of LINKER by applying it to a well-known dynamic model of the cell division cycle in Saccharomyces cerevisiae. Compared to other state-of-the-art methods, subnetworks computed by LINKER are heavily enriched in Gene Ontology (GO) terms relevant to the cell cycle. Finally, we highlight how networks computed by LINKER elucidate the role of a protein kinase (Cdc5) in the mitotic exit network of a dynamic model of the cell cycle.
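One of the two ingredients named in this abstract, a random walk that teleports back to a source protein, can be sketched in a few lines of Python. This is a generic power-iteration version, not LINKER itself: the k-shortest-path step is omitted, and the function name, the teleport probability `q`, and the toy graph are assumptions made for illustration.

```python
def random_walk_with_teleport(adj, source, q=0.5, iters=100):
    """Visitation probabilities of a walk that teleports back to `source`.

    adj: dict node -> list of neighbours (every node must appear as a key).
    q: probability of jumping back to `source` at each step.
    """
    p = {v: 0.0 for v in adj}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[source] += q  # teleported mass; total probability stays 1
        for v, nbrs in adj.items():
            if not nbrs:
                nxt[source] += (1 - q) * p[v]  # dead end: return to source
                continue
            share = (1 - q) * p[v] / len(nbrs)
            for u in nbrs:
                nxt[u] += share
        p = nxt
    return p

# Toy undirected path A - B - C - D (hypothetical proteins), walk started at A.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_teleport(adj, "A")
print(sorted(scores, key=scores.get, reverse=True))
```

Nodes with high scores are those the walk reaches easily from the source, which is why such scores are often used to rank candidate proteins before a path search.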
|
31
|
Coiras M, Montes M, Montanuy I, López-Huertas MR, Mateos E, Le Sommer C, Garcia-Blanco MA, Hernández-Munain C, Alcamí J, Suñé C. Transcription elongation regulator 1 (TCERG1) regulates competent RNA polymerase II-mediated elongation of HIV-1 transcription and facilitates efficient viral replication. Retrovirology 2013; 10:124. [PMID: 24165037 PMCID: PMC3874760 DOI: 10.1186/1742-4690-10-124] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 04/08/2013] [Accepted: 10/18/2013] [Indexed: 12/30/2022]
Abstract
Background: Control of RNA polymerase II (RNAPII) release from pausing has been proposed as a checkpoint mechanism to ensure optimal RNAPII activity, especially in large, highly regulated genes. HIV-1 gene expression is highly regulated at the level of elongation, which includes transcriptional pausing that is mediated by both viral and cellular factors. Here, we present evidence for a specific role of the elongation-related factor TCERG1 in regulating the extent of HIV-1 elongation and viral replication in vivo.
Results: We show that TCERG1 depletion diminishes the basal and viral Tat-activated transcription from the HIV-1 LTR. In support of a role for an elongation mechanism in the transcriptional control of HIV-1, we found that TCERG1 modifies the levels of pre-mRNAs generated at distal regions of HIV-1. Most importantly, TCERG1 directly affects the elongation rate of RNAPII transcription in vivo. Furthermore, our data demonstrate that TCERG1 regulates HIV-1 transcription by increasing the rate of RNAPII elongation through the phosphorylation of serine 2 within the carboxyl-terminal domain (CTD) of RNAPII and suggest a mechanism for the involvement of TCERG1 in relieving pausing. Finally, we show that TCERG1 is required for HIV-1 replication.
Conclusions: Our study reveals that TCERG1 regulates HIV-1 transcriptional elongation by increasing the elongation rate of RNAPII and phosphorylation of Ser 2 within the CTD. Based on our data, we propose a general mechanism for TCERG1 acting on genes that are regulated at the level of elongation by increasing the rate of RNAPII transcription through the phosphorylation of Ser2. In the case of HIV-1, our evidence provides the basis for further investigation of TCERG1 as a potential therapeutic target for the inhibition of HIV-1 replication.
Affiliation(s)
- Carlos Suñé: Department of Molecular Biology, Instituto de Parasitología y Biomedicina "López Neyra" (IPBLN-CSIC), Armilla, Granada 18016, Spain
|
32
|
Hao L, He Q, Wang Z, Craven M, Newton MA, Ahlquist P. Limited agreement of independent RNAi screens for virus-required host genes owes more to false-negative than false-positive factors. PLoS Comput Biol 2013; 9:e1003235. [PMID: 24068911 PMCID: PMC3777922 DOI: 10.1371/journal.pcbi.1003235] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Received: 01/07/2013] [Accepted: 08/07/2013] [Indexed: 11/19/2022]
Abstract
Systematic, genome-wide RNA interference (RNAi) analysis is a powerful approach to identify gene functions that support or modulate selected biological processes. An emerging challenge shared with some other genome-wide approaches is that independent RNAi studies often show limited agreement in their lists of implicated genes. To better understand this, we analyzed four genome-wide RNAi studies that identified host genes involved in influenza virus replication. These studies collectively identified and validated the roles of 614 cell genes, but pair-wise overlap among the four gene lists was only 3% to 15% (average 6.7%). However, a number of functional categories were overrepresented in multiple studies. The pair-wise overlap of these enriched-category lists was high, ∼19%, implying more agreement among studies than apparent at the gene level. Probing this further, we found that the gene lists implicated by independent studies were highly connected in interacting networks by independent functional measures such as protein-protein interactions, at rates significantly higher than predicted by chance. We also developed a general, model-based approach to gauge the effects of false-positive and false-negative factors and to estimate, from a limited number of studies, the total number of genes involved in a process. For influenza virus replication, this novel statistical approach estimates the total number of cell genes involved to be ∼2,800. This and multiple other aspects of our experimental and computational results imply that, when following good quality control practices, the low overlap between studies is primarily due to false negatives rather than false-positive gene identifications. These results and methods have implications for and applications to multiple forms of genome-wide analysis.

Genome-wide RNA interference assays of gene functions offer the potential for systematic, global analysis of biological processes. A pressing challenge is to develop meta-analysis methods that effectively combine information from multiple studies. One puzzle is that implicated gene lists from independent studies of the same process often show relatively low overlap. This disagreement might arise from false-positive factors, such as imperfect gene targeting (off-target effects), or from false negatives if separate studies access different components of large, complex systems. We present new methods to examine the relations between individual genome-wide RNAi studies, using studies of host genes in influenza virus replication as a test case. We find that cross-study agreement is greater than suggested by overlap of reported gene lists. This better agreement is evidenced by the strong relation of independent gene lists in functional pathways and protein interaction networks, and by a statistical model that relates multi-study, gene-level findings to factors driving correct, false-negative, and false-positive gene identification. Our analysis of multiple genome-wide studies predicts that there are many undetected host genes important for influenza virus infection, and that false negatives are the major concerns for genome-wide studies.
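The pair-wise overlap of gene lists discussed above can be computed as in the following Python sketch. The abstract does not say how overlap was normalised, so the Jaccard index (shared genes divided by the union) is used here as an assumption, and the screen names and gene labels are placeholders rather than data from the study.

```python
from itertools import combinations

def pairwise_overlap(gene_lists):
    """Jaccard overlap (shared / union) for every pair of named gene lists."""
    result = {}
    for (a, genes_a), (b, genes_b) in combinations(gene_lists.items(), 2):
        set_a, set_b = set(genes_a), set(genes_b)
        union = set_a | set_b
        result[(a, b)] = len(set_a & set_b) / len(union) if union else 0.0
    return result

# Placeholder hit lists (not real screen results).
screens = {
    "screen1": ["geneA", "geneB", "geneC", "geneD"],
    "screen2": ["geneB", "geneC", "geneE"],
    "screen3": ["geneB", "geneF"],
}
overlaps = pairwise_overlap(screens)
for pair, value in overlaps.items():
    print(pair, f"{value:.0%}")
```

The same function can be applied to lists of enriched functional categories instead of genes, which is the comparison the abstract reports as showing higher agreement.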
Affiliation(s)
- Linhui Hao: Institute of Molecular Virology, University of Wisconsin-Madison, Madison, Wisconsin, United States of America; Howard Hughes Medical Institute, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Qiuling He: Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Zhishi Wang: Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Mark Craven: Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America; Department of Computer Sciences, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Michael A. Newton (corresponding author): Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Paul Ahlquist (corresponding author): Institute of Molecular Virology, University of Wisconsin-Madison, Madison, Wisconsin, United States of America; Howard Hughes Medical Institute, University of Wisconsin-Madison, Madison, Wisconsin, United States of America; Morgridge Institute for Research, Madison, Wisconsin, United States of America
|
33
|
Amberkar S, Kiani NA, Bartenschlager R, Alvisi G, Kaderali L. High-throughput RNA interference screens integrative analysis: Towards a comprehensive understanding of the virus-host interplay. World J Virol 2013; 2:18-31. [PMID: 24175227 PMCID: PMC3785050 DOI: 10.5501/wjv.v2.i2.18] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 12/05/2012] [Revised: 02/15/2013] [Accepted: 03/15/2013] [Indexed: 02/05/2023]
Abstract
Viruses are extremely heterogeneous entities; the size and the nature of their genetic information, as well as the strategies employed to amplify and propagate their genomes, are highly variable. However, as obligatory intracellular parasites, replication of all viruses relies on the host cell. Having co-evolved with their host for several million years, viruses have developed very sophisticated strategies to hijack cellular factors that promote virus uptake, replication, and spread. Identification of host cell factors (HCFs) required for these processes is a major challenge for researchers, but it enables the identification of new, highly selective targets for antiviral therapeutics. To this end, the establishment of platforms enabling genome-wide high-throughput RNA interference (HT-RNAi) screens has led to the identification of several key factors involved in the viral life cycle. A number of genome-wide HT-RNAi screens have been performed for major human pathogens. These studies enable first inter-viral comparisons related to HCF requirements. Although several cellular functions appear to be uniformly required for the life cycle of most viruses tested (such as the proteasome and the Golgi-mediated secretory pathways), some factors, like the lipid kinase Phosphatidylinositol 4-kinase IIIα in the case of hepatitis C virus, are selectively required for individual viruses. However, despite the amount of data available, we are still far away from a comprehensive understanding of the interplay between viruses and host factors. Major limitations towards this goal are the low sensitivity and specificity of such screens, resulting in limited overlap between different screens performed with the same virus. This review focuses on how statistical and bioinformatic analysis methods applied to HT-RNAi screens can help overcome these issues, thus increasing the reliability and impact of such studies.
|
34
|
Durmuş Tekir SD, Ülgen KÖ. Systems biology of pathogen-host interaction: networks of protein-protein interaction within pathogens and pathogen-human interactions in the post-genomic era. Biotechnol J 2013; 8:85-96. [PMID: 23193100 PMCID: PMC7161785 DOI: 10.1002/biot.201200110] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 06/18/2012] [Revised: 09/17/2012] [Accepted: 10/11/2012] [Indexed: 12/13/2022]
Abstract
Infectious diseases comprise some of the leading causes of death and disability worldwide. Interactions between pathogen and host proteins underlie the process of infection. Improved understanding of pathogen-host molecular interactions will increase our knowledge of the mechanisms involved in infection, and allow novel therapeutic solutions to be devised. Complete genome sequences for a number of pathogenic microorganisms, as well as the human host, have led to the revelation of their protein-protein interaction (PPI) networks. In this post-genomic era, pathogen-host interactions (PHIs) operating during infection can also be mapped. Detailed systematic analyses of PPI and PHI data together are required for a complete understanding of pathogenesis of infections. Here we review the striking results recently obtained during the construction and investigation of these networks. Emphasis is placed on studies producing large-scale interaction data by high-throughput experimental techniques.
Affiliation(s)
- Kutlu Ö. Ülgen: Department of Chemical Engineering, Boǧaziçi University, Istanbul, Turkey
|
35
|
Abstract
High-throughput methods for screening of physical and functional interactions now provide the means to study virus-host interactions on a genome scale. The limited coverage of these methods and the large size and uncertain quality of the identified interaction sets, however, require sophisticated computational approaches to obtain novel insights and hypotheses on virus infection processes from these interactions. Here, we describe the central steps of bioinformatics methods applied most commonly for this task and highlight important aspects that need to be considered and potential pitfalls that should be avoided.
Affiliation(s)
- Susanne M. Bailer
- University of Stuttgart Institute of Interfacial Process, Stuttgart, Germany
- Diana Lieber
- Ulm University Medical Center Institute of Virology, Ulm, Germany
|
36
|
Franzosa EA, Garamszegi S, Xia Y. Toward a three-dimensional view of protein networks between species. Front Microbiol 2012; 3:428. [PMID: 23267356 PMCID: PMC3528071 DOI: 10.3389/fmicb.2012.00428] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 09/09/2012] [Accepted: 12/06/2012] [Indexed: 01/27/2023] Open
Abstract
General principles governing biomolecular interactions between species are expected to differ significantly from known principles governing the interactions within species, yet these principles remain poorly understood at the systems level. A key reason for this knowledge gap is the lack of a detailed three-dimensional (3D), atomistic view of biomolecular interaction networks between species. Recent progress in structural biology, systems biology, and computational biology has enabled accurate and large-scale construction of 3D structural models of nodes and edges for protein–protein interaction networks within and between species. The resulting within- and between-species structural interaction networks have provided new biophysical, functional, and evolutionary insights into species interactions and infectious disease. Here, we review the nascent field of between-species structural systems biology, focusing on interactions between host and pathogens such as viruses.
|
37
|
Wright DW, Wan S, Shublaq N, Zasada SJ, Coveney PV. From base pair to bedside: molecular simulation and the translation of genomics to personalized medicine. Wiley Interdiscip Rev Syst Biol Med 2012; 4:585-98. [PMID: 22899636 DOI: 10.1002/wsbm.1186] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/12/2022]
Abstract
Despite the promises made that genomic sequencing would transform therapy by introducing a new era of personalized medicine, relatively few tangible breakthroughs have been made. This has led to the recognition that complex interactions at multiple spatial, temporal, and organizational levels may often combine to produce disease. Understanding this complexity requires that existing and future models are used and interpreted within a framework that incorporates knowledge derived from investigations at multiple levels of biological function. It also requires a computational infrastructure capable of dealing with the vast quantities of data generated by genomic approaches. In this review, we discuss the use of molecular modeling to generate quantitative and qualitative insights at the smallest scales of the systems biology hierarchy, how it can play an important role in the development of a systems understanding of disease and in the application of such knowledge to help discover new therapies and target existing ones on a personal level.
Affiliation(s)
- David W Wright
- Centre for Computational Science, University College London, London, UK
|
38
|
Cellular cofactors of lentiviral integrase: from target validation to drug discovery. Mol Biol Int 2012; 2012:863405. [PMID: 22928108 PMCID: PMC3420096 DOI: 10.1155/2012/863405] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 03/01/2012] [Revised: 06/03/2012] [Accepted: 06/27/2012] [Indexed: 01/30/2023] Open
Abstract
To complete their life cycle, lentiviruses make use of host proteins, the so-called cellular cofactors. Interactions between host cell and viral proteins during early stages of lentiviral infection provide attractive new antiviral targets. The insertion of lentiviral cDNA in a host cell chromosome is a step of no return in the replication cycle, after which the host cell becomes a permanent carrier of the viral genome and a producer of lentiviral progeny. Integration is carried out by integrase (IN), an enzyme that also plays an important role during nuclear import. Many cellular cofactors of HIV-1 IN have been proposed. To date, the lens epithelium-derived growth factor (LEDGF/p75) is the best-studied cofactor of HIV-1 IN. Moreover, small molecules that block the LEDGF/p75-IN interaction have recently been developed for the treatment of HIV infection. The nuclear import factor transportin-SR2 (TRN-SR2) has been proposed as another interactor of HIV IN that mediates nuclear import of the virus. Using both proteins as examples, we describe approaches to be taken to identify and validate novel cofactors as new antiviral targets. Finally, we highlight recent advances in the design and development of small-molecule inhibitors binding to the LEDGF/p75-binding pocket in IN (LEDGINs).
|
39
|
The Continuing Evolution of HIV-1 Therapy: Identification and Development of Novel Antiretroviral Agents Targeting Viral and Cellular Targets. Mol Biol Int 2012; 2012:401965. [PMID: 22848825 PMCID: PMC3400388 DOI: 10.1155/2012/401965] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 01/09/2012] [Revised: 04/24/2012] [Accepted: 05/11/2012] [Indexed: 11/18/2022] Open
Abstract
During the past three decades, over thirty-five anti-HIV-1 therapies have been developed for use in humans, and the progression from monotherapeutic treatment regimens to today's highly active combination antiretroviral therapies has had a dramatic impact on disease progression in HIV-1-infected individuals. In spite of the success of AIDS therapies and the existence of inhibitors of HIV-1 reverse transcriptase, protease, entry and fusion, and integrase, HIV-1 therapies still have a variety of problems which require continued development efforts to improve efficacy and reduce toxicity, while making drugs that can be used throughout both the developed and developing world, in pediatric populations, and in pregnant women. Highly active antiretroviral therapies (HAARTs) have significantly delayed the progression to AIDS, and in the developed world HIV-1-infected individuals might be expected to live normal life spans while on lifelong therapies. However, the difficult treatment regimens, the presence of class-specific drug toxicities, and the emergence of drug-resistant virus isolates highlight the fact that improvements in our therapeutic regimens and the identification of novel viral and cellular targets for therapy are still necessary. Antiretroviral therapeutic strategies and targets continue to be explored, and the development of increasingly potent molecules within existing classes of drugs and the development of novel strategies are ongoing.
|
40
|
Louvain de Souza T, de Souza Campos Fernandes RC, Medina-Acosta E. HIV-1 control in battlegrounds: important host genetic variations for HIV-1 mother-to-child transmission and progression to clinical pediatric AIDS. Future Virol 2012. [DOI: 10.2217/fvl.12.49] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
HIV-1 mother-to-child transmission (MTCT) is the passing of maternal HIV-1 to the offspring during pregnancy, labor and delivery, and/or breastfeeding. HIV-1 MTCT and the evolution to pediatric AIDS are multifactorial, dynamic and variable phenotypic conditions. Both genetic and nongenetic variables can influence susceptibility to HIV-1 MTCT or the rate of progression to clinical pediatric AIDS. In this review, we summarize the current state of knowledge about the roles of genetic variations seen in host immune response genes, and those that have been independently associated, mostly through population genetics of candidate genes, with interindividual susceptibility to HIV-1 MTCT, and progression to pediatric AIDS. We examine common and rare host genetic variations at coding and noncoding polymorphisms, whether functional or not, in agonists and antagonists of the immune response, which have been implicated in HIV-1 control in battlegrounds of cell entry, replication and evolution to AIDS. Further, we point to over 380 single-nucleotide polymorphisms, mostly within the HLA super region, recently identified in unbiased genome-wide association studies of HIV replication and evolution in adults, still unexplored in the context of HIV-1 MTCT, and which are likely to also influence susceptibility to pediatric HIV-1/AIDS.
Affiliation(s)
- Thais Louvain de Souza
- Molecular Identification & Diagnosis Unit, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Brazil
- Regina Célia de Souza Campos Fernandes
- Municipal Program for the Surveillance of Sexually Transmitted Diseases & Acquired Immunodeficiency Syndrome of Campos dos Goytacazes, Brazil
- Faculty of Medicine of Campos, Campos dos Goytacazes, Brazil
|