1. Liu T, Feenstra KA, Huang Z, Heringa J. Mining literature and pathway data to explore the relations of ketamine with neurotransmitters and gut microbiota using a knowledge-graph. Bioinformatics 2024; 40:btad771. [PMID: 38147362 PMCID: PMC10769815 DOI: 10.1093/bioinformatics/btad771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Revised: 11/06/2023] [Accepted: 12/25/2023] [Indexed: 12/27/2023] Open
Abstract
MOTIVATION Up-to-date pathway knowledge is usually presented in scientific publications for human reading, making it difficult to utilize these resources for semantic integration and computational analysis of biological pathways. We here present an approach to mining knowledge graphs by combining manual curation with automated named entity recognition and automated relation extraction. This approach allows us to study pathway-related questions in detail, which we here show using the ketamine pathway, aiming to help improve understanding of the role of gut microbiota in the antidepressant effects of ketamine. RESULTS The thus devised ketamine pathway 'KetPath' knowledge graph comprises five parts: (i) manually curated pathway facts from images; (ii) recognized named entities in biomedical texts; (iii) identified relations between named entities; (iv) our previously constructed microbiota and pre-/probiotics knowledge bases; and (v) multiple community-accepted public databases. We first assessed the performance of automated extraction of relations between named entities using the specially designed state-of-the-art tool BioKetBERT. The query results show that we can retrieve drug actions, pathway relations, co-occurring entities, and their relations. These results uncover several biological findings, such as various gut microbes leading to increased expression of BDNF, which may contribute to the sustained antidepressant effects of ketamine. We envision that the methods and findings from this research will aid researchers who wish to integrate and query data and knowledge from multiple biomedical databases and literature simultaneously. AVAILABILITY AND IMPLEMENTATION Data and query protocols are available in the KetPath repository at https://dx.doi.org/10.5281/zenodo.8398941 and https://github.com/tingcosmos/KetPath.
Affiliation(s)
- Ting Liu
  - Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081 HV, The Netherlands
  - Learning & Reasoning Group, Vrije Universiteit Amsterdam, Amsterdam 1081 HV, The Netherlands
- K Anton Feenstra
  - Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081 HV, The Netherlands
- Zhisheng Huang
  - Learning & Reasoning Group, Vrije Universiteit Amsterdam, Amsterdam 1081 HV, The Netherlands
- Jaap Heringa
  - Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081 HV, The Netherlands
2. Gavai A, Bouzembrak Y, Mu W, Martin F, Kaliyaperumal R, van Soest J, Choudhury A, Heringa J, Dekker A, Marvin HJP. Author Correction: Applying federated learning to combat food fraud in food supply chains. NPJ Sci Food 2023; 7:57. [PMID: 37857631 PMCID: PMC10587136 DOI: 10.1038/s41538-023-00232-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2023] Open
Affiliation(s)
- Anand Gavai
  - Industrial Engineering & Business Information Systems, University of Twente, Enschede, The Netherlands
  - Wageningen Food Safety Research, Akkermaalsbos 2, 6708 WB, Wageningen, The Netherlands
- Yamine Bouzembrak
  - Wageningen Food Safety Research, Akkermaalsbos 2, 6708 WB, Wageningen, The Netherlands
  - Information Technology Group, Wageningen University and Research, Wageningen, The Netherlands
- Wenjuan Mu
  - Wageningen Food Safety Research, Akkermaalsbos 2, 6708 WB, Wageningen, The Netherlands
- Frank Martin
  - Netherlands Comprehensive Cancer Organization (IKNL), Eindhoven, The Netherlands
- Rajaram Kaliyaperumal
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Johan van Soest
  - Brightlands Institute for Smart Society, Faculty of Science and Engineering, Maastricht University, Heerlen, The Netherlands
  - Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre, Maastricht, The Netherlands
- Ananya Choudhury
  - Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre, Maastricht, The Netherlands
- Jaap Heringa
  - Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
- Andre Dekker
  - Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre, Maastricht, The Netherlands
- Hans J P Marvin
  - Wageningen Food Safety Research, Akkermaalsbos 2, 6708 WB, Wageningen, The Netherlands
  - Department of Research, Hayan Group, Rhenen, The Netherlands
3. Gavai A, Bouzembrak Y, Mu W, Martin F, Kaliyaperumal R, van Soest J, Choudhury A, Heringa J, Dekker A, Marvin HJP. Applying federated learning to combat food fraud in food supply chains. NPJ Sci Food 2023; 7:46. [PMID: 37658060 PMCID: PMC10474077 DOI: 10.1038/s41538-023-00220-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/09/2023] [Accepted: 08/16/2023] [Indexed: 09/03/2023] Open
Abstract
Ensuring safe and healthy food is a big challenge due to the complexity of food supply chains and their vulnerability to many internal and external factors, including food fraud. Recent research has shown that Artificial Intelligence (AI) based algorithms, in particular data-driven Bayesian Network (BN) models, are very suitable as a tool to predict future food fraud and hence allow food producers to take proper actions to prevent such problems from occurring. Such models become even more powerful when data can be used from all actors in the supply chain, but data sharing is hampered by different interests, data security and data privacy. Federated learning (FL) may circumvent these issues, as demonstrated in various areas of the life sciences. In this research, we demonstrate the potential of FL technology for food fraud using a data-driven BN, integrating data from different data owners without the data leaving the databases of the data owners. To this end, a framework was constructed consisting of three geographically different data stations hosting different datasets on food fraud. Using this framework, a BN algorithm was implemented that was trained on the data of the different data stations while the data remained at their physical location, abiding by privacy principles. We demonstrated the applicability of the federated BN in food fraud and anticipate that such a framework may support stakeholders in the food supply chain in better decision-making regarding food fraud control while still preserving the private and confidential nature of these data.
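The idea of training on distributed data without moving it can be illustrated with a minimal sketch. It assumes, for illustration only, that each data station shares nothing but local count tables and that a central aggregator sums them into one conditional probability table of a BN. The function names, field names and toy records below are hypothetical and are not taken from the study, whose actual federated BN algorithm may work differently.

```python
from collections import Counter


def local_counts(rows, parent, child):
    """Run at a data station: count (parent value, child value) pairs locally."""
    return Counter((row[parent], row[child]) for row in rows)


def aggregate_cpt(station_counts):
    """Run at the aggregator: sum per-station counts and normalise them
    into a conditional probability table P(child | parent)."""
    total = Counter()
    for counts in station_counts:
        total.update(counts)
    parent_totals = Counter()
    for (parent_value, _child_value), n in total.items():
        parent_totals[parent_value] += n
    return {
        (parent_value, child_value): n / parent_totals[parent_value]
        for (parent_value, child_value), n in total.items()
    }


# Toy records invented purely for illustration (not data from the study).
station_a = [{"product": "x", "fraud": "yes"}, {"product": "x", "fraud": "no"}]
station_b = [{"product": "x", "fraud": "yes"}, {"product": "y", "fraud": "no"}]

# Only the count tables leave each station, never the rows themselves.
shared = [
    local_counts(station_a, "product", "fraud"),
    local_counts(station_b, "product", "fraud"),
]
cpt = aggregate_cpt(shared)
```

Summing counts gives the same table as pooling the rows, which is why this particular estimate needs no raw data exchange. Small counts can still reveal information about individual records, so a real deployment would likely need further safeguards.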
Affiliation(s)
- Anand Gavai
  - Industrial Engineering & Business Information Systems, University of Twente, Enschede, The Netherlands
  - Wageningen Food Safety Research, Akkermaalsbos 2, 6708 WB, Wageningen, The Netherlands
- Yamine Bouzembrak
  - Wageningen Food Safety Research, Akkermaalsbos 2, 6708 WB, Wageningen, The Netherlands
  - Information Technology Group, Wageningen University and Research, Wageningen, The Netherlands
- Wenjuan Mu
  - Wageningen Food Safety Research, Akkermaalsbos 2, 6708 WB, Wageningen, The Netherlands
- Frank Martin
  - Netherlands Comprehensive Cancer Organization (IKNL), Eindhoven, The Netherlands
- Rajaram Kaliyaperumal
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Johan van Soest
  - Brightlands Institute for Smart Society, Faculty of Science and Engineering, Maastricht University, Heerlen, The Netherlands
  - Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre, Maastricht, The Netherlands
- Ananya Choudhury
  - Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre, Maastricht, The Netherlands
- Jaap Heringa
  - Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
- Andre Dekker
  - Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre, Maastricht, The Netherlands
- Hans J P Marvin
  - Wageningen Food Safety Research, Akkermaalsbos 2, 6708 WB, Wageningen, The Netherlands
  - Department of Research, Hayan Group, Rhenen, The Netherlands
4. Lakbir S, Lahoz S, Cuatrecasas M, Camps J, Glas RA, Heringa J, Meijer GA, Abeln S, Fijneman RJA. Tumour break load is a biologically relevant feature of genomic instability with prognostic value in colorectal cancer. Eur J Cancer 2022; 177:94-102. [PMID: 36334560 DOI: 10.1016/j.ejca.2022.09.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Revised: 09/28/2022] [Accepted: 09/30/2022] [Indexed: 01/06/2023]
Abstract
BACKGROUND Clinically implemented prognostic biomarkers are lacking for the 80% of colorectal cancers (CRCs) that exhibit chromosomal instability (CIN). CIN is characterised by chromosome segregation errors and double-strand break repair defects that lead to somatic copy number aberrations (SCNAs) and chromosomal rearrangement-associated structural variants (SVs), respectively. We hypothesise that the number of SVs is a distinct feature of genomic instability and defined a new measure to quantify SVs: the tumour break load (TBL). The present study aimed to characterise the biological impact and clinical relevance of TBL in CRC. METHODS Disease-free survival and SCNA data were obtained from The Cancer Genome Atlas and two independent CRC studies. TBL was defined as the sum of SCNA-associated SVs. RNA gene expression data of microsatellite stable (MSS) CRC samples were used to train an RNA-based TBL classifier. Dichotomised DNA-based TBL data were used for survival analysis. RESULTS TBL shows large variation in CRC with poor correlation to tumour mutational burden and fraction of genome altered. TBL impact on tumour biology was illustrated by the high accuracy of classifying cancers in TBL-high and TBL-low (area under the receiver operating characteristic curve [AUC]: 0.88; p < 0.01). High TBL was associated with disease recurrence in 85 stages II-III MSS CRCs from The Cancer Genome Atlas (hazard ratio [HR]: 6.1; p = 0.007) and in two independent validation series of 57 untreated stages II-III (HR: 4.1; p = 0.012) and 74 untreated stage II MSS CRCs (HR: 2.4; p = 0.01). CONCLUSION TBL is a prognostic biomarker in patients with non-metastatic MSS CRC with great potential to be implemented in routine molecular diagnostics.
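The abstract defines TBL only as the sum of SCNA-associated SVs, so the following is a rough sketch under stated assumptions rather than the authors' procedure. It assumes that a break can be approximated as a change in copy-number state between two adjacent segments on the same chromosome, and it dichotomises the count with a hypothetical threshold. The segment fields (`chrom`, `start`, `cn`), the threshold and the toy profile are all illustrative.

```python
def tumour_break_load(segments):
    """Count places where two adjacent segments on the same chromosome
    have different copy-number states (an assumed proxy for a break)."""
    ordered = sorted(segments, key=lambda s: (s["chrom"], s["start"]))
    breaks = 0
    for prev, cur in zip(ordered, ordered[1:]):
        if prev["chrom"] == cur["chrom"] and prev["cn"] != cur["cn"]:
            breaks += 1
    return breaks


def dichotomise(tbl, threshold):
    """Label a sample TBL-high or TBL-low; the threshold is hypothetical."""
    return "TBL-high" if tbl > threshold else "TBL-low"


# Toy copy-number profile, invented for illustration only.
segments = [
    {"chrom": "1", "start": 0, "cn": 2},
    {"chrom": "1", "start": 100, "cn": 3},
    {"chrom": "1", "start": 200, "cn": 3},
    {"chrom": "2", "start": 0, "cn": 1},
    {"chrom": "2", "start": 50, "cn": 2},
]
tbl = tumour_break_load(segments)
label = dichotomise(tbl, threshold=1)
```

The boundary between chromosomes is deliberately not counted, since a change of chromosome is not a break within one.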
Affiliation(s)
- Soufyan Lakbir
  - Bioinformatics Group, Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam 1081HV, the Netherlands
  - Department of Pathology, Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam 1066CX, the Netherlands
- Sara Lahoz
  - Translational Colorectal Cancer Genomics, Gastrointestinal and Pancreatic Oncology Team, Institut D'Investigacions Biomèdiques August Pi I Sunyer (IDIBAPS), Hospital Clínic de Barcelona, Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Barcelona, 08036, Spain
- Miriam Cuatrecasas
  - Pathology Department, Biomedical Diagnostic Center (CDB), Hospital Clínic de Barcelona, Institut D'Investigacions Biomèdiques August Pi I Sunyer (IDIBAPS), Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Universitat de Barcelona (UB), Barcelona, 08036, Spain
- Jordi Camps
  - Translational Colorectal Cancer Genomics, Gastrointestinal and Pancreatic Oncology Team, Institut D'Investigacions Biomèdiques August Pi I Sunyer (IDIBAPS), Hospital Clínic de Barcelona, Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Barcelona, 08036, Spain
  - Department of Cell Biology, Physiology and Immunology, Faculty of Medicine, Autonomous University of Barcelona, Bellaterra, 08193, Spain
- Roel A Glas
  - Bioinformatics Group, Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam 1081HV, the Netherlands
  - Department of Pathology, Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam 1066CX, the Netherlands
- Jaap Heringa
  - Bioinformatics Group, Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam 1081HV, the Netherlands
  - AIMMS - Amsterdam Institute for Molecules Medicines and Systems, Vrije Universiteit Amsterdam, Amsterdam 1081HV, the Netherlands
- Gerrit A Meijer
  - Department of Pathology, Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam 1066CX, the Netherlands
- Sanne Abeln
  - Bioinformatics Group, Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam 1081HV, the Netherlands
  - Life Sciences and Health Research Group, Centrum Wiskunde & Informatica (CWI), Science Park 123, Amsterdam 1098 XG, the Netherlands
- Remond J A Fijneman
  - Department of Pathology, Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam 1066CX, the Netherlands
5. van Bree E, Alarcón CR, Lakbir S, Stelloo E, Buranelli C, Hondema A, van 't Erve I, Vessies D, Delis-van Diemen P, Tijssen M, Bolijn A, Lanfermeijer M, Linders D, Swennenhuis J, van den Broek D, Heringa J, Meijer G, Carvalho B, Feitsma H, Abeln S, Fijneman RJA. Abstract A020: Structural variants in the pathogenesis of colorectal cancer: The elephant in the room. Cancer Res 2022. [DOI: 10.1158/1538-7445.crc22-a020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
Abstract
Background: Cancer is caused by somatic DNA alterations, comprising single/small nucleotide variants (SNVs), somatic copy number alterations (SCNAs) and chromosomal rearrangement structural variants (SVs). We previously demonstrated that SVs are recurrently identified in hundreds of genes and are highly prevalent in common fragile site genes, e.g., in MACROD2 in >40% of colorectal cancers (CRCs). However, computational methods that discriminate SV-driver from SV-passenger events are lacking and laboratory methods to detect SVs at nucleotide resolution from routinely obtained formalin-fixed paraffin-embedded (FFPE) tumor tissue material are underdeveloped. Therefore, despite the abundant presence of SVs, knowledge about their biological and clinical impact is limited. Aim: The aim of our studies is to identify genes of which the function is frequently affected by SV, to understand how these genes contribute to CRC pathogenesis, and to translate these SVs into clinically relevant biomarkers. Methods: We made use of publicly available deep whole genome DNA sequencing data and tumor-matched RNA sequencing data from the Hartwig Medical Foundation to develop the algorithm ‘CoBRA’: Computation of Biologically Relevant Alterations. Adenoma-derived organoids were used for CRISPR/Cas9-mediated gene modulation for functional analysis of SV-driver events. Cergentis’ targeted locus capture (FFPE-TLC) technology was used to detect SVs at nucleotide resolution from FFPE material, which were translated into droplet digital PCR (ddPCR) assays for the detection of SVs in cell-free circulating tumor DNA (ctDNA) in liquid biopsies. Results: The CoBRA algorithm associated the presence of SV-events in frequently affected genes to the extent in which genome-wide RNA sequencing data were altered. In this way, CoBRA ranked SV-events in genes according to their putative impact on tumor biology. SVs in MACROD2 ranked among those with the highest impact on tumor biology. 
Therefore, we generated focal deletions in MACROD2 in adenoma-derived organoids for functional analyses. Moreover, using FFPE tumor tissue material we detected SVs at nucleotide resolution in MACROD2 and three other genes in 21 out of 29 patients. SVs were verified by PCR on tumor tissue and subsequently translated into ddPCR biomarker assays for detection of SVs in ctDNA in blood from the same patients. Conclusions: We developed the computational method CoBRA and succeeded in detecting SVs with high impact on tumor biology. These SVs are prioritized for functional analysis in pre-malignant adenoma-derived organoids; for targeted detection in routinely obtained FFPE tumor tissue material; and for translation into liquid biopsy ctDNA assays. Proof of concept was delivered for MACROD2. Our novel computational and laboratory methodologies provide valuable tools to effectively explore the biological and clinical impact of SVs, which will contribute to our understanding of these common recurrent somatic alterations in CRC and their translation into clinically relevant biomarker applications.
Citation Format: Elise van Bree, Carmen Rubio Alarcón, Soufyan Lakbir, Ellen Stelloo, Caterina Buranelli, Amber Hondema, Iris van 't Erve, Daan Vessies, Pien Delis-van Diemen, Marianne Tijssen, Anne Bolijn, Mirthe Lanfermeijer, Dorothe Linders, Joost Swennenhuis, Daan van den Broek, Jaap Heringa, Gerrit Meijer, Beatriz Carvalho, Harma Feitsma, Sanne Abeln, Remond J. A. Fijneman. Structural variants in the pathogenesis of colorectal cancer: The elephant in the room [abstract]. In: Proceedings of the AACR Special Conference on Colorectal Cancer; 2022 Oct 1-4; Portland, OR. Philadelphia (PA): AACR; Cancer Res 2022;82(23 Suppl_1):Abstract nr A020.
Affiliation(s)
- Soufyan Lakbir
  - Netherlands Cancer Institute & VU University, Amsterdam, Netherlands
- Amber Hondema
  - Netherlands Cancer Institute, Amsterdam, Netherlands
- Daan Vessies
  - Netherlands Cancer Institute, Amsterdam, Netherlands
- Anne Bolijn
  - Netherlands Cancer Institute, Amsterdam, Netherlands
- Gerrit Meijer
  - Netherlands Cancer Institute, Amsterdam, Netherlands
6. Liu T, Lan G, Feenstra KA, Huang Z, Heringa J. Towards a knowledge graph for pre-/probiotics and microbiota-gut-brain axis diseases. Sci Rep 2022; 12:18977. [PMID: 36347868 PMCID: PMC9643397 DOI: 10.1038/s41598-022-21735-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/23/2022] [Accepted: 09/30/2022] [Indexed: 11/09/2022] Open
Abstract
Scientific publications present biological relationships but are structured for human reading, making it difficult to use this resource for semantic integration and querying. Existing databases, on the other hand, are well structured for automated analysis, but do not contain comprehensive biological knowledge. We devised an approach for constructing comprehensive knowledge graphs from these two types of resources and applied it to investigate relationships between pre-/probiotics and microbiota-gut-brain axis diseases. To this end, we created (i) a knowledge base, dubbed ppstatement, containing manually curated detailed annotations, and (ii) a knowledge base, called ppconcept, containing automatically annotated concepts. The resulting Pre-/Probiotics Knowledge Graph (PPKG) combines these two knowledge bases with three other public databases (i.e. MeSH, UMLS and SNOMED CT). To validate the performance of PPKG and to demonstrate the added value of integrating two knowledge bases, we created four biological query cases. The query cases demonstrate that we can retrieve co-occurring concepts of interest, and also that combining the two knowledge bases leads to more comprehensive query results than utilizing them separately. The PPKG enables users to pose research queries such as "which pre-/probiotics combinations may benefit depression?", potentially leading to novel biological insights.
Affiliation(s)
- Ting Liu
  - Department of Computer Science, Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands
  - Knowledge Representation and Reasoning Group, Department of Computer Science, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands
- Gongjin Lan
  - Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055 China
- K. Anton Feenstra
  - Department of Computer Science, Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands
- Zhisheng Huang
  - Knowledge Representation and Reasoning Group, Department of Computer Science, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands
- Jaap Heringa
  - Department of Computer Science, Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands
7. Stringer B, de Ferrante H, Abeln S, Heringa J, Feenstra KA, Haydarlou R. PIPENN: protein interface prediction from sequence with an ensemble of neural nets. Bioinformatics 2022; 38:2111-2118. [PMID: 35150231 PMCID: PMC9004643 DOI: 10.1093/bioinformatics/btac071] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/03/2021] [Revised: 01/16/2022] [Accepted: 02/04/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION The interactions between proteins and other molecules are essential to many biological and cellular processes. Experimental identification of interface residues is a time-consuming, costly and challenging task, while protein sequence data are ubiquitous. Consequently, many computational and machine learning approaches have been developed over the years to predict such interface residues from sequence. However, the effectiveness of different Deep Learning (DL) architectures and learning strategies for protein-protein, protein-nucleotide and protein-small molecule interface prediction has not yet been investigated in great detail. Therefore, we here explore the prediction of protein interface residues using six DL architectures and various learning strategies with sequence-derived input features. RESULTS We constructed a large dataset dubbed BioDL, comprising protein-protein interactions from the PDB, and DNA/RNA and small molecule interactions from the BioLip database. We also constructed six DL architectures, and evaluated them on the BioDL benchmarks. This shows that no single architecture performs best on all instances. An ensemble architecture, which combines all six architectures, does consistently achieve peak prediction accuracy. We confirmed these results on the published benchmark set by Zhang and Kurgan (ZK448), and on our own existing curated homo- and heteromeric protein interaction dataset. Our PIPENN sequence-based ensemble predictor outperforms current state-of-the-art sequence-based protein interface predictors on ZK448 on all interaction types, achieving an AUC-ROC of 0.718 for protein-protein, 0.823 for protein-nucleotide and 0.842 for protein-small molecule. AVAILABILITY AND IMPLEMENTATION Source code and datasets are available at https://github.com/ibivu/pipenn/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hans de Ferrante
  - Department of Computer Science, IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands
- Sanne Abeln
  - Department of Computer Science, IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands
- Jaap Heringa
  - Department of Computer Science, IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands
- K Anton Feenstra
  - Department of Computer Science, IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands
8. Hou Q, Stringer B, Waury K, Capel H, Haydarlou R, Xue F, Abeln S, Heringa J, Feenstra KA. SeRenDIP-CE: Sequence-based Interface Prediction for Conformational Epitopes. Bioinformatics 2021; 37:3421-3427. [PMID: 33974039 PMCID: PMC8136078 DOI: 10.1093/bioinformatics/btab321] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/19/2020] [Revised: 03/26/2021] [Accepted: 04/26/2021] [Indexed: 11/21/2022] Open
Abstract
Motivation: Antibodies play an important role in clinical research and biotechnology, with their specificity determined by the interaction with the antigen’s epitope region, a special type of protein–protein interaction (PPI) interface. The ubiquitous availability of sequence data allows us to predict epitopes from sequence in order to focus time-consuming wet-lab experiments toward the most promising epitope regions. Here, we extend our previously developed sequence-based predictors for homodimer and heterodimer PPI interfaces to predict epitope residues that have the potential to bind an antibody. Results: We collected and curated a high-quality epitope dataset from the SAbDab database. Our generic PPI heterodimer predictor obtained an AUC-ROC of 0.666 when evaluated on the epitope test set. We then trained a random forest model specifically on the epitope dataset, reaching AUC 0.694. Further training on the combined heterodimer and epitope datasets improves our final predictor to AUC 0.703 on the epitope test set. This is better than the best state-of-the-art sequence-based epitope predictor BepiPred-2.0. On one solved antibody–antigen structure of the COVID19 virus spike receptor binding domain, our predictor reaches AUC 0.778. We added the SeRenDIP-CE Conformational Epitope predictors to our webserver, which is simple to use and only requires a single antigen sequence as input, which will help make the method immediately applicable in a wide range of biomedical and biomolecular research. Availability and implementation: Webserver, source code and datasets at www.ibi.vu.nl/programs/serendipwww/. Supplementary information: Supplementary data are available at Bioinformatics online.
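For readers unfamiliar with the AUC-ROC figures quoted above, a minimal sketch of the metric follows. It uses the standard pairwise interpretation: the probability that a randomly chosen positive (here, an epitope residue) scores higher than a randomly chosen negative, with ties counted as one half. The labels and scores are invented toy values, not results from the paper, and the function is an illustration rather than the evaluation code used by the authors.

```python
def auc_roc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly;
    tied scores contribute one half."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy per-residue labels (1 = epitope) and predicted scores, for illustration.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
auc = auc_roc(labels, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic pairwise loop is fine for small examples, while rank-based formulas are usually preferred for large datasets.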
Affiliation(s)
- Qingzhen Hou
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong 250002, P. R. China
  - National Institute of Health Data Science of China, Shandong University, Shandong 250002, P. R. China
- Bas Stringer
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Katharina Waury
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Henriette Capel
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Reza Haydarlou
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Fuzhong Xue
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong 250002, P. R. China
  - National Institute of Health Data Science of China, Shandong University, Shandong 250002, P. R. China
- Sanne Abeln
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Jaap Heringa
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
  - AIMMS - Amsterdam Institute for Molecules Medicines and Systems, Vrije Universiteit Amsterdam
- K Anton Feenstra
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
  - AIMMS - Amsterdam Institute for Molecules Medicines and Systems, Vrije Universiteit Amsterdam
9. Liu T, Pan X, Wang X, Feenstra KA, Heringa J, Huang Z. Predicting the relationships between gut microbiota and mental disorders with knowledge graphs. Health Inf Sci Syst 2020; 9:3. [PMID: 33262885 PMCID: PMC7686388 DOI: 10.1007/s13755-020-00128-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/24/2020] [Accepted: 09/30/2020] [Indexed: 01/14/2023] Open
Abstract
Gut microbiota produce and modulate the production of neurotransmitters, which have been implicated in mental disorders. Neurotransmitters may act as a ‘matchmaker’ between gut microbiota imbalance and mental disorders. Most of the relevant research effort goes into either the relationship between gut microbiota and neurotransmitters or that between neurotransmitters and mental disorders, while few studies collect and analyze the dispersed research results in systematic ways. We therefore gather the dispersed results of existing studies into a structured knowledge base for identifying and predicting the potential relationships between gut microbiota and mental disorders. In this study, we propose to construct a gut microbiota knowledge graph for mental disorders, named MiKG4MD. It can be extended by linking to future ontologies simply by adding new relationships between existing information and new entities. This extensibility is emphasized for integration with existing popular ontologies/terminologies, e.g. UMLS, MeSH, and KEGG. We demonstrate the performance of MiKG4MD with three SPARQL query test cases. Results show that the MiKG4MD knowledge graph is an effective method to predict the relationships between gut microbiota and mental disorders.
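The 'matchmaker' idea amounts to a two-hop join: a microbe is linked to a disorder whenever both are connected to the same neurotransmitter. The sketch below shows that join over plain subject-predicate-object triples in Python. It is an illustration only: the paper queries MiKG4MD with SPARQL, and the predicate names and placeholder entities here are hypothetical and do not come from the knowledge graph.

```python
def two_hop(triples, first_pred, second_pred):
    """Join (a, first_pred, b) with (b, second_pred, c) on the shared node b
    and return the sorted (a, b, c) paths."""
    targets_by_subject = {}
    for s, p, o in triples:
        if p == second_pred:
            targets_by_subject.setdefault(s, []).append(o)
    paths = []
    for s, p, o in triples:
        if p == first_pred:
            for target in targets_by_subject.get(o, []):
                paths.append((s, o, target))
    return sorted(paths)


# Placeholder triples, invented for illustration (no biological claim intended).
triples = [
    ("microbe_A", "modulates", "neurotransmitter_X"),
    ("microbe_B", "modulates", "neurotransmitter_Y"),
    ("neurotransmitter_X", "implicated_in", "disorder_1"),
]
paths = two_hop(triples, "modulates", "implicated_in")
```

Here `microbe_B` yields no path because nothing links `neurotransmitter_Y` to a disorder. Adding a single new triple would create one, which mirrors how the graph is described as extendable by adding relationships.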
Affiliation(s)
- Ting Liu
  - Knowledge Representation and Reasoning (KR&R) Group, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  - Center for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Xueli Pan
  - Knowledge Representation and Reasoning (KR&R) Group, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Xu Wang
  - Knowledge Representation and Reasoning (KR&R) Group, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- K Anton Feenstra
  - Center for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Jaap Heringa
  - Center for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Zhisheng Huang
  - Knowledge Representation and Reasoning (KR&R) Group, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  - Brain Protection Innovation Center, Capital Medical University, Beijing, China
| |
Collapse
|
10
Dijkstra MJJ, van der Ploeg AJ, Feenstra KA, Fokkink WJ, Abeln S, Heringa J. Tailor-made multiple sequence alignments using the PRALINE 2 alignment toolkit. Bioinformatics 2020; 35:5315-5317. [PMID: 31368486 PMCID: PMC6954659 DOI: 10.1093/bioinformatics/btz572] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/29/2019] [Revised: 05/29/2019] [Accepted: 07/29/2019] [Indexed: 12/03/2022] Open
Abstract
Summary: PRALINE 2 is a toolkit for custom multiple sequence alignment workflows. It can be used to incorporate sequence annotations, such as secondary structure or (DNA) motifs, into the alignment scoring, as well as to customize many other aspects of a progressive multiple alignment workflow. Availability and implementation: PRALINE 2 is implemented in Python and available as open source software on GitHub: https://github.com/ibivu/PRALINE/.
Affiliation(s)
- Maurits J J Dijkstra
  - Department of Computer Science, Vrije Universiteit, 1081 HV Amsterdam, The Netherlands
- Atze J van der Ploeg
  - Department of Computer Science, Vrije Universiteit, 1081 HV Amsterdam, The Netherlands
- K Anton Feenstra
  - Department of Computer Science, Vrije Universiteit, 1081 HV Amsterdam, The Netherlands
- Wan J Fokkink
  - Department of Computer Science, Vrije Universiteit, 1081 HV Amsterdam, The Netherlands
- Sanne Abeln
  - Department of Computer Science, Vrije Universiteit, 1081 HV Amsterdam, The Netherlands
- Jaap Heringa
  - Department of Computer Science, Vrije Universiteit, 1081 HV Amsterdam, The Netherlands
11
Hou Q, De Geest PFG, Griffioen CJ, Abeln S, Heringa J, Feenstra KA. SeRenDIP: SEquential REmasteriNg to DerIve profiles for fast and accurate predictions of PPI interface positions. Bioinformatics 2020; 35:4794-4796. [PMID: 31116381 DOI: 10.1093/bioinformatics/btz428] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 04/09/2019] [Revised: 05/12/2019] [Accepted: 05/17/2019] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Interpretation of ubiquitous protein sequence data has become a bottleneck in biomolecular research, due to a lack of structural and other experimental annotation data for these proteins. Prediction of protein interaction sites from sequence may be a viable substitute. We therefore recently developed a sequence-based random forest method for protein-protein interface prediction, which yielded significantly better performance than other methods on both homomeric and heteromeric protein-protein interactions. Here, we present a webserver that implements this method efficiently. RESULTS With the aim of accelerating our previous approach, we obtained sequence conservation profiles by re-mastering the alignment of homologous sequences found by PSI-BLAST. This yielded a more than 10-fold speedup with at least the same accuracy as reported previously for our method; these results allowed us to offer the method as a webserver. The webserver interface is targeted at the non-expert user. The input is simply the sequence of the protein of interest, and the output is a table with scores indicating the likelihood of an interaction interface at each position. As the method is sequence-based and not sensitive to the type of protein interaction, we expect this webserver to be of interest to many biological researchers in academia and in industry. AVAILABILITY AND IMPLEMENTATION Webserver, source code and datasets are available at www.ibi.vu.nl/programs/serendipwww/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qingzhen Hou
  - Department of BioModeling, BioInformatics & BioProcesses, Université Libre de Bruxelles, Brussels 1050, Belgium
- Paul F G De Geest
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Christian J Griffioen
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Sanne Abeln
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Jaap Heringa
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
  - AIMMS - Amsterdam Institute for Molecules Medicines and Systems, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- K Anton Feenstra
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
  - AIMMS - Amsterdam Institute for Molecules Medicines and Systems, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
12
Jacobsen A, Ivanova O, Amini S, Heringa J, Kemmeren P, Feenstra KA. A framework for exhaustive modelling of genetic interaction patterns using Petri nets. Bioinformatics 2020; 36:2142-2149. [PMID: 31845959 DOI: 10.1093/bioinformatics/btz917] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/16/2018] [Revised: 07/09/2019] [Accepted: 12/13/2019] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Genetic interaction (GI) patterns are characterized by the phenotypes of interacting single and double mutated gene pairs. Uncovering the regulatory mechanisms of GIs would provide a better understanding of their role in biological processes, diseases and drug response. Computational analyses can provide insights into the underpinning mechanisms of GIs. RESULTS In this study, we present a framework for exhaustive modelling of GI patterns using Petri nets (PN). Four-node models were defined and generated on three levels with restrictions, to enable an exhaustive approach. Simulations suggest ∼5 million models of GIs. Generalizing these we propose putative mechanisms for the GI patterns, inversion and suppression. We demonstrate that exhaustive PN modelling enables reasoning about mechanisms of GIs when only the phenotypes of gene pairs are known. The framework can be applied to other GI or genetic regulatory datasets. AVAILABILITY AND IMPLEMENTATION The framework is available at http://www.ibi.vu.nl/programs/ExhMod. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Annika Jacobsen
  - Department of Computer Science, Centre for Integrative Bioinformatics (IBIVU), Vrije Universiteit Amsterdam, 1081 HV Amsterdam, Netherlands
- Olga Ivanova
  - Department of Computer Science, Centre for Integrative Bioinformatics (IBIVU), Vrije Universiteit Amsterdam, 1081 HV Amsterdam, Netherlands
- Saman Amini
  - Princess Máxima Center for Pediatric Oncology, 3584 CS Utrecht, Netherlands
  - Division of Biomedical Genetics, Center for Molecular Medicine, University Medical Centre Utrecht, 3584 CX Utrecht, Netherlands
- Jaap Heringa
  - Department of Computer Science, Centre for Integrative Bioinformatics (IBIVU), Vrije Universiteit Amsterdam, 1081 HV Amsterdam, Netherlands
- Patrick Kemmeren
  - Princess Máxima Center for Pediatric Oncology, 3584 CS Utrecht, Netherlands
  - Division of Biomedical Genetics, Center for Molecular Medicine, University Medical Centre Utrecht, 3584 CX Utrecht, Netherlands
- K Anton Feenstra
  - Department of Computer Science, Centre for Integrative Bioinformatics (IBIVU), Vrije Universiteit Amsterdam, 1081 HV Amsterdam, Netherlands
13
Jacobsen A, de Miranda Azevedo R, Juty N, Batista D, Coles S, Cornet R, Courtot M, Crosas M, Dumontier M, Evelo CT, Goble C, Guizzardi G, Hansen KK, Hasnain A, Hettne K, Heringa J, Hooft RW, Imming M, Jeffery KG, Kaliyaperumal R, Kersloot MG, Kirkpatrick CR, Kuhn T, Labastida I, Magagna B, McQuilton P, Meyers N, Montesanti A, van Reisen M, Rocca-Serra P, Pergl R, Sansone SA, da Silva Santos LOB, Schneider J, Strawn G, Thompson M, Waagmeester A, Weigel T, Wilkinson MD, Willighagen EL, Wittenburg P, Roos M, Mons B, Schultes E. FAIR Principles: Interpretations and Implementation Considerations. Data Intelligence 2020. [DOI: 10.1162/dint_r_00024] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Indexed: 11/04/2022] Open
Abstract
The FAIR principles have been widely cited, endorsed and adopted by a broad range of stakeholders since their publication in 2016. By intention, the 15 FAIR guiding principles do not dictate specific technological implementations, but provide guidance for improving Findability, Accessibility, Interoperability and Reusability of digital resources. This has likely contributed to the broad adoption of the FAIR principles, because individual stakeholder communities can implement their own FAIR solutions. However, it has also resulted in inconsistent interpretations that carry the risk of leading to incompatible implementations. Thus, while the FAIR principles are formulated on a high level and may be interpreted and implemented in different ways, for true interoperability we need to support convergence in implementation choices that are widely accessible and (re)-usable. We introduce the concept of FAIR implementation considerations to assist accelerated global participation and convergence towards accessible, robust, widespread and consistent FAIR implementations. Any self-identified stakeholder community may either choose to reuse solutions from existing implementations, or when they spot a gap, accept the challenge to create the needed solution, which, ideally, can be used again by other communities in the future. Here, we provide interpretations and implementation considerations (choices and challenges) for each FAIR principle.
Affiliation(s)
- Annika Jacobsen
  - Leiden University Medical Center, Leiden, 2333 ZA, The Netherlands
- Ricardo de Miranda Azevedo
  - Institute of Data Science, Maastricht University, Universiteitssingel 60, Maastricht 6229 ER, The Netherlands
- Nick Juty
  - Department of Computer Science, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
- Dominique Batista
  - Oxford e-Research Centre, Department of Engineering Sciences, University of Oxford, Oxford OX13PJ, UK
- Simon Coles
  - School of Chemistry, Faculty of Engineering and Physical Sciences, University of Southampton, SO17 1BJ, UK
- Ronald Cornet
  - Amsterdam UMC, University of Amsterdam, Amsterdam 1000 GG, The Netherlands
- Mélanie Courtot
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, UK
- Mercè Crosas
  - Harvard University, Cambridge, Massachusetts 02138, USA
- Michel Dumontier
  - Institute of Data Science, Maastricht University, Universiteitssingel 60, Maastricht 6229 ER, The Netherlands
- Chris T. Evelo
  - Department of Bioinformatics – BiGCaT, NUTRIM, Maastricht University, Maastricht 6229 ER, The Netherlands
- Carole Goble
  - Department of Computer Science, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
- Giancarlo Guizzardi
  - Conceptual and Cognitive Modeling Research Group (CORE), Free University of Bozen-Bolzano, Bolzano 39100, Italy
- Ali Hasnain
  - Insight Centre for Data Analytics, National University of Ireland Galway, H91 TK33, Ireland
- Kristina Hettne
  - Centre for Digital Scholarship, Leiden University Libraries, Leiden, 2333 ZA, The Netherlands
- Jaap Heringa
  - Department of Computer Science, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
- Rob W.W. Hooft
  - Department of Computer Science, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
  - Dutch Techcentre for Life Sciences (DTL), Utrecht, The Netherlands
- Martijn G. Kersloot
  - Amsterdam UMC, University of Amsterdam, Amsterdam 1000 GG, The Netherlands
  - Castor EDC, Paasheuvelweg 25, Wing 5D, 1105 BP, Amsterdam, The Netherlands
- Christine R. Kirkpatrick
  - San Diego Supercomputer Center, University of California San Diego, La Jolla, California 92093, USA
- Tobias Kuhn
  - Department of Computer Science, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
- Ignasi Labastida
  - Learning and Research Resources Centre (CRAI), Universitat de Barcelona, 08007 Barcelona, Spain
- Peter McQuilton
  - Oxford e-Research Centre, Department of Engineering Sciences, University of Oxford, Oxford OX13PJ, UK
- Mirjam van Reisen
  - Liacs Institute of Advanced Computer Science, Leiden University, 2311 GJ Leiden, The Netherlands
- Philippe Rocca-Serra
  - Oxford e-Research Centre, Department of Engineering Sciences, University of Oxford, Oxford OX13PJ, UK
- Robert Pergl
  - Czech Technical University in Prague, Faculty of Information Technology (FIT CTU), 160 00 Prague 6, Czech Republic
- Susanna-Assunta Sansone
  - Oxford e-Research Centre, Department of Engineering Sciences, University of Oxford, Oxford OX13PJ, UK
- Juliane Schneider
  - Harvard Catalyst
  - Clinical and Translational Science Center, Boston, MA 02115, USA
- George Strawn
  - US National Academy of Sciences, Washington DC 20418, USA
- Mark Thompson
  - Leiden University Medical Center, Leiden, 2333 ZA, The Netherlands
- Tobias Weigel
  - Deutsches Klimarechenzentrum, Bundesstrasse 45a, 20146 Hamburg, Germany
- Mark D. Wilkinson
  - Center for Plant Biotechnology and Genomics UPM-INIA, Madrid 28040, Spain
- Egon L. Willighagen
  - Department of Bioinformatics – BiGCaT, NUTRIM, Maastricht University, Maastricht 6229 ER, The Netherlands
- Peter Wittenburg
  - Max Planck Computing and Data Facility, Gießenbachstraße 2, 85748 Garching, Germany
- Marco Roos
  - Leiden University Medical Center, Leiden, 2333 ZA, The Netherlands
- Barend Mons
  - Leiden University Medical Center, Leiden, 2333 ZA, The Netherlands
  - GO FAIR International Support & Coordination Office (GFISCO), Leiden, The Netherlands
- Erik Schultes
  - GO FAIR International Support & Coordination Office (GFISCO), Leiden, The Netherlands
  - Leiden Center for Data Science, 2311 EZ Leiden, The Netherlands
14
Saunders G, Baudis M, Becker R, Beltran S, Béroud C, Birney E, Brooksbank C, Brunak S, Van den Bulcke M, Drysdale R, Capella-Gutierrez S, Flicek P, Florindi F, Goodhand P, Gut I, Heringa J, Holub P, Hooyberghs J, Juty N, Keane TM, Korbel JO, Lappalainen I, Leskosek B, Matthijs G, Mayrhofer MT, Metspalu A, Navarro A, Newhouse S, Nyrönen T, Page A, Persson B, Palotie A, Parkinson H, Rambla J, Salgado D, Steinfelder E, Swertz MA, Valencia A, Varma S, Blomberg N, Scollen S. Leveraging European infrastructures to access 1 million human genomes by 2022. Nat Rev Genet 2019; 20:693-701. [PMID: 31455890 PMCID: PMC7115898 DOI: 10.1038/s41576-019-0156-9] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Accepted: 07/03/2019] [Indexed: 01/22/2023]
Abstract
Human genomics is undergoing a step change from being a predominantly research-driven activity to one driven through health care as many countries in Europe now have nascent precision medicine programmes. To maximize the value of the genomic data generated, these data will need to be shared between institutions and across countries. In recognition of this challenge, 21 European countries recently signed a declaration to transnationally share data on at least 1 million human genomes by 2022. In this Roadmap, we identify the challenges of data sharing across borders and demonstrate that European research infrastructures are well-positioned to support the rapid implementation of widespread genomic data access.
Affiliation(s)
- Gary Saunders
  - ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Regina Becker
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg, Luxembourg
- Sergi Beltran
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Christophe Béroud
  - Aix Marseille Univ, INSERM, MMG, Marseille, France
  - Département de Génétique Médicale et de Biologie Cellulaire, APHM, Hôpital d'Enfants de la Timone, Marseille, France
- Ewan Birney
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Cath Brooksbank
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Søren Brunak
  - Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Peter Goodhand
  - Ontario Institute for Cancer Research, Toronto, Ontario, Canada
  - Global Alliance for Genomics and Health, Toronto, Ontario, Canada
- Ivo Gut
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Jaap Heringa
  - Department of Computer Science, Vrije Universiteit, Amsterdam, Netherlands
- Jef Hooyberghs
  - Flemish Institute for Technological Research, VITO, Mol, Belgium
- Nick Juty
  - School of Computer Science, The University of Manchester, Manchester, UK
- Thomas M Keane
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Jan O Korbel
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Brane Leskosek
  - IBMI, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- Arcadi Navarro
  - Institute of Evolutionary Biology (UPF-CSIC), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Steven Newhouse
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Angela Page
  - Global Alliance for Genomics and Health, Toronto, Ontario, Canada
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Bengt Persson
  - Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala, Sweden
- Aarno Palotie
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Helen Parkinson
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Jordi Rambla
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Morris A Swertz
  - BBMRI-NL/University Medical Center Groningen, University of Groningen, Groningen, Netherlands
- Alfonso Valencia
  - Barcelona Supercomputing Centre (BSC), Barcelona, Spain
  - ICREA, Pg., Barcelona, Spain
- Susheel Varma
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Serena Scollen
  - ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, UK
15
Willems SM, Abeln S, Feenstra KA, de Bree R, van der Poel EF, Baatenburg de Jong RJ, Heringa J, van den Brekel MWM. The potential use of big data in oncology. Oral Oncol 2019; 98:8-12. [PMID: 31521885 DOI: 10.1016/j.oraloncology.2019.09.003] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/05/2019] [Revised: 07/31/2019] [Accepted: 09/06/2019] [Indexed: 12/16/2022]
Abstract
In this era of information technology, big data analysis is entering the biomedical sciences. But what are big data, where do they come from and what can we do with them? In this commentary, the main sources of big data are explained, especially in (head and neck) oncology. It also touches upon the need to integrate various sources of clinical, pathological and quality-of-life data, and discusses some initiatives for linking such datasets on a nationwide scale in the Netherlands. Finally, it touches upon important issues regarding governance, FAIRness of data and the need to put in place the infrastructures required to exploit the full potential of big data sets in head and neck cancer.
Affiliation(s)
- Stefan M Willems
  - Department of Pathology, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
  - Department of Pathology, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Sanne Abeln
  - Department of Computer Science, Faculty of Science, Vrije Universiteit, Amsterdam, the Netherlands
- K Anton Feenstra
  - Department of Computer Science, Faculty of Science, Vrije Universiteit, Amsterdam, the Netherlands
- Remco de Bree
  - Department of Head and Neck Surgical Oncology, University Medical Center Utrecht, Utrecht, the Netherlands
- Egge F van der Poel
  - Department of Head and Neck Surgery, Erasmus Cancer Center, Erasmus MC, Rotterdam, the Netherlands
- Jaap Heringa
  - Department of Computer Science, Faculty of Science, Vrije Universiteit, Amsterdam, the Netherlands
- Michiel W M van den Brekel
  - Department of Head and Neck Oncology and Surgery, Netherlands Cancer Institute, Amsterdam, the Netherlands
16
van Gelder CWG, Hooft RWW, van Rijswijk MN, van den Berg L, Kok RG, Reinders M, Mons B, Heringa J. Bioinformatics in the Netherlands: the value of a nationwide community. Brief Bioinform 2019; 20:540-550. [PMID: 28968694 PMCID: PMC6433734 DOI: 10.1093/bib/bbx087] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/30/2017] [Revised: 07/03/2017] [Indexed: 11/14/2022] Open
Abstract
This review provides a historical overview of the inception and development of bioinformatics research in the Netherlands. Dutch bioinformatics is rooted in theoretical biology through foundational figures such as Paulien Hogeweg (at Utrecht University since the 1970s); we review the developments that led to the organizational structures supporting a relatively large Dutch bioinformatics community. We will show that the most valuable resource that we have built over these years is the close-knit national expert community that is well engaged in basic and translational life science research programmes. The Dutch bioinformatics community is accustomed to facing the ever-changing landscape of data challenges and working towards solutions together. In addition, this community is the stable factor on the road towards sustainability, especially in times when existing funding models are challenged and change rapidly.
17
Fijneman RJA, Mekkes N, van den Broek E, Stringer B, Glas RA, Komor MA, Rausch C, van Lieshout S, Cuppen E, Smith ML, Sebra RP, Rowell WJ, Ashby M, Carvalho B, Heringa J, Meijer GA, Abeln S. Abstract 1738: Characterization of structural variants within MACROD2 in the pathogenesis of colorectal cancer. Cancer Res 2019. [DOI: 10.1158/1538-7445.am2019-1738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Background: Cancer is caused by somatic DNA alterations, which comprise small nucleotide variants (SNVs), chromosome somatic copy number alterations (SCNAs) and chromosomal breakpoint structural variants (SVs). Previously, we investigated SCNA-associated SVs in colorectal cancer (CRC) and demonstrated that SVs within the MACROD2 gene are highly prevalent. This raises the question whether SVs in MACROD2 may already be present in CRC precursor lesions, i.e. in colorectal adenomas. We have also demonstrated that loss of MACROD2 protein expression is associated with poor response to treatment with 5-fluorouracil-based adjuvant chemotherapy, indicating that MACROD2 function is clinically relevant. The aim of this study is to characterize SVs within MACROD2 in more detail in the pathogenesis of colorectal cancer.
Methods: The frequencies of SCNA-associated SVs in 466 CRCs were compared to those in 118 colorectal adenomas, using array-comparative genomic hybridization. Targeted PacBio long-read sequencing was applied to detect and characterize SVs at nucleotide resolution within MACROD2, in tens of primary CRCs. Illumina whole genome sequencing data of > 450 CRC metastatic lesions, generated by the Hartwig Medical Foundation (HMF; www.hartwigmedicalfoundation.nl), were used for validation purposes.
Results: MACROD2 SCNA-associated SVs were rarely detected among 118 colorectal adenomas (<2%) while being highly prevalent among 466 CRCs (40%). SVs in MACROD2 are currently being characterized at nucleotide resolution by analysis of targeted PacBio long-read sequencing data, the results of which will be presented during the AACR annual meeting. Preliminary analysis of HMF whole genome sequencing data confirms that at least 40% of CRC metastatic lesions are affected by SVs within the MACROD2 gene, most commonly by focal deletions.
Discussion: The current observation that SVs in MACROD2 are nearly absent in adenomas while being highly prevalent in CRCs indicates that MACROD2 is affected at a late stage of colorectal adenoma-to-carcinoma progression. A recent publication by Sakthianandeswaren et al (Cancer Discovery 2018) indicated that loss of MACROD2 promotes chromosomal instability. Taken together, these data support a model in which adenoma-to-carcinoma progression is driven, at least in part, by genomic instability caused by loss of function of the MACROD2 tumor suppressor gene.
Citation Format: Remond J A Fijneman, Nienke Mekkes, Evert van den Broek, Bas Stringer, Roel A. Glas, Malgorzata A. Komor, Christian Rausch, Stef van Lieshout, Edwin Cuppen, Melissa L. Smith, Robert P. Sebra, William J. Rowell, Meredith Ashby, Beatriz Carvalho, Jaap Heringa, Gerrit A. Meijer, Sanne Abeln. Characterization of structural variants within MACROD2 in the pathogenesis of colorectal cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 1738.
Affiliation(s)
- Bas Stringer
  - Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Roel A. Glas
  - Netherlands Cancer Inst., Amsterdam, Netherlands
- Edwin Cuppen
  - Hartwig Medical Foundation, Amsterdam, Netherlands
- Melissa L. Smith
  - Icahn Institute of Data Science and Genomics Technology; Icahn School of Medicine at Mount Sinai, New York, NY
- Robert P. Sebra
  - Icahn Institute of Data Science and Genomics Technology; Icahn School of Medicine at Mount Sinai, New York, NY
- Jaap Heringa
  - Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Sanne Abeln
  - Vrije Universiteit Amsterdam, Amsterdam, Netherlands
18
Amini S, Jacobsen A, Ivanova O, Lijnzaad P, Heringa J, Holstege FCP, Feenstra KA, Kemmeren P. The ability of transcription factors to differentially regulate gene expression is a crucial component of the mechanism underlying inversion, a frequently observed genetic interaction pattern. PLoS Comput Biol 2019; 15:e1007061. [PMID: 31083661 PMCID: PMC6532943 DOI: 10.1371/journal.pcbi.1007061] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/18/2018] [Revised: 05/23/2019] [Accepted: 04/30/2019] [Indexed: 12/21/2022] Open
Abstract
Genetic interactions, a phenomenon whereby combinations of mutations lead to unexpected effects, reflect how cellular processes are wired and play an important role in complex genetic diseases. Understanding the molecular basis of genetic interactions is crucial for deciphering pathway organization as well as understanding the relationship between genetic variation and disease. Several hypothetical molecular mechanisms have been linked to different genetic interaction types. However, differences in genetic interaction patterns and their underlying mechanisms have not yet been compared systematically between different functional gene classes. Here, differences in the occurrence and types of genetic interactions are compared for two classes, gene-specific transcription factors (GSTFs) and signaling genes (kinases and phosphatases). Genome-wide gene expression data for 63 single and double deletion mutants in baker's yeast reveal that the two most common genetic interaction patterns are buffering and inversion. Buffering is typically associated with redundancy and is well understood. In inversion, genes show opposite behavior in the double mutant compared to the corresponding single mutants. The underlying mechanism is poorly understood. Although both classes show buffering and inversion patterns, the prevalence of inversion is much stronger in GSTFs. To decipher potential mechanisms, a Petri net modeling approach was employed, where genes are represented as nodes and relationships between genes as edges. This allowed over 9 million possible three- and four-node models to be exhaustively enumerated. The models show that a quantitative difference in interaction strength is a strict requirement for obtaining inversion. In addition, this difference is frequently accompanied by a second gene that shows buffering. Taken together, these results provide a mechanistic explanation for inversion. Furthermore, the ability of transcription factors to differentially regulate expression of their targets provides a likely explanation of why inversion is more prevalent for GSTFs than for kinases and phosphatases.
Affiliation(s)
- Saman Amini
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
  - Center for Molecular Medicine, University Medical Centre Utrecht, Utrecht, The Netherlands
- Annika Jacobsen
  - Centre for Integrative Bioinformatics (IBIVU), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Olga Ivanova
  - Centre for Integrative Bioinformatics (IBIVU), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Philip Lijnzaad
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Jaap Heringa
  - Centre for Integrative Bioinformatics (IBIVU), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- K. Anton Feenstra
  - Centre for Integrative Bioinformatics (IBIVU), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Patrick Kemmeren
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
  - Center for Molecular Medicine, University Medical Centre Utrecht, Utrecht, The Netherlands
19
|
Dijkstra M, Bawono P, Abeln S, Feenstra KA, Fokkink W, Heringa J. Motif-Aware PRALINE: Improving the alignment of motif regions. PLoS Comput Biol 2018; 14:e1006547. [PMID: 30383764 PMCID: PMC6233922 DOI: 10.1371/journal.pcbi.1006547] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/11/2018] [Revised: 11/13/2018] [Accepted: 10/05/2018] [Indexed: 11/21/2022] Open
Abstract
Protein or DNA motifs are sequence regions which possess biological importance. These regions are often highly conserved among homologous sequences. The generation of multiple sequence alignments (MSAs) with a correct alignment of the conserved sequence motifs is still difficult to achieve, due to the fact that the contribution of these typically short fragments is overshadowed by the rest of the sequence. Here we extended the PRALINE multiple sequence alignment program with a novel motif-aware MSA algorithm in order to address this shortcoming. This method can incorporate explicit information about the presence of externally provided sequence motifs, which is then used in the dynamic programming step by boosting the amino acid substitution matrix towards the motif. The strength of the boost is controlled by a parameter, α. Using a benchmark set of alignments we confirm that a good compromise can be found that improves the matching of motif regions while not significantly reducing the overall alignment quality. By estimating α on an unrelated set of reference alignments we find there is indeed a strong conservation signal for motifs. A number of typical but difficult MSA use cases are explored to exemplify the problems in correctly aligning functional sequence motifs and how the motif-aware alignment method can be employed to alleviate these problems. The most important functional parts of proteins are often small—but very specific—sequence motifs. Moreover, these motifs tend to be strongly conserved during evolution due to their functional role. Nevertheless, when trying to align protein sequences of the same family, it is often very difficult to align such motifs using standard multiple sequence alignment methods. Aligning functional residues correctly is essential to detect motif conservation, which can be used to filter out spuriously occurring motifs. Additionally, many downstream analyses, such as phylogenetics, are strongly reliant on alignment quality. 
We have developed a sequence alignment program named Motif-Aware PRALINE (MA-PRALINE) that incorporates information about motifs explicitly. Motifs are provided to MA-PRALINE in the PROSITE pattern syntax; it then scans the input sequences for instances of the pattern and provides a score bonus to matching sequence positions. Our method provides a reproducible alternative to editing alignments by hand in order to account for motif conservation, which is a tedious and error-prone process. We show that MA-PRALINE allows the alignment of motif-rich regions to be fine-tuned while not degrading the rest of the alignment. MA-PRALINE is available on GitHub as open source software, which allows it to be easily tailored to similar problems. We apply MA-PRALINE to the HIV-1 envelope glycoprotein (gp120) to get an improved alignment of the N-terminal glycosylation motifs. The presence of these motifs is essential for the virus in evading the immune response of the host.
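As an illustration of the scanning step described in this abstract, the following is a minimal sketch, not MA-PRALINE's actual implementation. It converts a simple PROSITE-style pattern to a regular expression, scans a sequence, and assigns a bonus α to every matched position. The function names, the example sequence, and the per-position bonus vector are assumptions made for illustration; the abstract itself describes the boost as being applied to the substitution matrix during dynamic programming. The pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site pattern, used here only as a convenient example.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern (e.g. 'N-{P}-[ST]-{P}') to a regex.

    Handles single residues, x (any residue), [..] (allowed set),
    {..} (forbidden set) and repeat counts such as x(2) or x(2,4).
    Anchors ('<', '>') are not handled in this sketch.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split an element into its core and an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"
        # Bracketed sets and single letters are already valid regex syntax.
        if lo and hi:
            core += "{%s,%s}" % (lo, hi)
        elif lo:
            core += "{%s}" % lo
        parts.append(core)
    return "".join(parts)


def motif_bonus(sequence: str, pattern: str, alpha: float) -> list:
    """Return a per-position bonus: alpha where the motif matches, else 0.0."""
    # A lookahead lets overlapping motif instances be found as well.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    bonus = [0.0] * len(sequence)
    for m in regex.finditer(sequence):
        start = m.start(1)
        for i in range(start, start + len(m.group(1))):
            bonus[i] = alpha
    return bonus


# Hypothetical toy sequence: only the first N starts a valid motif instance.
print(motif_bonus("ANGSANPT", "N-{P}-[ST]-{P}", alpha=2.0))
```

In an aligner, such a bonus vector could be added to the substitution score whenever two motif positions are compared, with α controlling the trade-off between motif matching and overall alignment quality, as the abstract describes.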
Affiliation(s)
- Maurits Dijkstra
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Punto Bawono
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Sanne Abeln
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- K. Anton Feenstra
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Wan Fokkink
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Jaap Heringa
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
|
20
|
Dijkstra M, Fokkink W, Heringa J, van Dijk E, Abeln S. The characteristics of molten globule states and folding pathways strongly depend on the sequence of a protein. Mol Phys 2018. [DOI: 10.1080/00268976.2018.1496290] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 10/28/2022]
Affiliation(s)
- M.J.J. Dijkstra
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- W.J. Fokkink
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- J. Heringa
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- E. van Dijk
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- S. Abeln
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
|
21
|
Feenstra KA, Abeln S, Westerhuis JA, Brancos dos Santos F, Molenaar D, Teusink B, Hoefsloot HCJ, Heringa J. Training for translation between disciplines: a philosophy for life and data sciences curricula. Bioinformatics 2018; 34:i4-i12. [PMID: 29950011 PMCID: PMC6022589 DOI: 10.1093/bioinformatics/bty233] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/01/2022] Open
Abstract
Motivation: Our society has become data-rich to the extent that research in many areas has become impossible without computational approaches. Educational programmes seem to be lagging behind this development. At the same time, there is a growing need not only for strong data science skills, but foremost for the ability to translate between tools and methods on the one hand, and applications and problems on the other. Results: Here we present our experiences with shaping and running a master's programme in bioinformatics and systems biology in Amsterdam. From this, we have developed a comprehensive philosophy on how translation in training may be achieved in a dynamic and multidisciplinary research area, which is described here. We furthermore describe two requirements that enable translation, which we have found to be crucial: sufficient depth and focus on multidisciplinary topic areas, coupled with a balanced breadth from adjacent disciplines. Finally, we present concrete suggestions on how this may be implemented in practice, which may be relevant for the effectiveness of life science and data science curricula in general, and of particular interest to those who are in the process of setting up such curricula. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- K Anton Feenstra
  - Department of Computer Science, IBIVU Centre for Integrative Bioinformatics Vrije Universiteit Amsterdam, HV Amsterdam, Netherlands
  - AIMMS Amsterdam Institute for Molecules, Medicines and Systems, Vrije Universiteit Amsterdam, MC Amsterdam, The Netherlands
- Sanne Abeln
  - Department of Computer Science, IBIVU Centre for Integrative Bioinformatics Vrije Universiteit Amsterdam, HV Amsterdam, Netherlands
  - Amsterdam Data Science, GH Amsterdam, The Netherlands
- Johan A Westerhuis
  - Swammerdam Institute for Life Sciences, Universiteit van Amsterdam, GE Amsterdam, The Netherlands
- Douwe Molenaar
  - AIMMS Amsterdam Institute for Molecules, Medicines and Systems, Vrije Universiteit Amsterdam, MC Amsterdam, The Netherlands
- Bas Teusink
  - AIMMS Amsterdam Institute for Molecules, Medicines and Systems, Vrije Universiteit Amsterdam, MC Amsterdam, The Netherlands
  - Amsterdam Data Science, GH Amsterdam, The Netherlands
- Huub C J Hoefsloot
  - Swammerdam Institute for Life Sciences, Universiteit van Amsterdam, GE Amsterdam, The Netherlands
- Jaap Heringa
  - Department of Computer Science, IBIVU Centre for Integrative Bioinformatics Vrije Universiteit Amsterdam, HV Amsterdam, Netherlands
  - AIMMS Amsterdam Institute for Molecules, Medicines and Systems, Vrije Universiteit Amsterdam, MC Amsterdam, The Netherlands
  - Amsterdam Data Science, GH Amsterdam, The Netherlands
|
22
|
van Gelder CWG, Hooft RWW, van Rijswijk MN, van den Berg L, Kok RG, Reinders M, Mons B, Heringa J. Bioinformatics in the Netherlands: the value of a nationwide community. Brief Bioinform 2018; 19:359. [PMID: 29267862 DOI: 10.1093/bib/bbx171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
|
23
|
Hou Q, De Geest PFG, Vranken WF, Heringa J, Feenstra KA. Seeing the trees through the forest: sequence-based homo- and heteromeric protein-protein interaction sites prediction using random forest. Bioinformatics 2018; 33:1479-1487. [PMID: 28073761 DOI: 10.1093/bioinformatics/btx005] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 07/11/2016] [Accepted: 01/06/2017] [Indexed: 11/13/2022] Open
Abstract
Motivation: Genome sequencing is producing an ever-increasing amount of associated protein sequences. Few of these sequences have experimentally validated annotations, however, and computational predictions are becoming increasingly successful in producing such annotations. One key challenge remains the prediction of the amino acids in a given protein sequence that are involved in protein-protein interactions. Such predictions are typically based on machine learning methods that take advantage of the properties and sequence positions of amino acids that are known to be involved in interaction. In this paper, we evaluate the importance of various features using Random Forest (RF), and include as a novel feature backbone flexibility predicted from sequences to further optimise protein interface prediction. Results: We observe that there is no single sequence feature that enables pinpointing interacting sites in our Random Forest models. However, combining different properties does increase the performance of interface prediction. Our homomeric-trained RF interface predictor is able to distinguish interface from non-interface residues with an area under the ROC curve of 0.72 in a homomeric test-set. The heteromeric-trained RF interface predictor performs better than existing predictors on an independent heteromeric test-set. We trained a more general predictor on the combined homomeric and heteromeric dataset, and show that in addition to predicting homomeric interfaces, it is also able to pinpoint interface residues in heterodimers. This suggests that our random forest model and the features included capture common properties of both homodimer and heterodimer interfaces. Availability and Implementation: The predictors and test datasets used in our analyses are freely available (http://www.ibi.vu.nl/downloads/RF_PPI/). Contact: k.a.feenstra@vu.nl. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qingzhen Hou
  - Center for Integrative Bioinformatics VU (IBIVU), Amsterdam, HV, The Netherlands
  - Amsterdam Institute for Molecules Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, HV, The Netherlands
- Paul F G De Geest
  - Center for Integrative Bioinformatics VU (IBIVU), Amsterdam, HV, The Netherlands
  - Amsterdam Institute for Molecules Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, HV, The Netherlands
- Wim F Vranken
  - Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - Structural Biology Research Centre, VIB, Brussels, Belgium
- Jaap Heringa
  - Center for Integrative Bioinformatics VU (IBIVU), Amsterdam, HV, The Netherlands
  - Amsterdam Institute for Molecules Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, HV, The Netherlands
- K Anton Feenstra
  - Center for Integrative Bioinformatics VU (IBIVU), Amsterdam, HV, The Netherlands
  - Amsterdam Institute for Molecules Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, HV, The Netherlands
|
24
|
Zhang C, Bijlard J, Staiger C, Scollen S, van Enckevort D, Hoogstrate Y, Senf A, Hiltemann S, Repo S, Pipping W, Bierkens M, Payralbe S, Stringer B, Heringa J, Stubbs A, Bonino Da Silva Santos LO, Belien J, Weistra W, Azevedo R, van Bochove K, Meijer G, Boiten JW, Rambla J, Fijneman R, Spalding JD, Abeln S. Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data. F1000Res 2017; 6. [PMID: 29123641 PMCID: PMC5657030 DOI: 10.12688/f1000research.12168.1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Accepted: 08/14/2017] [Indexed: 01/11/2023] Open
Abstract
The availability of high-throughput molecular profiling techniques has provided more accurate and informative data for regular clinical studies. Nevertheless, complex computational workflows are required to interpret these data. Over the past years, the data volume has been growing explosively, requiring robust human data management to organise and integrate the data efficiently. For this reason, we set up an ELIXIR implementation study, together with the Translational research IT (TraIT) programme, to design a data ecosystem that is able to link raw and interpreted data. In this project, the data from the TraIT Cell Line Use Case (TraIT-CLUC) are used as a test case for this system. Within this ecosystem, we use the European Genome-phenome Archive (EGA) to store raw molecular profiling data; tranSMART to collect interpreted molecular profiling data and clinical data for corresponding samples; and Galaxy to store, run and manage the computational workflows. We can integrate these data by linking their repositories systematically. To showcase our design, we have structured the TraIT-CLUC data, which contain a variety of molecular profiling data types, for storage in both tranSMART and EGA. The metadata provided allows referencing between tranSMART and EGA, fulfilling the cycle of data submission and discovery; we have also designed a data flow from EGA to Galaxy, enabling reanalysis of the raw data in Galaxy. In this way, users can select patient cohorts in tranSMART, trace them back to the raw data and perform (re)analysis in Galaxy. Our conclusion is that the majority of metadata does not necessarily need to be stored (redundantly) in both databases, but that instead FAIR persistent identifiers should be available for well-defined data ontology levels: study, data access committee, physical sample, data sample and raw data file. This approach will pave the way for the stable linkage and reuse of data.
Affiliation(s)
- Chao Zhang
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, 1081 HV, Netherlands
- Jochem Bijlard
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, 1081 HV, Netherlands
  - The Hyve, Utrecht, 3511 MJ, Netherlands
- David van Enckevort
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, 9712 CP, Netherlands
- Youri Hoogstrate
  - Department of Bioinformatics, Erasmus University Medical Center, Rotterdam, 3015 CE, Netherlands
- Saskia Hiltemann
  - Department of Bioinformatics, Erasmus University Medical Center, Rotterdam, 3015 CE, Netherlands
- Bas Stringer
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, 1081 HV, Netherlands
- Jaap Heringa
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, 1081 HV, Netherlands
- Andrew Stubbs
  - Department of Bioinformatics, Erasmus University Medical Center, Rotterdam, 3015 CE, Netherlands
- Jeroen Belien
  - Department of Pathology, VU University Medical Center Amsterdam, Amsterdam, 1081 HV, Netherlands
- Gerrit Meijer
  - Netherlands Cancer Institute, Amsterdam, 1066 CX, Netherlands
- Jordi Rambla
  - Centre for Genomic Regulation (CRG), Barcelona, 08003, Spain
- Remond Fijneman
  - Netherlands Cancer Institute, Amsterdam, 1066 CX, Netherlands
- Sanne Abeln
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, 1081 HV, Netherlands
|
25
|
Haydarlou R, Jacobsen A, Bonzanni N, Feenstra KA, Abeln S, Heringa J. BioASF: a framework for automatically generating executable pathway models specified in BioPAX. Bioinformatics 2017; 32:i60-i69. [PMID: 27307645 PMCID: PMC4908334 DOI: 10.1093/bioinformatics/btw250] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 01/15/2023] Open
Abstract
Motivation: Biological pathways play a key role in most cellular functions. To better understand these functions, researchers in computational and cell biology use biological pathway data for various analysis and modeling purposes. For specifying these biological pathways, a community of researchers has defined BioPAX and provided various tools for creating, validating and visualizing BioPAX models. However, a generic software framework for simulating BioPAX models is missing. Here, we attempt to fill this gap by introducing a generic simulation framework for BioPAX. The framework explicitly separates the execution model from the model structure as provided by BioPAX, with the advantage that the modelling process becomes more reproducible and intrinsically more modular; this ensures natural biological constraints are satisfied upon execution. The framework is based on the principles of discrete event systems and multi-agent systems, and is capable of automatically generating a hierarchical multi-agent system for a given BioPAX model. Results: To demonstrate the applicability of the framework, we simulated two types of biological network models: a gene regulatory network modeling the haematopoietic stem cell regulators and a signal transduction network modeling the Wnt/β-catenin signaling pathway. We observed that the results of the simulations performed using our framework were entirely consistent with the simulation results reported by the researchers who developed the original models in a proprietary language. Availability and Implementation: The framework, implemented in Java, is open source and its source code, documentation and tutorial are available at http://www.ibi.vu.nl/programs/BioASF. Contact: j.heringa@vu.nl
Affiliation(s)
- Reza Haydarlou
  - Centre for Integrative Bioinformatics (IBIVU) & Amsterdam Institute for Molecules Medicines and Systems (AIMMS), VU University Amsterdam, De Boelelaan 1081, Amsterdam, The Netherlands
- Annika Jacobsen
  - Centre for Integrative Bioinformatics (IBIVU) & Amsterdam Institute for Molecules Medicines and Systems (AIMMS), VU University Amsterdam, De Boelelaan 1081, Amsterdam, The Netherlands
- Nicola Bonzanni
  - Centre for Integrative Bioinformatics (IBIVU) & Amsterdam Institute for Molecules Medicines and Systems (AIMMS), VU University Amsterdam, De Boelelaan 1081, Amsterdam, The Netherlands
  - NKI-AVL, The Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam, The Netherlands
- K Anton Feenstra
  - Centre for Integrative Bioinformatics (IBIVU) & Amsterdam Institute for Molecules Medicines and Systems (AIMMS), VU University Amsterdam, De Boelelaan 1081, Amsterdam, The Netherlands
- Sanne Abeln
  - Centre for Integrative Bioinformatics (IBIVU) & Amsterdam Institute for Molecules Medicines and Systems (AIMMS), VU University Amsterdam, De Boelelaan 1081, Amsterdam, The Netherlands
- Jaap Heringa
  - Centre for Integrative Bioinformatics (IBIVU) & Amsterdam Institute for Molecules Medicines and Systems (AIMMS), VU University Amsterdam, De Boelelaan 1081, Amsterdam, The Netherlands
|
26
|
van Gool AJ, Bietrix F, Caldenhoven E, Zatloukal K, Scherer A, Litton JE, Meijer G, Blomberg N, Smith A, Mons B, Heringa J, Koot WJ, Smit MJ, Hajduch M, Rijnders T, Ussi A. Bridging the translational innovation gap through good biomarker practice. Nat Rev Drug Discov 2017; 16:587-588. [DOI: 10.1038/nrd.2017.72] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Indexed: 02/06/2023]
|
27
|
Hoogstrate Y, Zhang C, Senf A, Bijlard J, Hiltemann S, van Enckevort D, Repo S, Heringa J, Jenster G, Fijneman RJA, Boiten JW, Meijer GA, Stubbs A, Rambla J, Spalding D, Abeln S. Integration of EGA secure data access into Galaxy. F1000Res 2017; 5. [PMID: 28232859 PMCID: PMC5302147 DOI: 10.12688/f1000research.10221.1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Accepted: 11/30/2016] [Indexed: 12/31/2022] Open
Abstract
High-throughput molecular profiling techniques are routinely generating vast amounts of data for translational medicine studies. Secure, access-controlled systems are needed to manage, store, transfer and distribute these data because of their personally identifiable nature. The European Genome-phenome Archive (EGA) was created to facilitate access to and management of the long-term archival of bio-molecular data. Each data provider is responsible for ensuring a Data Access Committee is in place to grant access to data stored in the EGA. Moreover, the transfer of data during upload and download is encrypted. ELIXIR, a European research infrastructure for life-science data, initiated a project (2016 Human Data Implementation Study) to understand and document the ELIXIR requirements for secure management of controlled-access data. As part of this project, a full ecosystem was designed to connect archived raw experimental molecular profiling data with interpreted data and the computational workflows, using the CTMM Translational Research IT (CTMM-TraIT) infrastructure (http://www.ctmm-trait.nl) as an example. Here we present the first outcomes of this project, a framework to enable the download of EGA data to a Galaxy server in a secure way. Galaxy provides an intuitive user interface for molecular biologists and bioinformaticians to run and design data analysis workflows. More specifically, we developed a tool, ega_download_streamer, that can download data securely from EGA into a Galaxy server, where the data can subsequently be further processed. This tool will allow a user within the browser to run an entire analysis containing sensitive data from EGA, and to make this analysis available for other researchers in a reproducible manner, as shown with a proof of concept study. The tool ega_download_streamer is available in the Galaxy tool shed: https://toolshed.g2.bx.psu.edu/view/yhoogstrate/ega_download_streamer.
Affiliation(s)
- Youri Hoogstrate
  - Department of Bioinformatics, ErasmusMC Rotterdam, Rotterdam, Netherlands
- Chao Zhang
  - Department of Computer Science, Vrije Universiteit, Amsterdam, Netherlands
- Alexander Senf
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Saskia Hiltemann
  - Department of Bioinformatics, ErasmusMC Rotterdam, Rotterdam, Netherlands
- Jaap Heringa
  - Department of Computer Science, Vrije Universiteit, Amsterdam, Netherlands
- Guido Jenster
  - Department of Urology, ErasmusMC Rotterdam, Rotterdam, Netherlands
- Gerrit A Meijer
  - Diagnostic Oncology, Netherlands Cancer Institute, Amsterdam, Netherlands
- Andrew Stubbs
  - Department of Bioinformatics, ErasmusMC Rotterdam, Rotterdam, Netherlands
- Jordi Rambla
  - Centre for Genomic Regulation, Parc de Recerca Biomédica de Barcelona, Barcelona, Spain
- Dylan Spalding
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Sanne Abeln
  - Department of Computer Science, Vrije Universiteit, Amsterdam, Netherlands
|
28
|
Palma A, Tinti M, Paoluzi S, Santonico E, Brandt BW, Hooft van Huijsduijnen R, Masch A, Heringa J, Schutkowski M, Castagnoli L, Cesareni G. Both Intrinsic Substrate Preference and Network Context Contribute to Substrate Selection of Classical Tyrosine Phosphatases. J Biol Chem 2017; 292:4942-4952. [PMID: 28159843 DOI: 10.1074/jbc.m116.757518] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 09/07/2016] [Revised: 01/31/2017] [Indexed: 01/19/2023] Open
Abstract
Reversible tyrosine phosphorylation is a widespread post-translational modification mechanism underlying cell physiology. Thus, understanding the mechanisms responsible for substrate selection by kinases and phosphatases is central to our ability to model signal transduction at a system level. Classical protein-tyrosine phosphatases can exhibit substrate specificity in vivo by combining intrinsic enzymatic specificity with the network of protein-protein interactions, which positions the enzymes in close proximity to their substrates. Here we use a high throughput approach, based on high density phosphopeptide chips, to determine the in vitro substrate preference of 16 members of the protein-tyrosine phosphatase family. This approach helped identify one residue in the substrate binding pocket of the phosphatase domain that confers specificity for phosphopeptides in a specific sequence context. We also present a Bayesian model that combines intrinsic enzymatic specificity and interaction information in the context of the human protein interaction network to infer new phosphatase substrates at the proteome level.
Affiliation(s)
- Anita Palma
  - Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Michele Tinti
  - Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Serena Paoluzi
  - Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Elena Santonico
  - Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Bernd Willem Brandt
  - Centre for Integrative Bioinformatics, Vrije Universiteit, 1081 HV Amsterdam, The Netherlands
- Antonia Masch
  - Institut für Biochemie & Biotechnologie, Martin-Luther-Universität Halle-Wittenberg, 06108 Halle, Germany
- Jaap Heringa
  - Centre for Integrative Bioinformatics, Vrije Universiteit, 1081 HV Amsterdam, The Netherlands
- Mike Schutkowski
  - Institut für Biochemie & Biotechnologie, Martin-Luther-Universität Halle-Wittenberg, 06108 Halle, Germany
- Luisa Castagnoli
  - Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Gianni Cesareni
  - Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
|
29
|
Rajendran R, May A, Sherry L, Kean R, Williams C, Jones BL, Burgess KV, Heringa J, Abeln S, Brandt BW, Munro CA, Ramage G. Integrating Candida albicans metabolism with biofilm heterogeneity by transcriptome mapping. Sci Rep 2016; 6:35436. [PMID: 27765942 PMCID: PMC5073228 DOI: 10.1038/srep35436] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 07/19/2016] [Accepted: 09/29/2016] [Indexed: 12/20/2022] Open
Abstract
Candida albicans biofilm formation is an important virulence factor in the pathogenesis of disease, a characteristic which has been shown to be heterogeneous in clinical isolates. Using an unbiased computational approach we investigated the central metabolic pathways driving biofilm heterogeneity. Transcripts from high (HBF) and low (LBF) biofilm forming isolates were analysed by RNA sequencing, with 6312 genes identified as expressed in these two phenotypes. With a dedicated computational approach we identified and validated a significantly differentially expressed subnetwork of genes associated with these biofilm phenotypes. Our analysis revealed that amino acid metabolism pathways, such as arginine, proline, aspartate and glutamate metabolism, were predominantly upregulated in the HBF phenotype. In contrast, purine, starch and sucrose metabolism were generally upregulated in the LBF phenotype. The aspartate aminotransferase gene AAT1 was found to be a common member of these amino acid pathways and significantly upregulated in the HBF phenotype. Pharmacological inhibition of AAT1 enzyme activity significantly reduced biofilm formation in a dose-dependent manner. Collectively, these findings provide evidence that biofilm phenotype is associated with differential regulation of metabolic pathways. Understanding and targeting such pathways, for example amino acid metabolism, is potentially useful for developing diagnostics and new antifungals to treat biofilm-based infections.
Collapse
Affiliation(s)
- Ranjith Rajendran: School of Medicine, College of Medical, Veterinary and Life Sciences (MVLS), University of Glasgow, UK
- Ali May: Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, The Netherlands; Centre for Integrative Bioinformatics VU (IBIVU), VU University Amsterdam, The Netherlands
- Leighann Sherry: School of Medicine, College of Medical, Veterinary and Life Sciences (MVLS), University of Glasgow, UK
- Ryan Kean: School of Medicine, College of Medical, Veterinary and Life Sciences (MVLS), University of Glasgow, UK; Institute of Healthcare Associated Infection, School of Health, Nursing and Midwifery, University of the West of Scotland, UK
- Craig Williams: Institute of Healthcare Associated Infection, School of Health, Nursing and Midwifery, University of the West of Scotland, UK
- Brian L Jones: Microbiology Department, Glasgow Royal Infirmary, Glasgow, UK
- Jaap Heringa: Centre for Integrative Bioinformatics VU (IBIVU), VU University Amsterdam, The Netherlands
- Sanne Abeln: Centre for Integrative Bioinformatics VU (IBIVU), VU University Amsterdam, The Netherlands
- Bernd W Brandt: Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, The Netherlands
- Carol A Munro: Aberdeen Fungal Group, MRC Centre for Medical Mycology, University of Aberdeen, UK
- Gordon Ramage: School of Medicine, College of Medical, Veterinary and Life Sciences (MVLS), University of Glasgow, UK
30
Affiliation(s)
- Sanne Abeln: Vrije Universiteit, Amsterdam, The Netherlands
31
Hou Q, Lensink MF, Heringa J, Feenstra KA. CLUB-MARTINI: Selecting Favourable Interactions amongst Available Candidates, a Coarse-Grained Simulation Approach to Scoring Docking Decoys. PLoS One 2016; 11:e0155251. [PMID: 27166787 PMCID: PMC4864233 DOI: 10.1371/journal.pone.0155251] [Received: 12/21/2015] [Accepted: 04/26/2016] [Indexed: 01/12/2023]
Abstract
Large-scale identification of native binding orientations is crucial for understanding the role of protein-protein interactions in their biological context. Measuring binding free energy is the method of choice to estimate binding strength and reveal the relevance of particular conformations in which proteins interact. In a recent study, we successfully applied coarse-grained molecular dynamics simulations to measure binding free energy for two protein complexes with similar accuracy to full-atomistic simulation, but 500-fold less time consuming. Here, we investigate the efficacy of this approach as a scoring method to identify stable binding conformations from thousands of docking decoys produced by protein docking programs. To test our method, we first applied it to calculate binding free energies of all protein conformations in a CAPRI (Critical Assessment of PRedicted Interactions) benchmark dataset, which included over 19000 protein docking solutions for 15 benchmark targets. Based on the binding free energies, we ranked all docking solutions to select the near-native binding modes under the assumption that the native solutions have the lowest binding free energies. In our top 100 ranked structures, for the ‘easy’ targets that have many near-native conformations, we obtain a strong enrichment of acceptable or better quality structures; for the ‘hard’ targets without near-native decoys, our method is still able to retain structures which have native binding contacts. Moreover, in our top 10 selections, CLUB-MARTINI shows a comparable performance when compared with other state-of-the-art docking scoring functions. As a proof of concept, CLUB-MARTINI performs remarkably well for many targets and is able to pinpoint near-native binding modes in the top selections. To the best of our knowledge, this is the first time interaction free energy calculated from MD simulations has been used to rank docking solutions at a large scale.
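The ranking step described in this abstract (sort docking solutions by binding free energy, keep the lowest, and check how enriched the selection is in good structures) can be illustrated with a minimal Python sketch. It is not the CLUB-MARTINI implementation: the `Decoy` class, the free-energy values, the quality labels, and the enrichment definition (near-native fraction in the top n divided by the fraction overall) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Decoy:
    name: str
    delta_g: float     # estimated binding free energy (hypothetical values)
    near_native: bool  # hypothetical quality label from a benchmark


def top_ranked(decoys, n):
    """Return the n decoys with the lowest (most favourable) free energy."""
    return sorted(decoys, key=lambda d: d.delta_g)[:n]


def enrichment(decoys, n):
    """Near-native fraction in the top n divided by the fraction overall.

    Assumes at least one near-native decoy exists; otherwise the overall
    fraction is zero and the ratio is undefined.
    """
    overall = sum(d.near_native for d in decoys) / len(decoys)
    top = top_ranked(decoys, n)
    return (sum(d.near_native for d in top) / len(top)) / overall


# Hypothetical decoy set, for illustration only.
decoys = [
    Decoy("d1", -42.0, True),
    Decoy("d2", -15.5, False),
    Decoy("d3", -38.2, True),
    Decoy("d4", -5.1, False),
    Decoy("d5", -20.3, False),
    Decoy("d6", -9.8, False),
]

print([d.name for d in top_ranked(decoys, 2)])
print(enrichment(decoys, 2))
```

An enrichment above 1 means the free-energy ranking concentrates near-native structures in the top selection more than random picking would.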
Affiliation(s)
- Qingzhen Hou: Center for Integrative Bioinformatics VU (IBIVU), VU University Amsterdam, De Boelelaan 1081A, 1081 HV Amsterdam, The Netherlands
- Marc F. Lensink: University Lille, CNRS, UMR8576 UGSF - Institute for Structural and Functional Glycobiology, F-59000, Lille, France
- Jaap Heringa: Center for Integrative Bioinformatics VU (IBIVU), VU University Amsterdam, De Boelelaan 1081A, 1081 HV Amsterdam, The Netherlands
- K. Anton Feenstra: Center for Integrative Bioinformatics VU (IBIVU), VU University Amsterdam, De Boelelaan 1081A, 1081 HV Amsterdam, The Netherlands
32
Lelieveld SH, Schütte J, Dijkstra MJJ, Bawono P, Kinston SJ, Göttgens B, Heringa J, Bonzanni N. ConBind: motif-aware cross-species alignment for the identification of functional transcription factor binding sites. Nucleic Acids Res 2016; 44:e72. [PMID: 26721389 PMCID: PMC4856970 DOI: 10.1093/nar/gkv1518] [Received: 08/26/2014] [Revised: 12/15/2015] [Accepted: 12/16/2015] [Indexed: 12/23/2022]
Abstract
Eukaryotic gene expression is regulated by transcription factors (TFs) binding to promoters as well as distal enhancers. TFs recognize short, but specific binding sites (TFBSs) that are located within the promoter and enhancer regions. Functionally relevant TFBSs are often highly conserved during evolution, leaving a strong phylogenetic signal. While multiple sequence alignment (MSA) is a potent tool to detect the phylogenetic signal, the current MSA implementations are optimized to align the maximum number of identical nucleotides. This approach might result in the omission of conserved motifs that contain interchangeable nucleotides such as the ETS motif (IUPAC code: GGAW). Here, we introduce ConBind, a novel method to enhance alignment of short motifs, even if their mutual sequence similarity is only partial. ConBind improves the identification of conserved TFBSs by improving the alignment accuracy of TFBS families within orthologous DNA sequences. Functional validation of the Gfi1b +13 enhancer reveals that ConBind identifies additional functionally important ETS binding sites that were missed by all other tested alignment tools. In addition to the analysis of known regulatory regions, our web tool is useful for the analysis of TFBSs on so far unknown DNA regions identified through ChIP-sequencing.
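To make the point about interchangeable nucleotides concrete, the following Python sketch locates a motif written in IUPAC notation, such as GGAW (W = A or T), in DNA sequences. It only illustrates why two sites can be instances of the same motif without being identical; it is not ConBind's alignment method, and the function name and example sequences are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_positions(motif, sequence):
    """Return 0-based start positions of all motif matches, overlaps included."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead is used so that overlapping occurrences are also reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


# Two hypothetical orthologous fragments whose ETS-like site differs at the W position.
seq_a = "TTCAGGAAGTC"
seq_b = "TTCAGGATGTC"

print(motif_positions("GGAW", seq_a))  # [4]
print(motif_positions("GGAW", seq_b))  # [4]
```

Both fragments carry a GGAW site at the same position although the sites differ by one nucleotide, which is the kind of partial similarity an identity-maximizing aligner may fail to place in the same column.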
Affiliation(s)
- Stefan H Lelieveld: Centre for Integrative Bioinformatics VU, VU University Amsterdam, Amsterdam 1081 HV, The Netherlands; Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen 6525 GA, The Netherlands
- Judith Schütte: Department of Haematology, Wellcome Trust-MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Cambridge University, Cambridge CB2 0XY, UK; Klinik für Hämatologie, Universitätsklinik Essen 45147, Germany
- Maurits J J Dijkstra: Centre for Integrative Bioinformatics VU, VU University Amsterdam, Amsterdam 1081 HV, The Netherlands
- Punto Bawono: Centre for Integrative Bioinformatics VU, VU University Amsterdam, Amsterdam 1081 HV, The Netherlands
- Sarah J Kinston: Department of Haematology, Wellcome Trust-MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Cambridge University, Cambridge CB2 0XY, UK
- Berthold Göttgens: Department of Haematology, Wellcome Trust-MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Cambridge University, Cambridge CB2 0XY, UK
- Jaap Heringa: Centre for Integrative Bioinformatics VU, VU University Amsterdam, Amsterdam 1081 HV, The Netherlands
- Nicola Bonzanni: Centre for Integrative Bioinformatics VU, VU University Amsterdam, Amsterdam 1081 HV, The Netherlands; Computational Cancer Biology Group, Division of Molecular Carcinogenesis, The Netherlands Cancer Institute, Amsterdam 1066 CX, The Netherlands; ENPICOM, Eindhoven 5632 CW, The Netherlands
33
Hou Q, Dutilh BE, Huynen MA, Heringa J, Feenstra KA. Sequence specificity between interacting and non-interacting homologs identifies interface residues--a homodimer and monomer use case. BMC Bioinformatics 2015; 16:325. [PMID: 26449222 PMCID: PMC4599308 DOI: 10.1186/s12859-015-0758-y] [Received: 03/30/2015] [Accepted: 09/30/2015] [Indexed: 11/17/2022]
Abstract
Background: Protein families participating in protein-protein interactions may contain sub-families that have different binding characteristics, ranging from tight binding to showing no interaction at all. Composition differences at the sequence level in these sub-families are often decisive to their differential functional interaction. Methods to predict interface sites from protein sequences typically exploit conservation as a signal. Here, instead, we provide proof of concept that the sequence specificity between interacting versus non-interacting groups can be exploited to recognise interaction sites. Results: We collected homodimeric and monomeric proteins and formed homologous groups, each having an interacting (homodimer) subgroup and a non-interacting (monomer) subgroup. We then compiled multiple sequence alignments of the proteins in the homologous groups and identified compositional differences between the homodimeric and monomeric subgroups for each of the alignment positions. Our results show that this specificity signal distinguishes interface and other surface residues with 40.9% recall and up to 25.1% precision. Conclusions: To the best of our knowledge, this is the first large scale study that exploits sequence specificity between interacting and non-interacting homologs to predict interaction sites from sequence information only. The performance obtained indicates that this signal contains valuable information to identify protein-protein interaction sites.
Affiliation(s)
- Qingzhen Hou: Center for Integrative Bioinformatics VU (IBIVU), Vrije University Amsterdam, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
- Bas E Dutilh: Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands; Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands; Department of Marine Biology, Institute of Biology, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Martijn A Huynen: Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands
- Jaap Heringa: Center for Integrative Bioinformatics VU (IBIVU), Vrije University Amsterdam, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
- K Anton Feenstra: Center for Integrative Bioinformatics VU (IBIVU), Vrije University Amsterdam, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
34
El-Kebir M, Soueidan H, Hume T, Beisser D, Dittrich M, Müller T, Blin G, Heringa J, Nikolski M, Wessels LFA, Klau GW. xHeinz: an algorithm for mining cross-species network modules under a flexible conservation model. Bioinformatics 2015; 31:3147-55. [PMID: 26023104 DOI: 10.1093/bioinformatics/btv316] [Received: 10/20/2014] [Accepted: 05/18/2015] [Indexed: 01/18/2023]
Abstract
MOTIVATION Integrative network analysis methods provide robust interpretations of differential high-throughput molecular profile measurements. They are often used in a biomedical context to generate novel hypotheses about the underlying cellular processes or to derive biomarkers for classification and subtyping. The underlying molecular profiles are frequently measured and validated on animal or cellular models. Therefore the results are not immediately transferable to humans. In particular, this is also the case in a study of the recently discovered interleukin-17 producing helper T cells (Th17), which are fundamental for anti-microbial immunity but also known to contribute to autoimmune diseases. RESULTS We propose a mathematical model for finding active subnetwork modules that are conserved between two species. These are sets of genes, one for each species, which (i) induce a connected subnetwork in a species-specific interaction network, (ii) show overall differential behavior and (iii) contain a large number of orthologous genes. We propose a flexible notion of conservation, which turns out to be crucial for the quality of the resulting modules in terms of biological interpretability. We propose an algorithm that finds provably optimal or near-optimal conserved active modules in our model. We apply our algorithm to understand the mechanisms underlying Th17 T cell differentiation in both mouse and human. As a main biological result, we find that the key regulation of Th17 differentiation is conserved between human and mouse. AVAILABILITY AND IMPLEMENTATION xHeinz, an implementation of our algorithm, as well as all input data and results, are available at http://software.cwi.nl/xheinz and as a Galaxy service at http://services.cbib.u-bordeaux2.fr/galaxy in CBiB Tools. CONTACT gunnar.klau@cwi.nl SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mohammed El-Kebir: Life Sciences, Centrum Wiskunde & Informatica (CWI), Amsterdam, The Netherlands; Centre for Integrative Bioinformatics VU, VU University Amsterdam, The Netherlands; Center for Computational Molecular Biology, Brown University, Providence, RI, USA
- Hayssam Soueidan: Computational Cancer Biology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Thomas Hume: Univ. Bordeaux, CBiB, 33000 Bordeaux, France; Univ. Bordeaux, CNRS/LaBRI, 33405 Talence, France
- Daniela Beisser: Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen
- Marcus Dittrich: Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany; Institute of Human Genetics, University of Würzburg, Würzburg, Germany
- Tobias Müller: Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Jaap Heringa: Centre for Integrative Bioinformatics VU, VU University Amsterdam, The Netherlands
- Macha Nikolski: Univ. Bordeaux, CBiB, 33000 Bordeaux, France; Univ. Bordeaux, CNRS/LaBRI, 33405 Talence, France
- Lodewyk F A Wessels: Computational Cancer Biology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Gunnar W Klau: Life Sciences, Centrum Wiskunde & Informatica (CWI), Amsterdam, The Netherlands; Centre for Integrative Bioinformatics VU, VU University Amsterdam, The Netherlands; Erable Team, INRIA, Lyon, France
35
May A, Brandt BW, El-Kebir M, Klau GW, Zaura E, Crielaard W, Heringa J, Abeln S. metaModules identifies key functional subnetworks in microbiome-related disease. Bioinformatics 2015; 32:1678-85. [PMID: 26342232 DOI: 10.1093/bioinformatics/btv526] [Received: 04/20/2015] [Accepted: 09/02/2015] [Indexed: 11/13/2022]
Abstract
MOTIVATION The human microbiome plays a key role in health and disease. Thanks to comparative metatranscriptomics, the cellular functions that are deregulated by the microbiome in disease can now be computationally explored. Unlike gene-centric approaches, pathway-based methods provide a systemic view of such functions; however, they typically consider each pathway in isolation and in its entirety. They can therefore overlook the key differences that (i) span multiple pathways, (ii) contain bidirectionally deregulated components, (iii) are confined to a pathway region. To capture these properties, computational methods that reach beyond the scope of predefined pathways are needed. RESULTS By integrating an existing module discovery algorithm into comparative metatranscriptomic analysis, we developed metaModules, a novel computational framework for automated identification of the key functional differences between health- and disease-associated communities. Using this framework, we recovered significantly deregulated subnetworks that were indeed recognized to be involved in two well-studied, microbiome-mediated oral diseases, such as butanoate production in periodontal disease and metabolism of sugar alcohols in dental caries. More importantly, our results indicate that our method can be used for hypothesis generation based on automated discovery of novel, disease-related functional subnetworks, which would otherwise require extensive and laborious manual assessment. AVAILABILITY AND IMPLEMENTATION metaModules is available at https://bitbucket.org/alimay/metamodules/ CONTACT a.may@vu.nl or s.abeln@vu.nl SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ali May: Centre for Integrative Bioinformatics VU (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands; Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands; Amsterdam Institute for Molecules Medicines and Systems (AIMMS), VU University Amsterdam, Amsterdam, The Netherlands
- Bernd W Brandt: Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands
- Mohammed El-Kebir: Centre for Integrative Bioinformatics VU (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands; Department of Computer Science and Center for Computational Molecular Biology, Brown University, Providence, USA; Life Sciences, Centre for Mathematics and Computer Science (CWI), Amsterdam, The Netherlands
- Gunnar W Klau: Centre for Integrative Bioinformatics VU (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands; Amsterdam Institute for Molecules Medicines and Systems (AIMMS), VU University Amsterdam, Amsterdam, The Netherlands; Life Sciences, Centre for Mathematics and Computer Science (CWI), Amsterdam, The Netherlands
- Egija Zaura: Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands
- Wim Crielaard: Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands
- Jaap Heringa: Centre for Integrative Bioinformatics VU (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands; Amsterdam Institute for Molecules Medicines and Systems (AIMMS), VU University Amsterdam, Amsterdam, The Netherlands
- Sanne Abeln: Centre for Integrative Bioinformatics VU (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands; Amsterdam Institute for Molecules Medicines and Systems (AIMMS), VU University Amsterdam, Amsterdam, The Netherlands
36
Bawono P, van der Velde A, Abeln S, Heringa J. Quantifying the displacement of mismatches in multiple sequence alignment benchmarks. PLoS One 2015; 10:e0127431. [PMID: 25993129 PMCID: PMC4438059 DOI: 10.1371/journal.pone.0127431] [Received: 11/20/2014] [Accepted: 04/14/2015] [Indexed: 11/18/2022]
Abstract
Multiple Sequence Alignment (MSA) methods are typically benchmarked on sets of reference alignments. The quality of the alignment can then be represented by the sum-of-pairs (SP) or column (CS) scores, which measure the agreement between a reference and corresponding query alignment. Both the SP and CS scores treat mismatches between a query and reference alignment as equally bad, and do not take the separation into account between two amino acids in the query alignment, that should have been matched according to the reference alignment. This is significant since the magnitude of alignment shifts is often of relevance in biological analyses, including homology modeling and MSA refinement/manual alignment editing. In this study we develop a new alignment benchmark scoring scheme, SPdist, that takes the degree of discordance of mismatches into account by measuring the sequence distance between mismatched residue pairs in the query alignment. Using this new score along with the standard SP score, we investigate the discriminatory behavior of the new score by assessing how well six different MSA methods perform with respect to BAliBASE reference alignments. The SP score and the SPdist score yield very similar outcomes when the reference and query alignments are close. However, for more divergent reference alignments the SPdist score is able to distinguish between methods that keep alignments approximately close to the reference and those exhibiting larger shifts. We observed that by using SPdist together with SP scoring we were able to better delineate the alignment quality difference between alternative MSA methods. With a case study we exemplify why it is important, from a biological perspective, to consider the separation of mismatches. The SPdist scoring scheme has been implemented in the VerAlign web server (http://www.ibi.vu.nl/programs/veralignwww/). The code for calculating SPdist score is also available upon request.
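The difference between counting mismatches and measuring how far they are displaced can be sketched in Python. `sp_score` follows the usual sum-of-pairs definition (fraction of reference residue pairs reproduced in the query alignment). `mean_shift` is only a simple stand-in for a distance-aware measure: it averages, over all reference-aligned residue pairs, how many columns apart the two residues end up in the query. It is an assumption made for illustration and not the SPdist formula from the paper; all function names and toy alignments are hypothetical.

```python
from itertools import combinations


def residue_columns(aligned):
    """Map each residue index of a gapped sequence to its column index."""
    cols, idx = {}, 0
    for col, ch in enumerate(aligned):
        if ch != "-":
            cols[idx] = col
            idx += 1
    return cols


def aligned_pairs(alignment):
    """Set of (seq_i, res_i, seq_j, res_j) residue pairs that share a column."""
    pairs = set()
    counters = [0] * len(alignment)
    for col in range(len(alignment[0])):
        present = []
        for s, seq in enumerate(alignment):
            if seq[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        for (i, a), (j, b) in combinations(present, 2):
            pairs.add((i, a, j, b))
    return pairs


def sp_score(reference, query):
    """Fraction of reference-aligned residue pairs also aligned in the query."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(query)) / len(ref)


def mean_shift(reference, query):
    """Mean column separation, in the query, of reference-aligned pairs."""
    ref = aligned_pairs(reference)
    cols = [residue_columns(seq) for seq in query]
    shifts = [abs(cols[i][a] - cols[j][b]) for i, a, j, b in ref]
    return sum(shifts) / len(shifts)


reference = ["ACGT", "ACGT"]
small_shift = ["ACGT-", "-ACGT"]    # second sequence displaced by one column
large_shift = ["ACGT---", "---ACGT"]  # second sequence displaced by three columns

for query in (reference, small_shift, large_shift):
    print(sp_score(reference, query), mean_shift(reference, query))
```

Both shifted queries get an SP score of 0, so the SP score alone cannot tell them apart, while the mean displacement (1 versus 3 columns) does.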
Affiliation(s)
- Punto Bawono: Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
- Arjan van der Velde: Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
- Sanne Abeln: Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
- Jaap Heringa: Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands; Amsterdam Institute for Molecules Medicines and Systems (AIMMS), VU University Amsterdam, Amsterdam, The Netherlands
37
May A, Abeln S, Buijs MJ, Heringa J, Crielaard W, Brandt BW. NGS-eval: NGS Error analysis and novel sequence VAriant detection tooL. Nucleic Acids Res 2015; 43:W301-5. [PMID: 25878034 PMCID: PMC4489229 DOI: 10.1093/nar/gkv346] [Received: 01/30/2015] [Accepted: 04/03/2015] [Indexed: 02/04/2023]
Abstract
Massively parallel sequencing of microbial genetic markers (MGMs) is used to uncover the species composition in a multitude of ecological niches. These sequencing runs often contain a sample with known composition that can be used to evaluate the sequencing quality or to detect novel sequence variants. With NGS-eval, the reads from such (mock) samples can be used to (i) explore the differences between the reads and their references and to (ii) estimate the sequencing error rate. This tool maps these reads to references and calculates as well as visualizes the different types of sequencing errors. Clearly, sequencing errors can only be accurately calculated if the reference sequences are correct. However, even with known strains, it is not straightforward to select the correct references from databases. We previously analysed a pyrosequencing dataset from a mock sample to estimate sequencing error rates and detected sequence variants in our mock community, allowing us to obtain an accurate error estimation. Here, we demonstrate the variant detection and error analysis capability of NGS-eval with Illumina MiSeq reads from the same mock community. While tailored towards the field of metagenomics, this server can be used for any type of MGM-based reads. NGS-eval is available at http://www.ibi.vu.nl/programs/ngsevalwww/.
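The idea of estimating a sequencing error rate from reads mapped to known references can be sketched in Python as follows. The sketch classifies each column of a pairwise read-to-reference alignment as a match, substitution, insertion or deletion, and reports errors per alignment column. The function name, the toy alignment and this particular definition of the error rate are assumptions for illustration; the abstract does not specify how NGS-eval computes its rates.

```python
def error_profile(ref_aln, read_aln):
    """Count error types in a gapped pairwise alignment of equal-length strings.

    Returns the per-type counts and the error rate, defined here (as an
    assumption) as errors divided by the number of non-empty alignment columns.
    """
    counts = {"match": 0, "substitution": 0, "insertion": 0, "deletion": 0}
    for ref_base, read_base in zip(ref_aln, read_aln):
        if ref_base == "-" and read_base == "-":
            continue  # empty column, carries no information
        if ref_base == "-":
            counts["insertion"] += 1   # base present only in the read
        elif read_base == "-":
            counts["deletion"] += 1    # base missing from the read
        elif ref_base == read_base:
            counts["match"] += 1
        else:
            counts["substitution"] += 1
    total = sum(counts.values())
    errors = total - counts["match"]
    return counts, errors / total


# Hypothetical alignment of one mock-community read against its reference.
counts, rate = error_profile("ACGT-ACGT", "ACCTTAC-T")
print(counts)
print(round(rate, 3))
```

As the abstract notes, such an estimate is only meaningful if the reference is correct: a genuine sequence variant in the mock strain would be counted here as a sequencing error.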
Affiliation(s)
- Ali May: Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands; Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
- Sanne Abeln: Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands; AIMMS Amsterdam Institute for Molecules Medicines and Systems, VU University Amsterdam, Amsterdam, The Netherlands
- Mark J Buijs: Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands
- Jaap Heringa: Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands; AIMMS Amsterdam Institute for Molecules Medicines and Systems, VU University Amsterdam, Amsterdam, The Netherlands
- Wim Crielaard: Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands
- Bernd W Brandt: Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands
38
El-Kebir M, Brandt BW, Heringa J, Klau GW. NatalieQ: a web server for protein-protein interaction network querying. BMC Syst Biol 2014; 8:40. [PMID: 24690407 PMCID: PMC3998945 DOI: 10.1186/1752-0509-8-40] [Received: 04/11/2013] [Accepted: 03/20/2014] [Indexed: 01/17/2023]
Abstract
BACKGROUND Molecular interactions need to be taken into account to adequately model the complex behavior of biological systems. These interactions are captured by various types of biological networks, such as metabolic, gene-regulatory, signal transduction and protein-protein interaction networks. We recently developed Natalie, which computes high-quality network alignments via advanced methods from combinatorial optimization. RESULTS Here, we present NatalieQ, a web server for topology-based alignment of a specified query protein-protein interaction network to a selected target network using the Natalie algorithm. By incorporating similarity at both the sequence and the network level, we compute alignments that allow for the transfer of functional annotation as well as for the prediction of missing interactions. We illustrate the capabilities of NatalieQ with a biological case study involving the Wnt signaling pathway. CONCLUSIONS We show that topology-based network alignment can produce results complementary to those obtained by using sequence similarity alone. We also demonstrate that NatalieQ is able to predict putative interactions. The server is available at: http://www.ibi.vu.nl/programs/natalieq/.
Affiliation(s)
- Mohammed El-Kebir: Life Sciences, Centrum Wiskunde & Informatica, Science Park 123, 1098 XG Amsterdam, the Netherlands
39
El-Kebir M, Marschall T, Wohlers I, Patterson M, Heringa J, Schönhuth A, Klau GW. Mapping proteins in the presence of paralogs using units of coevolution. BMC Bioinformatics 2014; 14 Suppl 15:S18. [PMID: 24564758 PMCID: PMC3852051 DOI: 10.1186/1471-2105-14-s15-s18] [Indexed: 11/18/2022]
Abstract
Background: We study the problem of mapping proteins between two protein families in the presence of paralogs. This problem occurs as a difficult subproblem in coevolution-based computational approaches for protein-protein interaction prediction. Results: Similar to prior approaches, our method is based on the idea that coevolution implies equal rates of sequence evolution among the interacting proteins, and we provide a first attempt to quantify this notion in a formal statistical manner. We call the units that are central to this quantification scheme the units of coevolution. A unit consists of two mapped protein pairs and its score quantifies the coevolution of the pairs. This quantification allows us to provide a maximum likelihood formulation of the paralog mapping problem and to cast it into a binary quadratic programming formulation. Conclusion: CUPID, our software tool based on a Lagrangian relaxation of this formulation, makes it, for the first time, possible to compute state-of-the-art quality pairings in a few minutes of runtime. In summary, we suggest a novel alternative to the earlier available approaches, which is statistically sound and computationally feasible.
40
May A, Abeln S, Crielaard W, Heringa J, Brandt BW. Unraveling the outcome of 16S rDNA-based taxonomy analysis through mock data and simulations. Bioinformatics 2014; 30:1530-8. [PMID: 24519382 DOI: 10.1093/bioinformatics/btu085] [Indexed: 01/19/2023]
Abstract
MOTIVATION 16S rDNA pyrosequencing is a powerful approach that requires extensive usage of computational methods for delineating microbial compositions. Previously, it was shown that outcomes of studies relying on this approach vastly depend on the choice of pre-processing and clustering algorithms used. However, obtaining insights into the effects and accuracy of these algorithms is challenging due to difficulties in generating samples of known composition with high enough diversity. Here, we use in silico microbial datasets to better understand how the experimental data are transformed into taxonomic clusters by computational methods. RESULTS We were able to qualitatively replicate the raw experimental pyrosequencing data after rigorously adjusting existing simulation software. This allowed us to simulate datasets of real-life complexity, which we used to assess the influence and performance of two widely used pre-processing methods along with 11 clustering algorithms. We show that the choice, order and mode of the pre-processing methods have a larger impact on the accuracy of the clustering pipeline than the clustering methods themselves. Without pre-processing, the difference between the performances of clustering methods is large. Depending on the clustering algorithm, the optimal analysis pipeline resulted in significant underestimations of the expected number of clusters (minimum: 3.4%; maximum: 13.6%), allowing us to make quantitative estimations of the bacterial complexity of real microbiome samples.
Affiliation(s)
- Ali May
- Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands, Centre for Integrative Bioinformatics VU and AIMMS Amsterdam Institute for Molecules Medicines and Systems, VU University Amsterdam, Amsterdam, The Netherlands and NBIC Netherlands Bioinformatics Centre, Nijmegen, The Netherlands
- Sanne Abeln
- Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands, Centre for Integrative Bioinformatics VU and AIMMS Amsterdam Institute for Molecules Medicines and Systems, VU University Amsterdam, Amsterdam, The Netherlands and NBIC Netherlands Bioinformatics Centre, Nijmegen, The Netherlands
- Wim Crielaard
- Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands, Centre for Integrative Bioinformatics VU and AIMMS Amsterdam Institute for Molecules Medicines and Systems, VU University Amsterdam, Amsterdam, The Netherlands and NBIC Netherlands Bioinformatics Centre, Nijmegen, The Netherlands
- Jaap Heringa
- Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands, Centre for Integrative Bioinformatics VU and AIMMS Amsterdam Institute for Molecules Medicines and Systems, VU University Amsterdam, Amsterdam, The Netherlands and NBIC Netherlands Bioinformatics Centre, Nijmegen, The Netherlands
- Bernd W Brandt
- Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands, Centre for Integrative Bioinformatics VU and AIMMS Amsterdam Institute for Molecules Medicines and Systems, VU University Amsterdam, Amsterdam, The Netherlands and NBIC Netherlands Bioinformatics Centre, Nijmegen, The Netherlands
|
41
|
Abstract
Profile ALIgNmEnt (PRALINE) is a versatile multiple sequence alignment toolkit. In its main alignment protocol, PRALINE follows the global progressive alignment algorithm. It provides various alignment optimization strategies to address the different situations that call for protein multiple sequence alignment: global profile preprocessing, homology-extended alignment, secondary structure-guided alignment, and transmembrane aware alignment. A number of combinations of these strategies are enabled as well. PRALINE is accessible via the online server http://www.ibi.vu.nl/programs/PRALINEwww/. The server facilitates extensive visualization possibilities aiding the interpretation of alignments generated, which can be written out in pdf format for publication purposes. PRALINE also allows the sequences in the alignment to be represented in a dendrogram to show their mutual relationships according to the alignment. The chapter ends with a discussion of various issues occurring in multiple sequence alignment.
Affiliation(s)
- Punto Bawono
- Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
|
42
|
Bonzanni N, Garg A, Feenstra KA, Schütte J, Kinston S, Miranda-Saavedra D, Heringa J, Xenarios I, Göttgens B. Hard-wired heterogeneity in blood stem cells revealed using a dynamic regulatory network model. Bioinformatics 2013; 29:i80-8. [PMID: 23813012 PMCID: PMC3694641 DOI: 10.1093/bioinformatics/btt243] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.5] [Indexed: 12/11/2022] Open
Abstract
Motivation: Combinatorial interactions of transcription factors with cis-regulatory elements control the dynamic progression through successive cellular states and thus underpin all metazoan development. The construction of network models of cis-regulatory elements, therefore, has the potential to generate fundamental insights into cellular fate and differentiation. Haematopoiesis has long served as a model system to study mammalian differentiation, yet modelling based on experimentally informed cis-regulatory interactions has so far been restricted to pairs of interacting factors. Here, we have generated a Boolean network model based on detailed cis-regulatory functional data connecting 11 haematopoietic stem/progenitor cell (HSPC) regulator genes. Results: Despite its apparent simplicity, the model exhibits surprisingly complex behaviour that we charted using strongly connected components and shortest-path analysis in its Boolean state space. This analysis of our model predicts that HSPCs display heterogeneous expression patterns and possess many intermediate states that can act as ‘stepping stones’ for the HSPC to achieve a final differentiated state. Importantly, an external perturbation or ‘trigger’ is required to exit the stem cell state, with distinct triggers characterizing maturation into the various different lineages. By focusing on intermediate states occurring during erythrocyte differentiation, from our model we predicted a novel negative regulation of Fli1 by Gata1, which we confirmed experimentally thus validating our model. In conclusion, we demonstrate that an advanced mammalian regulatory network model based on experimentally validated cis-regulatory interactions has allowed us to make novel, experimentally testable hypotheses about transcriptional mechanisms that control differentiation of mammalian stem cells. 
Contact: j.heringa@vu.nl or ioannis.xenarios@isb-sib.ch or bg200@cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
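The state-space analysis described in this abstract (strongly connected components and shortest paths between Boolean states) can be illustrated on a much smaller example. The sketch below is not the 11-gene HSPC model: the three-gene negative-feedback rules, the asynchronous one-gene-at-a-time update scheme, and all function names are assumptions made purely for illustration.

```python
from collections import deque
from itertools import product

# Hypothetical three-gene negative-feedback loop, for illustration only.
RULES = (
    lambda a, b, c: int(not c),  # A is repressed by C
    lambda a, b, c: a,           # B is activated by A
    lambda a, b, c: b,           # C is activated by B
)


def successors(state):
    """Asynchronous update (assumed): flip one gene at a time to its rule's target."""
    result = []
    for i, rule in enumerate(RULES):
        target = rule(*state)
        if target != state[i]:
            result.append(state[:i] + (target,) + state[i + 1:])
    return result


def shortest_path(start, goal):
    """Breadth-first search through the state-transition graph."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if s == goal:
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        for t in successors(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None  # goal is not reachable from start


def reachable(start):
    """All states reachable from start, including start itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        for t in successors(queue.popleft()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


def strongly_connected_components():
    """Group mutually reachable states; adequate for a tiny state space."""
    states = list(product((0, 1), repeat=len(RULES)))
    reach = {s: reachable(s) for s in states}
    components = {frozenset(t for t in reach[s] if s in reach[t]) for s in states}
    return sorted(components, key=len, reverse=True)
```

In this toy network the six states on the feedback cycle form one strongly connected component, while `(0, 1, 0)` and `(1, 0, 1)` are transient states that can only enter it. For a state space of realistic size, a linear-time algorithm such as Tarjan's would replace the pairwise reachability test used here.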
Affiliation(s)
- Nicola Bonzanni
- IBIVU Centre for Integrative Bioinformatics, VU University Amsterdam, AIMMS Amsterdam Institute for Molecules Medicines and Systems, VU University Amsterdam, De Boelelaan 1081, NKI-AVL The Netherlands
|
43
|
Gijsbers EF, Feenstra KA, van Nuenen AC, Navis M, Heringa J, Schuitemaker H, Kootstra NA. HIV-1 replication fitness of HLA-B*57/58:01 CTL escape variants is restored by the accumulation of compensatory mutations in gag. PLoS One 2013; 8:e81235. [PMID: 24339913 PMCID: PMC3855271 DOI: 10.1371/journal.pone.0081235] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 04/29/2013] [Accepted: 10/10/2013] [Indexed: 11/30/2022] Open
Abstract
Expression of HLA-B*57 and the closely related HLA-B*58:01 are associated with prolonged survival after HIV-1 infection. However, large differences in disease course are observed among HLA-B*57/58:01 patients. Escape mutations in CTL epitopes restricted by these HLA alleles come at a fitness cost and particularly the T242N mutation in the TW10 CTL epitope in Gag has been demonstrated to decrease the viral replication capacity. Additional mutations within or flanking this CTL epitope can partially restore replication fitness of CTL escape variants. Five HLA-B*57/58:01 progressors and 5 HLA-B*57/58:01 long-term nonprogressors (LTNPs) were followed longitudinally and we studied which compensatory mutations were involved in the restoration of the viral fitness of variants that escaped from HLA-B*57/58:01-restricted CTL pressure. The Sequence Harmony algorithm was used to detect homology in amino acid composition by comparing longitudinal Gag sequences obtained from HIV-1 patients positive and negative for HLA-B*57/58:01 and from HLA-B*57/58:01 progressors and LTNPs. Although virus isolates from HLA-B*57/58:01 individuals contained multiple CTL escape mutations, these escape mutations were not associated with disease progression. In sequences from HLA-B*57/58:01 progressors, 5 additional mutations in Gag were observed: S126N, L215T, H219Q, M228I and N252H. The combination of these mutations restored the replication fitness of CTL escape HIV-1 variants. Furthermore, we observed a positive correlation between the number of escape and compensatory mutations in Gag and the replication fitness of biological HIV-1 variants isolated from HLA-B*57/58:01 patients, suggesting that the replication fitness of HLA-B*57/58:01 escape variants is restored by accumulation of compensatory mutations.
Affiliation(s)
- Esther F. Gijsbers
- Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, and Center for Infectious Diseases and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- K. Anton Feenstra
- Centre for Integrative Bioinformatics (IBIVU) and Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), VU University, Amsterdam, The Netherlands
- Netherlands Bioinformatics Centre (NBIC), Nijmegen, The Netherlands
- Ad C. van Nuenen
- Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, and Center for Infectious Diseases and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Marjon Navis
- Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, and Center for Infectious Diseases and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Jaap Heringa
- Centre for Integrative Bioinformatics (IBIVU) and Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), VU University, Amsterdam, The Netherlands
- Netherlands Bioinformatics Centre (NBIC), Nijmegen, The Netherlands
- Hanneke Schuitemaker
- Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, and Center for Infectious Diseases and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Neeltje A. Kootstra
- Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, and Center for Infectious Diseases and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
|
44
|
May A, Pool R, van Dijk E, Bijlard J, Abeln S, Heringa J, Feenstra KA. Coarse-grained versus atomistic simulations: realistic interaction free energies for real proteins. Bioinformatics 2013; 30:326-34. [PMID: 24273239 DOI: 10.1093/bioinformatics/btt675] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022]
Abstract
MOTIVATION To assess whether two proteins will interact under physiological conditions, information on the interaction free energy is needed. Statistical learning techniques and docking methods for predicting protein-protein interactions cannot quantitatively estimate binding free energies. Full atomistic molecular simulation methods do have this potential, but are completely unfeasible for large-scale applications in terms of computational cost required. Here we investigate whether applying coarse-grained (CG) molecular dynamics simulations is a viable alternative for complexes of known structure. RESULTS We calculate the free energy barrier with respect to the bound state based on molecular dynamics simulations using both a full atomistic and a CG force field for the TCR-pMHC complex and the MP1-p14 scaffolding complex. We find that the free energy barriers from the CG simulations are of similar accuracy as those from the full atomistic ones, while achieving a speedup of >500-fold. We also observe that extensive sampling is extremely important to obtain accurate free energy barriers, which is only within reach for the CG models. Finally, we show that the CG model preserves biological relevance of the interactions: (i) we observe a strong correlation between evolutionary likelihood of mutations and the impact on the free energy barrier with respect to the bound state; and (ii) we confirm the dominant role of the interface core in these interactions. Therefore, our results suggest that CG molecular simulations can realistically be used for the accurate prediction of protein-protein interaction strength. AVAILABILITY AND IMPLEMENTATION The python analysis framework and data files are available for download at http://www.ibi.vu.nl/downloads/bioinformatics-2013-btt675.tgz.
Affiliation(s)
- Ali May
- Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam Institute for Molecules Medicines and Systems (AIMMS), VU University Amsterdam, Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Netherlands Bioinformatics Centre (NBIC), Geert Grooteplein 28 6525 GA Nijmegen, The Netherlands and Department of Biological Psychology, VU University Amsterdam, 1081 HV Amsterdam, The Netherlands
|
45
|
van den Kerkhof TLGM, Feenstra KA, Euler Z, van Gils MJ, Rijsdijk LWE, Boeser-Nunnink BD, Heringa J, Schuitemaker H, Sanders RW. HIV-1 envelope glycoprotein signatures that correlate with the development of cross-reactive neutralizing activity. Retrovirology 2013; 10:102. [PMID: 24059682 PMCID: PMC3849187 DOI: 10.1186/1742-4690-10-102] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 05/13/2013] [Accepted: 09/12/2013] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Current HIV-1 envelope glycoprotein (Env) vaccines are unable to induce cross-reactive neutralizing antibodies. However, such antibodies are elicited in 10-30% of HIV-1 infected individuals, but it is unknown why these antibodies are induced in some individuals and not in others. We hypothesized that the Envs of early HIV-1 variants in individuals who develop cross-reactive neutralizing activity (CrNA) might have unique characteristics that support the induction of CrNA. RESULTS We retrospectively generated and analyzed env sequences of early HIV-1 clonal variants from 31 individuals with diverse levels of CrNA 2-4 years post-seroconversion. These sequences revealed a number of Env signatures that coincided with CrNA development. These included a statistically shorter variable region 1 and a lower probability of glycosylation as implied by a high ratio of NXS versus NXT glycosylation motifs. Furthermore, lower probability of glycosylation at position 332, which is involved in the epitopes of many broadly reactive neutralizing antibodies, was associated with the induction of CrNA. Finally, Sequence Harmony identified a number of amino acid changes associated with the development of CrNA. These residues mapped to various Env subdomains, but in particular to the first and fourth variable region as well as the underlying α2 helix of the third constant region. CONCLUSIONS These findings imply that the development of CrNA might depend on specific characteristics of early Env. Env signatures that correlate with the induction of CrNA might be relevant for the design of effective HIV-1 vaccines.
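The NXS versus NXT ratio mentioned in this abstract can be computed directly from a protein sequence. The sketch below assumes the usual definition of the N-linked glycosylation sequon (Asn, any residue except Pro, then Ser or Thr). The function names and the example sequence are hypothetical and are not taken from the study.

```python
import re

# N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead consumes only the Asn, so overlapping sequons are all counted.
SEQUON = re.compile(r"N(?=[^P]([ST]))")


def sequon_counts(seq):
    """Count NXS and NXT motifs in a one-letter amino acid sequence."""
    counts = {"NXS": 0, "NXT": 0}
    for match in SEQUON.finditer(seq.upper()):
        counts["NX" + match.group(1)] += 1
    return counts


def nxs_to_nxt_ratio(seq):
    """Ratio of NXS to NXT motifs, or None when no NXT motif is present."""
    counts = sequon_counts(seq)
    if counts["NXT"] == 0:
        return None
    return counts["NXS"] / counts["NXT"]
```

For the made-up fragment `"NNTSANPSNGT"`, `sequon_counts` finds two NXT motifs (NNT, NGT) and one NXS motif (NTS), and skips NPS because of the proline.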
Affiliation(s)
- Tom L G M van den Kerkhof
- Department of Experimental Immunology and Landsteiner Laboratory, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, the Netherlands
- K Anton Feenstra
- Center for Integrative Bioinformatics VU (IBIVU) and Amsterdam Institute for Molecules, Medicine and Systems (AIMMS), VU University Amsterdam, 1081 HV Amsterdam, the Netherlands
- Netherlands Bioinformatics Center (NBIC), 6525 GA Nijmegen, the Netherlands
- Zelda Euler
- Department of Experimental Immunology and Landsteiner Laboratory, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, the Netherlands
- Marit J van Gils
- Department of Experimental Immunology and Landsteiner Laboratory, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, the Netherlands
- Department of Medical Microbiology, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, the Netherlands
- Linda W E Rijsdijk
- Center for Integrative Bioinformatics VU (IBIVU) and Amsterdam Institute for Molecules, Medicine and Systems (AIMMS), VU University Amsterdam, 1081 HV Amsterdam, the Netherlands
- Brigitte D Boeser-Nunnink
- Department of Experimental Immunology and Landsteiner Laboratory, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, the Netherlands
- Jaap Heringa
- Center for Integrative Bioinformatics VU (IBIVU) and Amsterdam Institute for Molecules, Medicine and Systems (AIMMS), VU University Amsterdam, 1081 HV Amsterdam, the Netherlands
- Netherlands Bioinformatics Center (NBIC), 6525 GA Nijmegen, the Netherlands
- Department of Medical Microbiology, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, the Netherlands
- Hanneke Schuitemaker
- Department of Experimental Immunology and Landsteiner Laboratory, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, the Netherlands
- Crucell Holland BV, 2333 CN Leiden, the Netherlands
- Rogier W Sanders
- Department of Medical Microbiology, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, the Netherlands
- Department of Microbiology and Immunology, Weill Medical College, Cornell University, New York, NY 10065 USA
|
46
|
Hettling H, Alders DJC, Heringa J, Binsl TW, Groeneveld ABJ, van Beek JHGM. Computational estimation of tricarboxylic acid cycle fluxes using noisy NMR data from cardiac biopsies. BMC Syst Biol 2013; 7:82. [PMID: 23965343 PMCID: PMC3765389 DOI: 10.1186/1752-0509-7-82] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 01/31/2013] [Accepted: 08/15/2013] [Indexed: 11/16/2022]
Abstract
Background The aerobic energy metabolism of cardiac muscle cells is of major importance for the contractile function of the heart. Because energy metabolism is very heterogeneously distributed in heart tissue, especially during coronary disease, a method to quantify metabolic fluxes in small tissue samples is desirable. Taking tissue biopsies after infusion of substrates labeled with stable carbon isotopes makes this possible in animal experiments. However, the appreciable noise level in NMR spectra of extracted tissue samples makes computational estimation of metabolic fluxes challenging and a good method to define confidence regions was not yet available. Results Here we present a computational analysis method for nuclear magnetic resonance (NMR) measurements of tricarboxylic acid (TCA) cycle metabolites. The method was validated using measurements on extracts of single tissue biopsies taken from porcine heart in vivo. Isotopic enrichment of glutamate was measured by NMR spectroscopy in tissue samples taken at a single time point after the timed infusion of 13C labeled substrates for the TCA cycle. The NMR intensities for glutamate were analyzed with a computational model describing carbon transitions in the TCA cycle and carbon exchange with amino acids. The model dynamics depended on five flux parameters, which were optimized to fit the NMR measurements. To determine confidence regions for the estimated fluxes, we used the Metropolis-Hastings algorithm for Markov chain Monte Carlo (MCMC) sampling to generate extensive ensembles of feasible flux combinations that describe the data within measurement precision limits. To validate our method, we compared myocardial oxygen consumption calculated from the TCA cycle flux with in vivo blood gas measurements for 38 hearts under several experimental conditions, e.g. during coronary artery narrowing. 
Conclusions Despite the appreciable NMR noise level, the oxygen consumption in the tissue samples, estimated from the NMR spectra, correlates with blood-gas oxygen uptake measurements for the whole heart. The MCMC method provides confidence regions for the estimated metabolic fluxes in single cardiac biopsies, taking the quantified measurement noise level and the nonlinear dependencies between parameters fully into account.
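The idea of using Metropolis-Hastings sampling to obtain a confidence region for a fitted parameter can be sketched on a one-parameter toy problem. Everything in this example is an assumption made for illustration: the saturating model `1 - exp(-k t)`, the data points, the noise level, and the flat prior. It is not the five-flux TCA cycle model used in the study.

```python
import math
import random

# Hypothetical measurements (time, enrichment); not data from the study.
DATA = [(1.0, 0.41), (2.0, 0.62), (3.0, 0.79), (4.0, 0.85), (5.0, 0.93)]
SIGMA = 0.02  # assumed measurement standard deviation


def log_posterior(k):
    """Gaussian log-likelihood with a flat prior on k > 0."""
    if k <= 0:
        return -math.inf
    sse = sum((y - (1.0 - math.exp(-k * t))) ** 2 for t, y in DATA)
    return -sse / (2.0 * SIGMA ** 2)


def metropolis_hastings(n_steps=5000, step=0.05, k0=1.0, seed=0):
    """Random-walk Metropolis-Hastings chain for the single rate parameter k."""
    rng = random.Random(seed)
    k, lp = k0, log_posterior(k0)
    samples = []
    for _ in range(n_steps):
        proposal = k + rng.gauss(0.0, step)  # symmetric proposal
        lp_new = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            k, lp = proposal, lp_new
        samples.append(k)
    return samples


def credible_interval(samples, burn_in=1000, level=0.95):
    """Central interval of the sampled values after discarding burn-in."""
    kept = sorted(samples[burn_in:])
    tail = (1.0 - level) / 2.0
    lo = kept[int(tail * len(kept))]
    hi = kept[int((1.0 - tail) * len(kept)) - 1]
    return lo, hi
```

Calling `credible_interval(metropolis_hastings())` returns an interval for `k` that reflects both the assumed noise level and the nonlinearity of the model. With several parameters, the same ensemble of samples would also expose correlations between them.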
Affiliation(s)
- Hannes Hettling
- Centre for Integrative Bioinformatics (IBIVU), Vrije Universiteit Amsterdam, de Boelelaan 1081A, 1081 HV Amsterdam, The Netherlands.
|
47
|
Schütte J, Bonzanni N, Kinston S, Lelieveld S, Moignard V, Heringa J, Feenstra A, Gottgens B. Reconstructing a core regulatory network model for blood stem/progenitor cells. Exp Hematol 2013. [DOI: 10.1016/j.exphem.2013.05.266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
48
|
Abeln S, Molenaar D, Feenstra KA, Hoefsloot HCJ, Teusink B, Heringa J. Bioinformatics and systems biology: bridging the gap between heterogeneous student backgrounds. Brief Bioinform 2013; 14:589-98. [PMID: 23603092 DOI: 10.1093/bib/bbt023] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022] Open
Abstract
Teaching students with very diverse backgrounds can be extremely challenging. This article uses the Bioinformatics and Systems Biology MSc in Amsterdam as a case study to describe how the knowledge gap for students with heterogeneous backgrounds can be bridged. We show that a mix in backgrounds can be turned into an advantage by creating a stimulating learning environment for the students. In the MSc Programme, conversion classes help to bridge differences between students, by mending initial knowledge and skill gaps. Mixing students from different backgrounds in a group to solve a complex task creates an opportunity for the students to reflect on their own abilities. We explain how a truly interdisciplinary approach to teaching helps students of all backgrounds to achieve the MSc end terms. Moreover, transferable skills obtained by the students in such a mixed study environment are invaluable for their later careers.
Affiliation(s)
- Sanne Abeln
- Centre for Integrative Bioinformatics VU (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands. Tel.: +31 20 59 87816; Fax: +31 20 59 87653;
|
49
|
Pool R, Heringa J, Hoefling M, Schulz R, Smith JC, Feenstra KA. Enabling grand-canonical Monte Carlo: Extending the flexibility of GROMACS through the GromPy python interface module. J Comput Chem 2012; 33:1207-14. [DOI: 10.1002/jcc.22947] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/12/2011] [Revised: 12/21/2011] [Accepted: 01/09/2012] [Indexed: 11/06/2022]
|
50
|
Binsl TW, De Graaf AA, Venema K, Heringa J, Maathuis A, De Waard P, Van Beek JHGM. Measuring non-steady-state metabolic fluxes in starch-converting faecal microbiota in vitro. Benef Microbes 2011; 1:391-405. [PMID: 21831778 DOI: 10.3920/bm2010.0038] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
Abstract
This paper explores human gut bacterial metabolism of starch using a combined analytical and computational modelling approach for metabolite and flux analysis. Non-steady-state isotopic labelling experiments were performed with human faecal microbiota in a well-established in vitro model of the human colon. After culture stabilisation, [U-13C] starch was added and samples were taken at regular intervals. Metabolite concentrations and 13C isotopomeric distributions were measured amongst other things for acetate, propionate and butyrate by mass spectrometry and NMR. The vast majority of metabolic flux analysis methods based on isotopomer analysis published to date are not applicable to metabolic non-steady-state experiments. We therefore developed a new ordinary differential equation-based representation of a metabolic model of human faecal microbiota to determine eleven metabolic parameters that characterised the metabolic flux distribution in the isotope labelling experiment. The feasibility of the model parameter quantification was demonstrated on noisy in silico data using a downhill simplex optimisation, matching simulated labelling patterns of isotopically labelled metabolites with measured metabolite and isotope labelling data. Using the experimental data, we determined an increasing net label influx from starch during the experiment from 94±1 µmol/l/min to 133±3 µmol/l/min. Only about 12% of the total carbon flux from starch reached propionate. Propionate production mainly proceeded via succinate with a small contribution via acrylate. The remaining flux from starch yielded acetate (35%) and butyrate (53%). Interpretation of 13C NMR multiplet signals further revealed that butyrate, valerate and caproate were mainly synthesised via cross-feeding, using acetate as a co-substrate. 
This study demonstrates for the first time that the experimental design and the analysis of the results by computational modelling allows the determination of time-resolved effects of nutrition on the flux distribution within human faecal microbiota in metabolic non-steady-state.
Affiliation(s)
- T W Binsl
- Centre for Integrative Bioinformatics, VU University Amsterdam, De Boelelaan 1081A, 1081 HV Amsterdam, The Netherlands
|