1. Quddusi DM, Bajcinca N. Identification of genomic biomarkers and their pathway crosstalks for deciphering mechanistic links in glioblastoma. IET Syst Biol 2023; 17:143-161. [PMID: 37277696 PMCID: PMC10439498 DOI: 10.1049/syb2.12066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2022] [Revised: 04/22/2023] [Accepted: 05/03/2023] [Indexed: 06/07/2023]
Abstract
Glioblastoma is a grade IV malignant neoplasm arising in the supratentorial region of the brain. As its causes are largely unknown, it is essential to understand its dynamics at the molecular level, which requires the identification of better diagnostic and prognostic molecular candidates. Blood-based liquid biopsies are emerging as a novel tool for cancer biomarker discovery, guiding treatment and improving early detection based on the tumour of origin. Previous studies have focused on identifying tumour-based biomarkers for glioblastoma. However, these biomarkers inadequately represent the underlying pathological state and give an incomplete picture of the tumour, because tumour biopsies cannot be repeated to monitor the disease. In contrast to tumour biopsies, liquid biopsies are also non-invasive and can be performed at any interval during the course of the disease to monitor it. This study therefore uses a unique dataset of blood-based liquid biopsies obtained primarily from tumour-educated blood platelets (TEPs). These RNA-seq data, obtained from ArrayExpress, comprise a human cohort of 39 glioblastoma subjects and 43 healthy subjects. Canonical and machine learning approaches are applied to identify genomic biomarkers for glioblastoma and their crosstalks. Using GSEA, 97 genes were found enriched in 7 oncogenic pathways (RAF-MAPK, P53, PRC2-EZH2, YAP conserved, MEK-MAPK, ErbB2 and STK33 signalling pathways), of which 17 participate actively in crosstalks. Using PCA, 42 genes were found enriched in 7 pathways (cytoplasmic ribosomal proteins, translation factors, electron transport chain, ribosome, Huntington's disease, primary immunodeficiency, and interferon type I signalling) that harbour tumours when altered, of which 25 participate actively in crosstalks.
All 14 pathways foster well-known cancer hallmarks, and the identified differentially expressed genes (DEGs) can serve as genomic biomarkers, not only for the diagnosis and prognosis of glioblastoma but also as a molecular foothold for oncogenic decision-making aimed at understanding the disease dynamics. Moreover, SNP analysis of the identified DEGs is performed to investigate their roles in disease dynamics in more detail. These results suggest that TEPs can provide disease insights comparable to those from tumour cells, with the advantage that they can be extracted at any time during the course of the disease to monitor it.
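The GSEA step mentioned in this abstract ranks genes and scores each pathway with a running-sum enrichment statistic. The pure-Python sketch below computes that enrichment score in the style of standard GSEA (weighted Kolmogorov-Smirnov-like running sum, weight exponent p = 1). It is an illustration only, not the authors' pipeline: the function name, gene names and scores are hypothetical, and the permutation test GSEA uses to assess significance is omitted.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Running-sum enrichment score in the style of GSEA.

    ranked   -- list of (gene, score) pairs sorted by score, highest first
    gene_set -- set of gene names belonging to one pathway
    p        -- weight exponent applied to |score| for genes in the set
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == len(ranked):
        raise ValueError("gene set must overlap the ranking without covering all of it")
    # Normaliser: total weight of the genes that are in the set
    hit_weight = sum(abs(score) ** p for (_, score), hit in zip(ranked, hits) if hit)
    if hit_weight == 0:
        raise ValueError("genes in the set all have zero score")
    miss_step = 1.0 / (len(ranked) - n_hits)
    running, best = 0.0, 0.0
    for (_, score), hit in zip(ranked, hits):
        # Step up by the gene's normalised weight on a hit, down by a fixed amount on a miss
        running += abs(score) ** p / hit_weight if hit else -miss_step
        if abs(running) > abs(best):
            best = running  # keep the signed maximum deviation from zero
    return best


ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]
print(enrichment_score(ranked, {"A", "B"}))  # set at the top of the ranking: positive score
print(enrichment_score(ranked, {"D", "E"}))  # set at the bottom of the ranking: negative score
```

A positive score means the pathway's genes cluster near the top of the ranking; a negative score means they cluster near the bottom.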
Affiliation(s)
- Darrak Moin Quddusi
- Chair of Mechatronics in the Faculty of Mechanical and Process Engineering, Rheinland‐Pfälzische Technische Universität Kaiserslautern‐Landau, Kaiserslautern, Germany
- Naim Bajcinca
- Chair of Mechatronics in the Faculty of Mechanical and Process Engineering, Rheinland‐Pfälzische Technische Universität Kaiserslautern‐Landau, Kaiserslautern, Germany
2. Parastar H, Tauler R. Big (Bio)Chemical Data Mining Using Chemometric Methods: A Need for Chemists. Angew Chem Int Ed Engl 2022. [DOI: 10.1002/ange.201801134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Hadi Parastar
- Department of Chemistry, Sharif University of Technology, Tehran, Iran
- Roma Tauler
- Department of Environmental Chemistry, IDAEA-CSIC, 08034 Barcelona, Spain
3. Jain R, Xu W. Dynamic model updating (DMU) approach for statistical learning model building with missing data. BMC Bioinformatics 2021; 22:221. [PMID: 33926384 PMCID: PMC8086098 DOI: 10.1186/s12859-021-04138-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2020] [Accepted: 04/19/2021] [Indexed: 11/17/2022]
Abstract
Background Developing statistical and machine learning methods for studies with missing information is a ubiquitous challenge in real-world biological research. Existing strategies either remove samples with missing values, as in complete case analysis (CCA), or impute the missing values, for example with predictive mean matching (PMM) as implemented in MICE. Limitations of these strategies include information loss and uncertainty about how closely the imputed values match the true missing values. Further, when medical data arrive piecemeal, these strategies must wait until data collection is complete before a full dataset is available to the statistical models. Method and results This study proposes a dynamic model updating (DMU) approach, a different strategy for developing statistical models with missing data. DMU uses only the information available in the dataset to prepare the statistical models. It uses hierarchical clustering to segment the original dataset into small complete datasets and then fits a Bayesian regression to each of them. Predictor estimates are updated using the posterior estimates from each dataset. DMU was evaluated on both simulated data and real studies and performed better than, or on par with, other approaches such as CCA and PMM. Conclusion The DMU approach provides an alternative to the existing approaches of information elimination and imputation for processing datasets with missing values. Although the study applied the approach to continuous cross-sectional data, it can also be applied to longitudinal, categorical and time-to-event biological data. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04138-z.
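The core DMU idea of fitting a Bayesian model on each complete segment and passing its posterior on as the prior for the next segment can be sketched in pure Python. This is a simplified illustration under stated assumptions, not the authors' implementation: rows are grouped by their missingness pattern rather than by hierarchical clustering, the model has a single predictor (y = b*x + noise) with known noise variance, and the prior on b is normal so each update is conjugate. All function names are hypothetical.

```python
from collections import defaultdict


def segment_by_pattern(rows):
    """Split rows into complete sub-datasets, one per missingness pattern.

    Hypothetical stand-in for DMU's hierarchical-clustering segmentation.
    """
    segments = defaultdict(list)
    for row in rows:
        observed = tuple(sorted(k for k, v in row.items() if v is not None))
        segments[observed].append(row)
    return dict(segments)


def update_slope(prior_mean, prior_var, xs, ys, noise_var):
    """Conjugate normal update for b in y = b*x + e, with e ~ N(0, noise_var) known."""
    post_precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var
                            + sum(x * y for x, y in zip(xs, ys)) / noise_var)
    return post_mean, post_var


def dmu_like_fit(rows, needed=("x", "y"), noise_var=1.0, prior_mean=0.0, prior_var=100.0):
    """Update the slope estimate segment by segment; each posterior becomes the next prior."""
    mean, var = prior_mean, prior_var
    for observed, segment in segment_by_pattern(rows).items():
        if not set(needed) <= set(observed):
            continue  # this segment lacks a variable the model uses
        xs = [r[needed[0]] for r in segment]
        ys = [r[needed[1]] for r in segment]
        mean, var = update_slope(mean, var, xs, ys, noise_var)
    return mean, var


rows = [
    {"x": 1.0, "y": 2.0, "z": None},
    {"x": 2.0, "y": 4.0, "z": 1.0},
    {"x": 3.0, "y": 6.0, "z": None},
    {"x": None, "y": 5.0, "z": 2.0},
]
print(dmu_like_fit(rows))  # posterior mean close to 2 with a small posterior variance
```

With a conjugate normal prior, updating segment by segment yields the same posterior as a single fit on every row in which x and y are observed, so rows missing only the unused field z still contribute, whereas complete case analysis over (x, y, z) would discard them.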
Affiliation(s)
- Rahi Jain
- Biostatistics Department, Princess Margaret Cancer Research Centre, Toronto, ON, Canada
- Wei Xu
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
4. Garlaschi S, Fochesato A, Tovo A. Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets. Entropy (Basel) 2020; 22:E1084. [PMID: 33286853 PMCID: PMC7597173 DOI: 10.3390/e22101084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2020] [Revised: 09/17/2020] [Accepted: 09/23/2020] [Indexed: 11/16/2022]
Abstract
Recent technological and computational advances have enabled the collection of data at an unprecedented rate. On the one hand, the sudden availability of large amounts of data has opened up new opportunities for data-driven research; on the other hand, it has brought to light new obstacles and challenges related to storage and analysis limits. Here, we strengthen an upscaling approach borrowed from theoretical ecology that allows us to infer, with small errors, relevant patterns of a dataset in its entirety, even though only a limited fraction of it has been analysed. In particular, we show that, after the amount of input information on the system under study is reduced, our framework can still recover two statistical patterns of interest for the entire dataset. Tested against large ecological, human activity and genomics datasets, our framework successfully reconstructed global statistics related to both the number of types and their abundances, starting from limited presence/absence information on small random samples of the datasets. These results pave the way for future applications of our procedure in different life science contexts, from social activities to natural ecosystems.
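Estimating the total number of types from presence/absence information on a few random samples can be illustrated with a classic incidence-based estimator. The sketch below uses the bias-corrected Chao2 richness estimator, a standard ecological technique swapped in here for illustration; it is not the upscaling framework the authors strengthen, and the function name and toy data are hypothetical.

```python
from collections import Counter


def chao2(samples):
    """Bias-corrected Chao2 estimate of total richness from incidence data.

    samples -- list of sets; each set holds the types present in one sample
    """
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples")
    # Incidence frequency: in how many samples each type was observed
    incidence = Counter(t for sample in samples for t in set(sample))
    s_obs = len(incidence)
    q1 = sum(1 for c in incidence.values() if c == 1)  # types seen in exactly one sample
    q2 = sum(1 for c in incidence.values() if c == 2)  # types seen in exactly two samples
    # Unseen types are extrapolated from how many observed types are rare across samples
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))


samples = [{"a", "b", "c"}, {"a", "d"}, {"a", "b", "e"}]
print(chao2(samples))  # exceeds the 5 observed types because three types were seen only once
```

The estimate only needs to know which types occur in each sample, not their abundances, which is the kind of reduced input the abstract describes.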
Affiliation(s)
- Stefano Garlaschi
- Dipartimento di Fisica e Astronomia “Galileo Galilei”, Università degli studi di Padova, Via Marzolo 8, 35131 Padova, Italy
- Anna Fochesato
- Fondazione The Microsoft Research—University of Trento, Centre for Computational and Systems Biology (COSBI), Piazza Manifattura 1, 38068 Rovereto, Italy
- Dipartimento di Matematica, Università degli studi di Trento, Via Sommarive 14, 38123 Povo, Italy
- Anna Tovo
- Dipartimento di Fisica e Astronomia “Galileo Galilei”, Università degli studi di Padova, Via Marzolo 8, 35131 Padova, Italy
- Dipartimento di Matematica “Tullio Levi-Civita”, Università degli studi di Padova, Via Trieste 63, 35121 Padova, Italy
5. Bartoszewski R, Sikorski AF. Editorial focus: understanding off-target effects as the key to successful RNAi therapy. Cell Mol Biol Lett 2019; 24:69. [PMID: 31867046 PMCID: PMC6902517 DOI: 10.1186/s11658-019-0196-3] [Citation(s) in RCA: 73] [Impact Index Per Article: 14.6] [Received: 10/24/2019] [Accepted: 12/03/2019] [Indexed: 12/21/2022]
Abstract
With the first RNA interference (RNAi) drug (ONPATTRO (patisiran)) on the market, the RNAi therapy field is reaching a critical turning point, at which further improvements in drug candidate design and delivery pipelines should enable the fast delivery of novel, life-changing treatments to patients. Nevertheless, neglecting the parallel development of RNAi-dedicated in vitro pharmacological profiling, aimed at identifying undesirable off-target activity, may slow down or halt progress in the RNAi field. Since academic research is currently fueling the RNAi development pipeline with new therapeutic options, the objective of this article is to briefly summarize the basics of RNAi therapy and to discuss how to translate basic research into a better understanding of drug candidate safety profiles early in the process.
Affiliation(s)
- Rafal Bartoszewski
- Department of Biology and Pharmaceutical Botany, Medical University of Gdansk, Gdansk, Poland
- Aleksander F. Sikorski
- Department of Cytobiochemistry, Faculty of Biotechnology, University of Wroclaw, Wroclaw, Poland
6. Singh A, Müller B, Fuxelius HH, Schnürer A. AcetoBase: a functional gene repository and database for formyltetrahydrofolate synthetase sequences. Database (Oxford) 2019; 2019:baz142. [PMID: 31832668 PMCID: PMC6908459 DOI: 10.1093/database/baz142] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/09/2019] [Revised: 11/01/2019] [Accepted: 11/14/2019] [Indexed: 01/01/2023]
Abstract
Acetogenic bacteria are essential to environmental carbon cycling and diverse biotechnological applications, but their extensive physiological and taxonomic diversity impedes systematic taxonomic studies. Acetogens are chemolithoautotrophic bacteria that perform reductive carbon fixation under anaerobic conditions through the Wood-Ljungdahl pathway (WLP)/acetyl-coenzyme A pathway. The gene encoding formyltetrahydrofolate synthetase (FTHFS), a key enzyme of this pathway, is highly conserved and can be used as a molecular marker to probe acetogenic communities. However, FTHFS sequence data have not been systematically collected at the nucleotide and protein levels. To streamline investigations of acetogens, we developed AcetoBase, a repository and database for systematically collecting and organizing information related to FTHFS sequences. AcetoBase also makes it possible to submit data and obtain accession numbers, perform homology searches for sequence identification and access a customized BLAST database of submitted sequences. AcetoBase offers the prospect of identifying potential acetogenic bacteria on the basis of metadata related to genome content and the WLP, supplemented with FTHFS sequence accessions, and can be an important tool in the study of acetogenic communities. AcetoBase can be publicly accessed at https://acetobase.molbio.slu.se.
Affiliation(s)
- Abhijeet Singh
- Department of Molecular Sciences, Swedish University of Agricultural Sciences, Uppsala BioCenter, Box 7025, SE-750 07 Uppsala, Sweden
- Bettina Müller
- Department of Molecular Sciences, Swedish University of Agricultural Sciences, Uppsala BioCenter, Box 7025, SE-750 07 Uppsala, Sweden
- Hans-Henrik Fuxelius
- Department of Molecular Sciences, Swedish University of Agricultural Sciences, Uppsala BioCenter, Box 7025, SE-750 07 Uppsala, Sweden
- Anna Schnürer
- Department of Molecular Sciences, Swedish University of Agricultural Sciences, Uppsala BioCenter, Box 7025, SE-750 07 Uppsala, Sweden
7. Alnasir JJ, Shanahan HP. The application of Hadoop in structural bioinformatics. Brief Bioinform 2018; 21:96-105. [PMID: 30462158 DOI: 10.1093/bib/bby106] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/27/2018] [Revised: 09/20/2018] [Accepted: 10/05/2018] [Indexed: 11/13/2022]
Abstract
The paper reviews the use of the Hadoop platform in structural bioinformatics applications. For structural bioinformatics, Hadoop provides a new framework for analysing large fractions of the Protein Data Bank, which is key to high-throughput studies of, for example, protein-ligand docking, clustering of protein-ligand complexes and structural alignment. Specifically, we review published Hadoop implementations of high-throughput analyses and their scalability. We find that these deployments for the most part call known executables from MapReduce rather than rewriting the algorithms. Scalability varies in comparison with other batch schedulers, partly because direct comparisons of Hadoop with batch schedulers on the same platform are largely absent from the literature; however, we note some evidence that Message Passing Interface (MPI) implementations scale better than Hadoop. A significant barrier to the use of the Hadoop ecosystem is the difficulty of its interfaces and of configuring a resource to use Hadoop. This will improve over time as interfaces to Hadoop (e.g. Spark) improve, usage of cloud platforms (e.g. Azure and Amazon Web Services (AWS)) increases, and standardised approaches such as workflow languages (e.g. Workflow Description Language, Common Workflow Language and Nextflow) are taken up.
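The pattern the review describes, calling an existing executable from MapReduce instead of rewriting the algorithm, resembles a Hadoop Streaming job: mapper and reducer programs exchange tab-separated key/value lines, and the framework sorts mapper output by key before the reduce step. The local Python sketch below imitates that flow. The input format (one "ligand structure" pair per line), the lower-is-better score and the external command are all hypothetical; in a real Streaming job the lines would come from standard input, and the demo substitutes a tiny Python one-liner for a real docking binary.

```python
import subprocess
import sys
from itertools import groupby


def mapper(lines, command):
    """Run an existing executable once per record and emit 'ligand<TAB>structure<TAB>score'.

    command -- argv prefix of the external tool (hypothetical stand-in for a docking program)
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed records
        ligand, structure = fields
        result = subprocess.run(command + [ligand, structure],
                                capture_output=True, text=True, check=True)
        score = float(result.stdout.strip())
        yield f"{ligand}\t{structure}\t{score}"


def reducer(sorted_lines):
    """Keep the lowest-scoring structure per ligand; input must already be sorted by key."""
    records = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for ligand, group in groupby(records, key=lambda r: r[0]):
        best = min(group, key=lambda r: float(r[2]))
        yield f"{ligand}\t{best[1]}\t{best[2]}"


if __name__ == "__main__":
    # Stand-in "docking tool": prints a made-up score (total length of its two arguments)
    fake_tool = [sys.executable, "-c",
                 "import sys; print(len(sys.argv[1]) + len(sys.argv[2]))"]
    records = ["L1 1abc", "L1 2xy", "L2 3defg"]
    # sorted() plays the role of Hadoop's shuffle-and-sort between map and reduce
    for out in reducer(sorted(mapper(records, fake_tool))):
        print(out)
```

The algorithm itself stays inside the external tool; the MapReduce layer only distributes records and aggregates results, which is why such deployments avoid rewriting the underlying code.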
Affiliation(s)
- Jamie J Alnasir
- Institute of Cancer Research, Old Brompton Road, London, United Kingdom
- Hugh P Shanahan
- Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom
8.
Abstract
Next Generation Sequencing (NGS), or deep sequencing, reads many individual DNA fragments in parallel, making it possible to identify millions of base pairs within a few hours. Recent research has clearly shown that machine learning technologies can efficiently analyse large sets of genomic data and help to identify novel gene functions and regulatory regions. A deep artificial neural network consists of layers of artificial neurons that mimic the properties of living neurons. These mathematical models, termed Artificial Neural Networks (ANNs), can be used to solve artificial intelligence engineering problems in several different fields (e.g., biology, genomics, proteomics, and metabolomics). In practical terms, neural networks are non-linear statistical models used to simulate complex genomic relationships between inputs and outputs. To date, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been demonstrated to be the best tools for improving performance in problem-solving tasks within the genomic field.
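To make the CNN idea concrete, the pure-Python sketch below one-hot encodes a DNA sequence and slides a single convolutional filter along it, followed by a ReLU; this is the basic operation of the first layer of a genomic CNN. The filter weights are hand-set to respond to the hypothetical motif TATA, whereas in a trained network they would be learned from data, and real models stack many filters with pooling and further layers.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of [A, C, G, T] indicator vectors (other letters -> all zeros)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_relu(x, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation, as deep-learning libraries compute it) plus ReLU."""
    width = len(kernel)
    activations = []
    for start in range(len(x) - width + 1):
        total = bias + sum(x[start + j][c] * kernel[j][c]
                           for j in range(width) for c in range(len(BASES)))
        activations.append(max(0.0, total))
    return activations


# Hand-set filter: with bias -3.0 it outputs 1.0 only where the window exactly matches "TATA"
motif_filter = one_hot("TATA")
track = conv1d_relu(one_hot("GGTATAGG"), motif_filter, bias=-3.0)
print(track)       # [0.0, 0.0, 1.0, 0.0, 0.0]
print(max(track))  # global max pooling: strongest motif response in the sequence
```

Because every position is scored with the same weights, the filter detects the motif wherever it occurs, which is the property that makes convolution a natural fit for sequence data.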
9. Tauler R, Parastar H. Big (Bio)Chemical Data Mining Using Chemometric Methods: A Need for Chemists. Angew Chem Int Ed Engl 2018; 61:e201801134. [DOI: 10.1002/anie.201801134] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 01/27/2018] [Indexed: 11/08/2022]
Affiliation(s)
- Roma Tauler
- IDAEA-CSIC, Environmental Chemistry, Jordi Girona 18-26, 08034 Barcelona, Spain
10. Park J, Gabbard JL. Factors that affect scientists' knowledge sharing behavior in health and life sciences research communities: Differences between explicit and implicit knowledge. Computers in Human Behavior 2018. [DOI: 10.1016/j.chb.2017.09.017] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 10/18/2022]
11. Engineered Nucleases and Trinucleotide Repeat Diseases. Advances in Experimental Medicine and Biology 2016. [DOI: 10.1007/978-1-4939-3509-3_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
12. An analytical framework for optimizing variant discovery from personal genomes. Nat Commun 2015; 6:6275. [PMID: 25711446 PMCID: PMC4351570 DOI: 10.1038/ncomms7275] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Received: 09/15/2014] [Accepted: 01/13/2015] [Indexed: 12/30/2022]
Abstract
The standardization and performance testing of analysis tools is a prerequisite to widespread adoption of genome-wide sequencing, particularly in the clinic. However, performance testing is currently complicated by the paucity of standards and comparison metrics, as well as by the heterogeneity in sequencing platforms, applications and protocols. Here we present the genome comparison and analytic testing (GCAT) platform to facilitate development of performance metrics and comparisons of analysis tools across these metrics. Performance is reported through interactive visualizations of benchmark and performance testing data, with support for data slicing and filtering. The platform is freely accessible at http://www.bioplanet.com/gcat.
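Performance metrics of the kind GCAT reports are typically computed by comparing a tool's variant calls against a benchmark call set. The minimal sketch below computes precision, recall (sensitivity) and F1 under exact matching on (chromosome, position, ref, alt). The exact-match rule is a simplifying assumption, since practical comparisons usually normalise variant representations first; the function name and toy calls are hypothetical, not GCAT data.

```python
def call_metrics(truth, calls):
    """Precision, recall and F1 of a call set against a truth set (exact-match comparison)."""
    truth, calls = set(truth), set(calls)
    tp = len(truth & calls)  # called and present in the benchmark
    fp = len(calls - truth)  # called but absent from the benchmark
    fn = len(truth - calls)  # in the benchmark but missed by the caller
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Variants as (chromosome, position, ref, alt)
truth = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")}
calls = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 75, "T", "C")}
print(call_metrics(truth, calls))
```

Reporting false positives and false negatives alongside the ratios makes it easier to see whether a tool trades sensitivity for precision.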
13. Genomics data curation roles, skills and perception of data quality. Library & Information Science Research 2015. [DOI: 10.1016/j.lisr.2014.08.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022]
14. Kosseim P, Dove ES, Baggaley C, Meslin EM, Cate FH, Kaye J, Harris JR, Knoppers BM. Building a data sharing model for global genomic research. Genome Biol 2014; 15:430. [PMID: 25221857 PMCID: PMC4282015 DOI: 10.1186/s13059-014-0430-2] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 01/28/2023]
Abstract
Data sharing models designed to facilitate global business provide insights for improving transborder genomic data sharing. We argue that a flexible, externally endorsed, multilateral arrangement, combined with an objective third-party assurance mechanism, can effectively balance privacy with the need to share genomic data globally.
Affiliation(s)
- Patricia Kosseim
- Office of the Privacy Commissioner of Canada, Ottawa, Ontario K1A 1H3, Canada
- Edward S Dove
- Centre of Genomics and Policy, McGill University, Montreal, Quebec H3A 0G1, Canada
- Carman Baggaley
- Office of the Privacy Commissioner of Canada, Ottawa, Ontario K1A 1H3, Canada
- Eric M Meslin
- IU Center for Bioethics, Indiana University, Indianapolis, IN 46202, USA
- Center for Law, Ethics, and Applied Research in Health Information, Bloomington, IN 47408, USA
- Fred H Cate
- Center for Law, Ethics, and Applied Research in Health Information, Bloomington, IN 47408, USA
- Maurer School of Law, Indiana University, Bloomington, IN 47405, USA
- Jane Kaye
- HeLEX-Centre for Health, Law and Emerging Technologies, University of Oxford, Old Road Campus, Oxford, OX3 7LF, UK
- Jennifer R Harris
- Division of Epidemiology, Department of Genes and Environment, Norwegian Institute of Public Health, PO Box 4404, Nydalen, Oslo 0403, Norway
- Bartha M Knoppers
- Centre of Genomics and Policy, McGill University, Montreal, Quebec H3A 0G1, Canada