1
Kumar Das J, Tradigo G, Veltri P, H Guzzi P, Roy S. Data science in unveiling COVID-19 pathogenesis and diagnosis: evolutionary origin to drug repurposing. Brief Bioinform 2021; 22:855-872. [PMID: 33592108 PMCID: PMC7929414 DOI: 10.1093/bib/bbaa420] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 08/18/2020] [Revised: 11/09/2020] [Accepted: 12/19/2020] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION The outbreak of novel severe acute respiratory syndrome coronavirus (SARS-CoV-2, also known as COVID-19) in Wuhan has attracted worldwide attention. SARS-CoV-2 causes severe inflammation, which can be fatal. Consequently, there has been a massive and rapid growth in research aimed at shedding light on the mechanisms of infection and the progression of the disease. In this regard, data science is playing a pivotal role in in silico analysis to gain insights into SARS-CoV-2 and the outbreak of COVID-19 in order to forecast, diagnose and come up with a drug to tackle the virus. The availability of large multiomics, radiological, bio-molecular and medical datasets requires the development of novel exploratory and predictive models, or the customisation of existing ones to fit the current problem. The high number of approaches generates the need for surveys to guide data scientists and medical practitioners in selecting the right tools to manage their clinical data. RESULTS Focusing on data science methodologies, we conduct a detailed study of the state of the art in works tackling the current pandemic scenario. We consider various current COVID-19 data analytic domains such as phylogenetic analysis, SARS-CoV-2 genome identification, protein structure prediction, host-viral protein interactomics, clinical imaging, epidemiological research and drug discovery. We highlight data types and instances, their generation pipelines and the data science models currently in use. The current study should give a detailed sketch of the road map towards handling COVID-19-like situations by helping data science experts choose the right tools. We also summarise our review, focusing on prime challenges and possible future research directions. CONTACT hguzzi@unicz.it, sroy01@cus.ac.in.
Affiliation(s)
- Jayanta Kumar Das
  - Department of Pediatrics, School of Medicine, Johns Hopkins University, Maryland, USA
- Giuseppe Tradigo
  - eCampus University, Via Isimbardi 10, 22060 Novedrate, CO, Italy
- Pierangelo Veltri
  - Department of Surgical and Medical Sciences, Magna Graecia University, Catanzaro, 88100, Italy
- Pietro H Guzzi
  - Department of Surgical and Medical Sciences, Magna Graecia University, Catanzaro, 88100, Italy
- Swarup Roy
  - Network Reconstruction & Analysis (NetRA) Lab, Department of Computer Applications, Sikkim University, Gangtok, India
2
Abstract
There is accumulating evidence that long noncoding RNAs (lncRNAs) play crucial roles in biological processes and diseases. In recent years, computational models have been widely used to predict potential lncRNA-disease relations. In this chapter, we systematically describe various computational algorithms and prediction tools that have been developed to elucidate the roles of lncRNAs in diseases, characterize their coding potential and function, or ascertain their involvement in critical biological processes. We also provide a comprehensive summary of these applications.
Affiliation(s)
- Fayaz Seifuddin
  - Bioinformatics and Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Mehdi Pirooznia
  - Bioinformatics and Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
3
Abstract
Bioengineering tools offer significant advantages: they are less time-consuming and serve as a promising basis for producing pharmaceutical bioproducts on a single platform. This review highlights the advantages of, and current improvements in, plant, animal and microbial bioengineering tools and outlines feasible approaches, at the biological and process bioengineering levels, for advancing the economic feasibility of pharmaceutical production. The critical analysis revealed that systems biology and synthetic biology, along with advanced bioengineering tools such as transcriptome, proteome, metabolome and nano-bioengineering tools, have shown a promising impact on the development of pharmaceutical bioproducts. Tools for overcoming the accompanying challenges of pharmaceutical production, including nano-bioengineering tools, are also discussed. As a summary and prospect, the review also gives new insight into the challenges and possible breakthroughs in developing pharmaceutical bioproducts through bioengineering tools.
Affiliation(s)
- Surendra Sarsaiya
  - Key Laboratory of Basic Pharmacology and Joint International Research Laboratory of Ethnomedicine of Ministry of Education, Zunyi Medical University, Zunyi, China
  - Bioresource Institute for Healthy Utilization, Zunyi Medical University, Zunyi, China
- Jingshan Shi
  - Key Laboratory of Basic Pharmacology and Joint International Research Laboratory of Ethnomedicine of Ministry of Education, Zunyi Medical University, Zunyi, China
- Jishuang Chen
  - Bioresource Institute for Healthy Utilization, Zunyi Medical University, Zunyi, China
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, China
4
Milano M, Milenković T, Cannataro M, Guzzi PH. L-HetNetAligner: A novel algorithm for Local Alignment of Heterogeneous Biological Networks. Sci Rep 2020; 10:3901. [PMID: 32127586 PMCID: PMC7054427 DOI: 10.1038/s41598-020-60737-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/08/2018] [Accepted: 02/11/2020] [Indexed: 11/10/2022] Open
Abstract
Networks are widely used for modelling and analysing a wide range of biological data. As a consequence, many research efforts have resulted in the introduction of a large number of algorithms for the analysis and comparison of networks. Many of these algorithms can deal only with networks with a single class of nodes and edges, also referred to as homogeneous networks. Recently, many approaches have tried to integrate the interplay of different molecules into a single model. A possible formalism to model such a scenario comes from node/edge-coloured networks (also known as heterogeneous networks), implemented as node/edge-coloured graphs. Therefore, the need arises for algorithms able to compare heterogeneous networks. We here focus on the local comparison of heterogeneous networks, and we formulate it as a network alignment problem. To the best of our knowledge, the local alignment of heterogeneous networks has not been explored in the past. We here propose L-HetNetAligner, a novel algorithm that receives as input two heterogeneous networks (node-coloured graphs) and builds a local alignment of them. We also implemented and tested our algorithm. Our results confirm that our method builds high-quality alignments. The following website *contains Supplementary File 1 material and the code.
Affiliation(s)
- Marianna Milano
  - Department of Surgical and Medical Sciences, University of Catanzaro, Catanzaro, 88040, Italy
- Tijana Milenković
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana, USA
- Mario Cannataro
  - Department of Surgical and Medical Sciences, University of Catanzaro, Catanzaro, 88040, Italy
  - Data Analytics Research Center, University of Catanzaro, Catanzaro, Italy
- Pietro Hiram Guzzi
  - Department of Surgical and Medical Sciences, University of Catanzaro, Catanzaro, 88040, Italy
  - Data Analytics Research Center, University of Catanzaro, Catanzaro, Italy
5
Gliozzo J, Perlasca P, Mesiti M, Casiraghi E, Vallacchi V, Vergani E, Frasca M, Grossi G, Petrini A, Re M, Paccanaro A, Valentini G. Network modeling of patients' biomolecular profiles for clinical phenotype/outcome prediction. Sci Rep 2020; 10:3612. [PMID: 32107391 DOI: 10.1038/s41598-020-60235-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/23/2019] [Accepted: 11/05/2019] [Indexed: 12/15/2022] Open
Abstract
Methods for phenotype and outcome prediction are largely based on inductive supervised models that use selected biomarkers to make predictions, without explicitly considering the functional relationships between individuals. We introduce a novel network-based approach named Patient-Net (P-Net) in which biomolecular profiles of patients are modeled in a graph-structured space that represents gene expression relationships between patients. Then a kernel-based semi-supervised transductive algorithm is applied to the graph to explore the overall topology of the graph and to predict the phenotype/clinical outcome of patients. Experimental tests involving several publicly available datasets of patients afflicted with pancreatic, breast, colon and colorectal cancer show that our proposed method is competitive with state-of-the-art supervised and semi-supervised predictive systems. Importantly, P-Net also provides interpretable models that can be easily visualized to gain clues about the relationships between patients, and to formulate hypotheses about their stratification.
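The idea of scoring patients on a patient-similarity graph can be illustrated with a minimal sketch. It is not the kernel-based transductive algorithm of P-Net: it assumes Pearson correlation between expression profiles as the edge weight and uses a plain average-similarity score to labelled patients as a stand-in. The function names and the toy profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def average_similarity_scores(profiles, positives):
    """Score each patient by its mean edge weight to the labelled positives.

    Edge weights are correlations clipped at zero (an assumption of this
    sketch), so negatively correlated patients contribute nothing.
    """
    scores = {}
    for patient, profile in profiles.items():
        others = [q for q in positives if q != patient]
        if not others:
            scores[patient] = 0.0
            continue
        weights = [max(pearson(profile, profiles[q]), 0.0) for q in others]
        scores[patient] = sum(weights) / len(others)
    return scores


# Toy data (hypothetical): four patients, four genes each.
profiles = {
    "p1": [1.0, 2.0, 3.0, 4.0],
    "p2": [2.0, 4.0, 6.0, 9.0],
    "p3": [4.0, 3.0, 2.0, 1.0],
    "p4": [1.5, 2.5, 3.0, 5.0],
}
ranking = average_similarity_scores(profiles, positives={"p1", "p2"})
```

Here `p4`, whose profile rises like those of the labelled patients, receives a higher score than `p3`, whose profile falls. A transductive method of the kind the abstract describes would use the whole graph topology rather than only direct neighbours.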
6
Li W, Zhang Y, He Y, Wang Y, Guo S, Zhao X, Feng Y, Song Z, Zou Y, He W, Chen L. Candidate gene prioritization for non-communicable diseases based on functional information: Case studies. J Biomed Inform 2019; 93:103155. [PMID: 30902596 DOI: 10.1016/j.jbi.2019.103155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2018] [Revised: 03/14/2019] [Accepted: 03/19/2019] [Indexed: 10/27/2022]
Abstract
Candidate gene prioritization for complex non-communicable diseases is essential to understanding their mechanisms and developing better means of diagnosing and treating them. Many methods have been developed to prioritize candidate genes in protein-protein interaction (PPI) networks. Integrating functional information/similarity into disease-related PPI networks could improve the performance of prioritization. In this study, a candidate gene prioritization method was proposed for non-communicable diseases that considers disease risks transferred between genes in weighted disease PPI networks, with weights for nodes and edges based on functional information. Here, three types of non-communicable diseases with pathobiological similarity, Type 2 diabetes (T2D), coronary artery disease (CAD) and dilated cardiomyopathy (DCM), were used as case studies. Literature review and pathway enrichment analysis of top-ranked genes demonstrated the effectiveness of our method. In comparison with other existing methods, our method achieved better performance. Pathobiological similarity among these three diseases was further investigated for common top-ranked genes to reveal their pathogenesis.
Affiliation(s)
- Wan Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Yihua Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Yuehan He
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Yahui Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Shanshan Guo
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Xilei Zhao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Yuyan Feng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Zhaona Song
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Yuqing Zou
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Weiming He
  - Institute of Opto-electronics, Harbin Institute of Technology, Harbin 150000, Heilongjiang Province, China
- Lina Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
7
Duran‐Frigola M, Fernández‐Torras A, Bertoni M, Aloy P. Formatting biological big data for modern machine learning in drug discovery. WIREs Comput Mol Sci 2018. [DOI: 10.1002/wcms.1408] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/15/2022]
Affiliation(s)
- Miquel Duran‐Frigola
  - Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, Barcelona, Spain
- Adrià Fernández‐Torras
  - Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, Barcelona, Spain
- Martino Bertoni
  - Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, Barcelona, Spain
- Patrick Aloy
  - Joint IRB‐BSC‐CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
8
Navarro C, Martínez V, Blanco A, Cano C. ProphTools: general prioritization tools for heterogeneous biological networks. Gigascience 2018; 6:1-8. [PMID: 29186475 PMCID: PMC5751048 DOI: 10.1093/gigascience/gix111] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/26/2017] [Accepted: 11/09/2017] [Indexed: 12/17/2022] Open
Abstract
Background Networks have been proven effective representations for the analysis of biological data. As such, there exist multiple methods to extract knowledge from biological networks. However, these approaches usually limit their scope to a single biological entity type of interest or they lack the flexibility to analyze user-defined data. Results We developed ProphTools, a flexible open-source command-line tool that performs prioritization on a heterogeneous network. ProphTools prioritization combines a Flow Propagation algorithm similar to a Random Walk with Restarts and a weighted propagation method. A flexible model for the representation of a heterogeneous network allows the user to define a prioritization problem involving an arbitrary number of entity types and their interconnections. Furthermore, ProphTools provides functionality to perform cross-validation tests, allowing users to select the best network configuration for a given problem. ProphTools core prioritization methodology has already been proven effective in gene-disease prioritization and drug repositioning. Here we make ProphTools available to the scientific community as flexible, open-source software and perform a new proof-of-concept case study on long noncoding RNAs (lncRNAs) to disease prioritization. Conclusions ProphTools is robust prioritization software that provides the flexibility not present in other state-of-the-art network analysis approaches, enabling researchers to perform prioritization tasks on any user-defined heterogeneous network. Furthermore, the application to lncRNA-disease prioritization shows that ProphTools can reach the performance levels of ad hoc prioritization tools without losing its generality.
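The abstract compares the ProphTools propagation step to a Random Walk with Restarts (RWR). As a rough illustration of that general technique, and not of the ProphTools implementation or its command-line interface, the following sketch runs a plain RWR on a small mixed-entity network. The function name, the restart probability of 0.3 and the toy nodes (`d1`, `g1`, `g2`, `l1`, `l2`) are assumptions made for the example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Plain RWR on a weighted graph.

    adjacency: dict mapping every node to a dict {neighbour: weight}.
    seeds: set of query nodes; the walker restarts uniformly over them.
    Returns a dict of steady-state visiting probabilities.
    """
    nodes = sorted(adjacency)
    out_weight = {u: sum(adjacency[u].values()) for u in nodes}
    start = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    current = dict(start)
    for _ in range(max_iter):
        # With probability `restart` jump back to a seed ...
        updated = {u: restart * start[u] for u in nodes}
        # ... otherwise move to a neighbour in proportion to edge weight.
        for u in nodes:
            if out_weight[u] == 0:
                continue  # isolated node: its probability mass is dropped
            for v, w in adjacency[u].items():
                updated[v] += (1.0 - restart) * current[u] * w / out_weight[u]
        change = sum(abs(updated[u] - current[u]) for u in nodes)
        current = updated
        if change < tol:
            break
    return current


# Toy heterogeneous network (hypothetical): one disease, two genes, two lncRNAs.
network = {
    "d1": {"g1": 1.0},
    "g1": {"d1": 1.0, "l1": 1.0, "g2": 1.0},
    "g2": {"g1": 1.0, "l2": 1.0},
    "l1": {"g1": 1.0},
    "l2": {"g2": 1.0},
}
scores = random_walk_with_restart(network, seeds={"d1"})
candidates = sorted(["l1", "l2"], key=scores.get, reverse=True)
```

Starting from the disease node `d1`, the lncRNA `l1` ranks above `l2` because it sits one step closer to the seed. The sketch treats all node types alike; a heterogeneous prioritization tool would typically also weight the links between entity types.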
Affiliation(s)
- Carmen Navarro
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Victor Martínez
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Armando Blanco
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Carlos Cano
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain