1
Grentner A, Ragueneau E, Gong C, Prinz A, Gansberger S, Oyarzun I, Hermjakob H, Griss J. ReactomeGSA: new features to simplify public data reuse. Bioinformatics 2024; 40:btae338. [PMID: 38806182 PMCID: PMC11147800 DOI: 10.1093/bioinformatics/btae338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 04/20/2024] [Accepted: 05/26/2024] [Indexed: 05/30/2024] Open
Abstract
MOTIVATION: ReactomeGSA is part of the Reactome knowledgebase and one of the leading multi-omics pathway analysis platforms. ReactomeGSA provides access to quantitative pathway analysis methods supporting different 'omics data types. Additionally, ReactomeGSA can process different datasets simultaneously, leading to a comparative pathway analysis that can also be performed across different species.
RESULTS: We present a major update to the ReactomeGSA analysis platform that greatly simplifies the reuse and direct integration of public data. In order to increase the number of available datasets, we developed the new grein_loader Python application that can directly fetch experiments from the GREIN resource. This enabled us to support both EMBL-EBI's Expression Atlas and GEO RNA-seq Experiments Interactive Navigator within ReactomeGSA. To further increase the visibility and simplify the reuse of public datasets, we integrated a novel search function into ReactomeGSA that enables users to search for public datasets across both supported resources. Finally, we completely re-developed ReactomeGSA's web frontend and R/Bioconductor package to support the new search and loading features, and greatly simplify the use of ReactomeGSA.
AVAILABILITY AND IMPLEMENTATION: The new ReactomeGSA web frontend is available at https://www.reactome.org/gsa with a built-in, interactive tutorial. The ReactomeGSA R package (https://bioconductor.org/packages/release/bioc/html/ReactomeGSA.html) is available through Bioconductor and shipped with detailed documentation and vignettes. The grein_loader Python application is available through the Python Package Index (PyPI). The complete source code for all applications is available on GitHub at https://github.com/grisslab/grein_loader and https://github.com/reactome.
Affiliation(s)
- Alexander Grentner
  - Department of Dermatology, Medical University of Vienna, Vienna 1090, Austria
- Eliot Ragueneau
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Chuqiao Gong
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Adrian Prinz
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Sabina Gansberger
  - Department of Dermatology, Medical University of Vienna, Vienna 1090, Austria
- Inigo Oyarzun
  - Department of Dermatology, Medical University of Vienna, Vienna 1090, Austria
- Henning Hermjakob
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Johannes Griss
  - Department of Dermatology, Medical University of Vienna, Vienna 1090, Austria
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
2
Williams A. Multiomics data integration, limitations, and prospects to reveal the metabolic activity of the coral holobiont. FEMS Microbiol Ecol 2024; 100:fiae058. [PMID: 38653719 PMCID: PMC11067971 DOI: 10.1093/femsec/fiae058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 03/25/2024] [Accepted: 04/22/2024] [Indexed: 04/25/2024] Open
Abstract
Since their radiation in the Middle Triassic period ∼240 million years ago, stony corals have survived past climate fluctuations and five mass extinctions. Their long-term survival underscores the inherent resilience of corals, particularly when considering the nutrient-poor marine environments in which they have thrived. However, coral bleaching has emerged as a global threat to coral survival, requiring rapid advancements in coral research to understand holobiont stress responses and allow for interventions before extensive bleaching occurs. This review encompasses the potential, as well as the limits, of multiomics data applications when applied to the coral holobiont. Synopses for how different omics tools have been applied to date and their current restrictions are discussed, in addition to ways these restrictions may be overcome, such as recruiting new technology to studies, utilizing novel bioinformatics approaches, and generally integrating omics data. Lastly, this review presents considerations for the design of holobiont multiomics studies to support lab-to-field advancements of coral stress marker monitoring systems. Although much of the bleaching mechanism has eluded investigation to date, multiomic studies have already produced key findings regarding the holobiont's stress response, and have the potential to advance the field further.
Affiliation(s)
- Amanda Williams
  - Microbial Biology Graduate Program, Rutgers University, 76 Lipman Drive, New Brunswick, NJ 08901, United States
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Drive, New Brunswick, NJ 08901, United States
3
Tian Z, Jia J, Yin B, Chen W. Constructing the metabolic network of wheat kernels based on structure-guided chemical modification and multi-omics data. J Genet Genomics 2024:S1673-8527(24)00037-7. [PMID: 38458562 DOI: 10.1016/j.jgg.2024.02.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 02/27/2024] [Accepted: 02/27/2024] [Indexed: 03/10/2024]
Abstract
Metabolic network construction plays a pivotal role in unraveling the regulatory mechanisms of biological activities, although it often proves to be challenging and labor-intensive, particularly with non-model organisms. In this study, we develop a computational approach that employs reaction models based on structure-guided chemical modification and related compounds to construct a metabolic network in wheat. This construction results in a comprehensive structure-guided network, including 625 identified metabolites and an additional 333 putative reactions compared with the Kyoto Encyclopedia of Genes and Genomes database. Using a combination of gene annotation, reaction classification, structure similarity, and transcriptome and metabolome analysis correlations, a total of 229 potential genes related to these reactions are identified within this network. To validate the network, the functionality of a hydroxycinnamoyltransferase (TraesCS3D01G314900) for the synthesis of polyphenols and a rhamnosyltransferase (TraesCS2D01G078700) for the modification of flavonoids is verified through in vitro enzymatic studies and wheat mutant tests, respectively. Our research thus supports the utility of structure-guided chemical modification as an effective tool in identifying causal candidate genes for constructing metabolic networks and further in metabolomic genetic studies.
Affiliation(s)
- Zhitao Tian
  - National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
- Jingqi Jia
  - National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
- Bo Yin
  - National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
- Wei Chen
  - National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China.
4
Wieder C, Cooke J, Frainay C, Poupin N, Bowler R, Jourdan F, Kechris KJ, Lai RP, Ebbels T. PathIntegrate: Multivariate modelling approaches for pathway-based multi-omics data integration. bioRxiv 2024:2024.01.09.574780. [PMID: 38260498 PMCID: PMC10802464 DOI: 10.1101/2024.01.09.574780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
As terabytes of multi-omics data are being generated, there is an ever-increasing need for methods facilitating the integration and interpretation of such data. Current multi-omics integration methods typically output lists, clusters, or subnetworks of molecules related to an outcome. Even with expert domain knowledge, discerning the biological processes involved is a time-consuming activity. Here we propose PathIntegrate, a method for integrating multi-omics datasets based on pathways, designed to exploit knowledge of biological systems and thus provide interpretable models for such studies. PathIntegrate employs single-sample pathway analysis to transform multi-omics datasets from the molecular to the pathway-level, and applies a predictive single-view or multi-view model to integrate the data. Model outputs include multi-omics pathways ranked by their contribution to the outcome prediction, the contribution of each omics layer, and the importance of each molecule in a pathway. Using semi-synthetic data we demonstrate the benefit of grouping molecules into pathways to detect signals in low signal-to-noise scenarios, as well as the ability of PathIntegrate to precisely identify important pathways at low effect sizes. Finally, using COPD and COVID-19 data we showcase how PathIntegrate enables convenient integration and interpretation of complex high-dimensional multi-omics datasets. The PathIntegrate Python package is available at https://github.com/cwieder/PathIntegrate.
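The molecular-to-pathway transformation described in this abstract can be illustrated with a minimal sketch. It is not the PathIntegrate API: the function name `pathway_scores`, the mean z-score scoring rule, and the toy pathway definitions are all assumptions chosen for illustration, and the package itself may use different single-sample pathway analysis methods.

```python
from statistics import mean, pstdev

def pathway_scores(rows, pathways):
    """Collapse a samples x molecules table into per-sample pathway scores.

    rows: list of samples, each a list of molecule abundances.
    pathways: dict mapping a pathway name to the column indices of its members.
    Scoring rule (an assumption for this sketch): mean z-score of the members.
    """
    n_cols = len(rows[0])
    z_cols = []
    for j in range(n_cols):
        col = [r[j] for r in rows]
        mu = mean(col)
        sd = pstdev(col) or 1.0  # constant molecules get a z-score of 0
        z_cols.append([(v - mu) / sd for v in col])
    return {
        name: [mean(z_cols[j][i] for j in members) for i in range(len(rows))]
        for name, members in pathways.items()
    }

# Toy data: 4 samples, 3 molecules, 2 hypothetical pathways.
toy = [[1.0, 2.0, 0.0],
       [2.0, 4.0, 0.0],
       [3.0, 6.0, 1.0],
       [4.0, 8.0, 1.0]]
scores = pathway_scores(toy, {"pw_a": [0, 1], "pw_b": [2]})
```

The resulting pathway-level table (one column per pathway) is what a downstream single-view or multi-view predictive model would consume in place of the raw molecule-level data.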
Affiliation(s)
- Cecilia Wieder
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
- Juliette Cooke
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Clement Frainay
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Nathalie Poupin
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Russell Bowler
  - National Jewish Health, 1400 Jackson Street, Denver, CO, 80206, USA
- Fabien Jourdan
  - MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Toulouse, France
- Katerina J Kechris
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Rachel Pj Lai
  - Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, United Kingdom
- Timothy Ebbels
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
5
Young T, Laroche O, Walker SP, Miller MR, Casanovas P, Steiner K, Esmaeili N, Zhao R, Bowman JP, Wilson R, Bridle A, Carter CG, Nowak BF, Alfaro AC, Symonds JE. Prediction of Feed Efficiency and Performance-Based Traits in Fish via Integration of Multiple Omics and Clinical Covariates. Biology 2023; 12:1135. [PMID: 37627019 PMCID: PMC10452023 DOI: 10.3390/biology12081135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 08/07/2023] [Accepted: 08/08/2023] [Indexed: 08/27/2023]
Abstract
Fish aquaculture is a rapidly expanding global industry, set to support growing demands for sources of marine protein. Enhancing feed efficiency (FE) in farmed fish is required to reduce production costs and improve sector sustainability. Recognising that organisms are complex systems whose emerging phenotypes are the product of multiple interacting molecular processes, systems-based approaches are expected to deliver new biological insights into FE and growth performance. Here, we establish 14 diverse layers of multi-omics and clinical covariates to assess their capacities to predict FE and associated performance traits in a fish model (Oncorhynchus tshawytscha) and uncover the influential variables. Inter-omic relatedness between the different layers revealed several significant concordances, particularly between datasets originating from similar material/tissue and between blood indicators and some of the proteomic (liver), metabolomic (liver), and microbiomic layers. Single- and multi-layer random forest (RF) regression models showed that integration of all data layers provides greater FE prediction power than any single-layer model alone. Although FE was among the most challenging of the traits we attempted to predict, the mean accuracy of 40 different FE models in terms of root-mean-square errors normalized to percentage was 30.4%, supporting RF as a feature selection tool and approach for complex trait prediction. Major contributions to the integrated FE models were derived from layers of proteomic and metabolomic data, with substantial influence also provided by the lipid composition layer. A correlation matrix of the top 27 variables in the models highlighted FE trait-associations with faecal bacteria (Serratia spp.), palmitic and nervonic acid moieties in whole body lipids, levels of free glycerol in muscle, and N-acetylglutamic acid content in liver. In summary, we identified subsets of molecular characteristics for the assessment of commercially relevant performance-based metrics in farmed Chinook salmon.
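The accuracy metric mentioned in this abstract, a root-mean-square error normalized to a percentage, can be sketched as follows. The abstract does not state which normalization was used, so dividing by the range of the observed values is an assumption here, as are the function name `nrmse_percent` and the toy numbers.

```python
from math import sqrt

def nrmse_percent(observed, predicted):
    """Root-mean-square error as a percentage of the observed range.

    Normalizing by (max - min) of the observed values is an assumption;
    normalizing by the mean or standard deviation is also common.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    span = max(observed) - min(observed)
    if span == 0:
        raise ValueError("observed values have zero range")
    return 100.0 * sqrt(mse) / span

# Toy example: one of three predictions is off by 1 on a trait spanning 4 units.
example = nrmse_percent([2.0, 4.0, 6.0], [2.0, 5.0, 6.0])
```

Lower values indicate better predictions; a value near 0% means the model errors are small relative to the spread of the trait.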
Affiliation(s)
- Tim Young
  - Aquaculture Biotechnology Research Group, Department of Environmental Science, School of Science, Private Bag 92006, Auckland 1142, New Zealand
  - The Centre for Biomedical and Chemical Sciences, School of Science, Auckland University of Technology, Private Bag 92006, Auckland 1142, New Zealand
- Matthew R. Miller
  - Cawthron Institute, Nelson 7010, New Zealand
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Noah Esmaeili
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Ruixiang Zhao
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- John P. Bowman
  - Tasmanian Institute of Agricultural Research, University of Tasmania, Hobart 7005, Australia
- Richard Wilson
  - Central Science Laboratory, Research Division, University of Tasmania, Hobart 7001, Australia
- Andrew Bridle
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Chris G. Carter
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
  - Blue Economy Cooperative Research Centre, Launceston 7250, Australia
- Barbara F. Nowak
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Andrea C. Alfaro
  - Aquaculture Biotechnology Research Group, Department of Environmental Science, School of Science, Private Bag 92006, Auckland 1142, New Zealand
- Jane E. Symonds
  - Cawthron Institute, Nelson 7010, New Zealand
  - Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
6
Wekesa JS, Kimwele M. A review of multi-omics data integration through deep learning approaches for disease diagnosis, prognosis, and treatment. Front Genet 2023; 14:1199087. [PMID: 37547471 PMCID: PMC10398577 DOI: 10.3389/fgene.2023.1199087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Accepted: 07/11/2023] [Indexed: 08/08/2023] Open
Abstract
Accurate diagnosis is the key to providing prompt and explicit treatment and disease management. The recognized biological method for the molecular diagnosis of infectious pathogens is polymerase chain reaction (PCR). Recently, deep learning approaches have come to play a vital role in accurately identifying disease-related genes for diagnosis, prognosis, and treatment. The models reduce the time and cost of wet-lab experimental procedures. Consequently, sophisticated computational approaches have been developed to facilitate the detection of cancer, a leading cause of death globally, and other complex diseases. In this review, we systematically evaluate the recent trends in multi-omics data analysis based on deep learning techniques and their application in disease prediction. We highlight the current challenges in the field and discuss how advances in deep learning methods and their optimization for application are vital in overcoming them. Ultimately, this review promotes the development of novel deep-learning methodologies for data integration, which is essential for disease detection and treatment.
7
Zhang L, Zou J, Wang Z, Li L. A Subpathway and Target Gene Cluster-Based Approach Uncovers lncRNAs Associated with Human Primordial Follicle Activation. Int J Mol Sci 2023; 24:10525. [PMID: 37445702 DOI: 10.3390/ijms241310525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 06/13/2023] [Accepted: 06/20/2023] [Indexed: 07/15/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) are emerging as critical regulators of the expression level of genes involved in cell differentiation and development. Primordial follicle activation (PFA) is the first step for follicle maturation, and excessive PFA results in premature ovarian insufficiency (POI). However, the correlation between lncRNA and cell differentiation was largely unknown, especially during PFA. In this study, we observed that the expression level of lncRNA was more specific than that of protein-coding genes in both follicles and granulosa cells, suggesting lncRNA might play a crucial role in follicle development. Hence, a systematic framework was needed to infer the functions of lncRNAs during PFA. Additionally, an increasing number of studies indicate that the subpathway is more precise in reflecting biological processes than the entire pathway. Given the complex expression patterns of lncRNA target genes, target genes were further clustered based on their expression similarity and classification performance to reveal the activated/inhibited gene modules, which intuitively illustrated the diversity of lncRNA regulation. Moreover, the knockdown of SBF2-AS1 in the A549 cell line and ZFAS1 in the SK-Hep1 cell line further validated the function of SBF2-AS1 in regulating the Hippo signaling subpathway and ZFAS1 in the cell cycle subpathway. Overall, our findings demonstrated the importance of subpathway analysis in uncovering the functions of lncRNAs during PFA, and paved new avenues for future lncRNA-associated research.
Affiliation(s)
- Li Zhang
  - Guangdong Provincial Key Laboratory of Proteomics, Department of Pathophysiology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Jiyuan Zou
  - Guangdong Provincial Key Laboratory of Proteomics, Department of Pathophysiology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Zhihao Wang
  - Guangdong Provincial Key Laboratory of Proteomics, Department of Pathophysiology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Lin Li
  - Guangdong Provincial Key Laboratory of Proteomics, Department of Pathophysiology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
8
Liang Z, Zheng X, Wang Y, Chu K, Gao Y. Using system biology and bioinformatics to identify the influences of COVID-19 co-infection with influenza virus on COPD. Funct Integr Genomics 2023; 23:175. [PMID: 37221323 DOI: 10.1007/s10142-023-01091-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Revised: 05/07/2023] [Accepted: 05/09/2023] [Indexed: 05/25/2023]
Abstract
Coronavirus disease 2019 (COVID-19) has rapidly increased mortality globally. Although they are risk factors for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), less is known about the common molecular mechanisms behind COVID-19, influenza virus A (IAV), and chronic obstructive pulmonary disease (COPD). This research used bioinformatics and systems biology to find possible medications for treating COVID-19, IAV, and COPD by identifying differentially expressed genes (DEGs) from gene expression datasets (GSE171110, GSE76925, GSE106986, and GSE185576). A total of 78 DEGs were subjected to functional enrichment, pathway analysis, protein-protein interaction (PPI) network construction, hub gene extraction, and analysis of other potentially relevant disorders. Then, DEGs were discovered in networks including transcription factor (TF)-gene connections, protein-drug interactions, and DEG-microRNA (miRNA) coregulatory networks by using NetworkAnalyst. The top 12 hub genes were MPO, MMP9, CD8A, HP, ELANE, CD5, CR2, PLA2G7, PIK3R1, SLAMF1, PEX3, and TNFRSF17. We found that 44 TF-gene connections, as well as 118 miRNAs, are directly linked to hub genes. Additionally, we searched the Drug Signatures Database (DSigDB) and identified 10 drugs that could potentially treat COVID-19, IAV, and COPD. Therefore, we evaluated the top 12 hub genes that could be promising DEGs for targeted therapy for SARS-CoV-2 and identified several prospective medications that may benefit COPD patients with COVID-19 and IAV co-infection.
Affiliation(s)
- Zihao Liang
  - Clinical Research Center, the Second Hospital of Nanjing, Nanjing University of Chinese Medicine, Nanjing, 210023, China
- Xudong Zheng
  - Department of Immunology, School of Medicine & Holistic Integrative Medicine, Nanjing University of Chinese Medicine, Nanjing, 210023, China
- Yuan Wang
  - Clinical Research Center, the Second Hospital of Nanjing, Nanjing University of Chinese Medicine, Nanjing, 210023, China
- Kai Chu
  - Department of Vaccine Clinical Evaluation, Jiangsu Provincial Center for Disease Control and Prevention, Nanjing, 210009, China.
- Yanan Gao
  - Department of Immunology, School of Medicine & Holistic Integrative Medicine, Nanjing University of Chinese Medicine, Nanjing, 210023, China.
9
Lee K, Hyung D, Cho SY, Yu N, Hong S, Kim J, Kim S, Han JY, Park C. Splicing signature database development to delineate cancer pathways using literature mining and transcriptome machine learning. Comput Struct Biotechnol J 2023; 21:1978-1988. [PMID: 36942103 PMCID: PMC10023904 DOI: 10.1016/j.csbj.2023.02.052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 02/28/2023] [Accepted: 02/28/2023] [Indexed: 03/06/2023] Open
Abstract
Alternative splicing (AS) events modulate certain pathways and phenotypic plasticity in cancer. Although previous studies have computationally analyzed splicing events, it is still a challenge to uncover biological functions induced by reliable AS events from the tremendous number of candidates. To provide essential splicing event signatures to assess pathway regulation, we developed a database by collecting two datasets: (i) reported literature and (ii) cancer transcriptome profiles. The former includes knowledge-based splicing signatures collected from 63,229 PubMed abstracts using natural language processing, extracted for 202 pathways. The latter comprises the machine learning-based splicing signatures identified from pan-cancer transcriptomes for 16 cancer types and 42 pathways. We established six different learning models to classify pathway activities, using splicing profiles as the learning dataset. The AS events ranked highest by learning model feature importance became the signature for each pathway. To validate our learning results, we performed evaluations by (i) performance metrics, (ii) differential AS sets acquired from external datasets, and (iii) our knowledge-based signatures. The area under the receiver operating characteristic values of the learning models did not exhibit any drastic difference. However, random-forest distinctly presented the best performance when compared with the AS sets identified from external datasets and our knowledge-based signatures. Therefore, we used the signatures obtained from the random-forest model. Our database provides the clinical characteristics of the AS signatures, including survival test, molecular subtype, and tumor microenvironment. The regulation by splicing factors was additionally investigated. Our database of the developed signatures supports a retrieval and visualization system.
Key Words
- AS, Alternative splicing
- AUCPR, the area under the precision-recall curve
- AUROC, the area under the receiver operating characteristic
- Alternative splicing
- DAS, differential alternative splicing
- Database
- EMT, epithelial mesenchymal transition
- Gene signature
- ML, machine learning
- Machine-learning
- NER, named entity recognition
- NLP, natural language process
- PCA, principal component analysis
- PSI, percent spliced in index
- RF, random-forest
- SF, splicing factor
- TCGA, The Cancer Genome Atlas
- Text-mining
- Tumor transcriptome
Affiliation(s)
- Kyubin Lee
  - Research Institute, National Cancer Center, 232 Ilsan-ro, Goyang-si, Gyeonggi-do 10408, Republic of Korea
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Daejin Hyung
  - Research Institute, National Cancer Center, 232 Ilsan-ro, Goyang-si, Gyeonggi-do 10408, Republic of Korea
- Soo Young Cho
  - Department of Molecular & Life Science, Hanyang University, 55 Hanyangdaehak-ro, Sangnok-gu, Ansan-si, Gyeonggi-do 15588, Republic of Korea
- Namhee Yu
  - Research Institute, National Cancer Center, 232 Ilsan-ro, Goyang-si, Gyeonggi-do 10408, Republic of Korea
- Sewha Hong
  - Research Institute, National Cancer Center, 232 Ilsan-ro, Goyang-si, Gyeonggi-do 10408, Republic of Korea
- Jihyun Kim
  - Research Institute, National Cancer Center, 232 Ilsan-ro, Goyang-si, Gyeonggi-do 10408, Republic of Korea
  - Department of Precision Medicine, National Institute of Health, Korea Disease Control and Prevention Agency, Osong Health Technology Administration Complex, 187, Osongsaengmyeong 2-ro, Osong-eup, Heungdeok-gu, Cheongju-si, Chungcheongbuk-do 28159, Republic of Korea
- Sunshin Kim
  - Research Institute, National Cancer Center, 232 Ilsan-ro, Goyang-si, Gyeonggi-do 10408, Republic of Korea
- Ji-Youn Han
  - Research Institute, National Cancer Center, 232 Ilsan-ro, Goyang-si, Gyeonggi-do 10408, Republic of Korea
- Charny Park
  - Research Institute, National Cancer Center, 232 Ilsan-ro, Goyang-si, Gyeonggi-do 10408, Republic of Korea
  - Correspondence to: 323 Ilsan-ro, Ilsandonggu, Goyang-si, Gyeonggi-do 10408, Republic of Korea.
10
Athieniti E, Spyrou GM. A guide to multi-omics data collection and integration for translational medicine. Comput Struct Biotechnol J 2022; 21:134-149. [PMID: 36544480 PMCID: PMC9747357 DOI: 10.1016/j.csbj.2022.11.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 11/25/2022] [Accepted: 11/25/2022] [Indexed: 12/02/2022] Open
Abstract
The emerging high-throughput technologies have led to the shift in the design of translational medicine projects towards collecting multi-omics patient samples and, consequently, their integrated analysis. However, the complexity of integrating these datasets has triggered new questions regarding the appropriateness of the available computational methods. Currently, there is no clear consensus on the best combination of omics to include and the data integration methodologies required for their analysis. This article aims to guide the design of multi-omics studies in the field of translational medicine regarding the types of omics and the integration method to choose. We review articles that perform the integration of multiple omics measurements from patient samples. We identify five objectives in translational medicine applications: (i) detect disease-associated molecular patterns, (ii) subtype identification, (iii) diagnosis/prognosis, (iv) drug response prediction, and (v) understand regulatory processes. We describe common trends in the selection of omic types combined for different objectives and diseases. To guide the choice of data integration tools, we group them into the scientific objectives they aim to address. We describe the main computational methods adopted to achieve these objectives and present examples of tools. We compare tools based on how they deal with the computational challenges of data integration and comment on how they perform against predefined objective-specific evaluation criteria. Finally, we discuss examples of tools for downstream analysis and further extraction of novel insights from multi-omics datasets.