1
Tuly SR, Ranjbari S, Murat EA, Arslanturk S. From Silos to Synthesis: A comprehensive review of domain adaptation strategies for multi-source data integration in healthcare. Comput Biol Med 2025; 191:110108. [PMID: 40209575 DOI: 10.1016/j.compbiomed.2025.110108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2024] [Revised: 03/27/2025] [Accepted: 03/27/2025] [Indexed: 04/12/2025]
Abstract
BACKGROUND: The integration of data from diverse sources is not only crucial for addressing data scarcity in health informatics but also enables the use of complementary information from multiple datasets. However, the isolated nature of data collected from disparate sources (referred to as 'Silos') presents significant challenges in multi-source data integration due to inherent heterogeneity and differences in data structures, formats, and standards. Domain adaptation emerges as a key framework to transition from 'Silos' to 'Synthesis' by measuring and mitigating such discrepancies, enabling uniform representation and harmonization of multi-source data. METHODS: This study explores different approaches to healthcare data integration, highlighting the challenges associated with each type and discussing both general-purpose and healthcare-specific adaptation methods. We examine key research challenges and evaluate leading domain adaptation approaches, demonstrating their effectiveness and limitations in advancing healthcare data integration. RESULTS: The findings highlight the potential of domain adaptation methods to significantly improve healthcare data integration while laying a foundation for future research. CONCLUSION: Current research often lacks a comprehensive analysis of how domain adaptation can effectively address the challenges associated with integrating multi-source and multi-modal healthcare datasets. This study serves as a valuable resource for healthcare professionals and researchers, providing guidance on leveraging domain adaptation techniques to mitigate domain discrepancies in healthcare data integration.
Affiliation(s)
- Shelia Rahman Tuly
  - Department of Computer Science, Wayne State University, 5057 Woodward Ave, Detroit, 48201, MI, USA.
- Sima Ranjbari
  - Department of Computer Science, Wayne State University, 5057 Woodward Ave, Detroit, 48201, MI, USA.
- Ekrem Alper Murat
  - Department of Industrial and Systems Engineering, Wayne State University, 4th Street, Detroit, 48201, MI, USA.
- Suzan Arslanturk
  - Department of Computer Science, Wayne State University, 5057 Woodward Ave, Detroit, 48201, MI, USA.
2
Nienaber-Rousseau C. Understanding and applying gene-environment interactions: a guide for nutrition professionals with an emphasis on integration in African research settings. Nutr Rev 2025; 83:e443-e463. [PMID: 38442341 PMCID: PMC11723160 DOI: 10.1093/nutrit/nuae015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2024] [Open Access]
Abstract
Noncommunicable diseases (NCDs) are influenced by the interplay between genetics and environmental exposures, particularly diet. However, many healthcare professionals, including nutritionists and dietitians, have limited background in genetics and may therefore lack understanding of gene-environment interaction (GxE) studies. Even researchers deeply involved in nutrition studies, but with a focus elsewhere, can struggle to interpret, evaluate, and conduct GxE studies. There is an urgent need to study African populations that bear a heavy burden of NCDs, demonstrate unique genetic variability, and have cultural practices resulting in distinctive environmental exposures compared with Europeans or Americans, who are studied more. Although diverse and rapidly changing environments, as well as the high genetic variability of Africans and difference in linkage disequilibrium (ie, certain gene variants are inherited together more often than expected by chance), provide unparalleled potential to investigate the omics fields, only a small percentage of studies come from Africa. Furthermore, research evidence lags behind the practices of companies offering genetic testing for personalized medicine and nutrition. We need to generate more evidence on GxEs that also considers continental African populations to be able to prevent unethical practices and enable tailored treatments. This review aims to introduce nutrition professionals to genetics terms and valid methods to investigate GxEs and their challenges, and proposes ways to improve quality and reproducibility. The review also provides insight into the potential contributions of nutrigenetics and nutrigenomics to the healthcare sphere, addresses direct-to-consumer genetic testing, and concludes by offering insights into the field's future, including advanced technologies like artificial intelligence and machine learning.
Affiliation(s)
- Cornelie Nienaber-Rousseau
  - Centre of Excellence for Nutrition, North-West University, Potchefstroom, South Africa
  - SAMRC Extramural Unit for Hypertension and Cardiovascular Disease, Faculty of Health Sciences, North-West University, Potchefstroom, South Africa
3
Oh W, Jung J, Joo JWJ. MR-GGI: accurate inference of gene-gene interactions using Mendelian randomization. BMC Bioinformatics 2024; 25:192. [PMID: 38750431 PMCID: PMC11094870 DOI: 10.1186/s12859-024-05808-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 05/09/2024] [Indexed: 05/19/2024] [Open Access]
Abstract
BACKGROUND: Researchers have long studied the regulatory processes of genes to uncover their functions. Gene regulatory network analysis is one of the popular approaches for understanding these processes, requiring accurate identification of interactions among the genes to establish the gene regulatory network. Advances in genome-wide association studies and expression quantitative trait loci studies have led to a wealth of genomic data, facilitating more accurate inference of gene-gene interactions. However, unknown confounding factors may influence these interactions, making their interpretation complicated. Mendelian randomization (MR) has emerged as a valuable tool for causal inference in genetics, addressing confounding effects by estimating causal relationships using instrumental variables. In this paper, we propose a new statistical method, MR-GGI, for accurately inferring gene-gene interactions using Mendelian randomization. RESULTS: MR-GGI applies one gene as the exposure and another as the outcome, using causal cis-single-nucleotide polymorphisms as instrumental variables in the inverse-variance weighted MR model. Through simulations, we have demonstrated MR-GGI's ability to control type 1 error and maintain statistical power despite confounding effects. MR-GGI performed the best when compared to other methods using the F1 score on the DREAM5 dataset. Additionally, when applied to yeast genomic data, MR-GGI successfully identified six clusters. Through gene ontology analysis, we have confirmed that each cluster in our study performs distinct functional roles by gathering genes with specific functions. CONCLUSION: These findings demonstrate that MR-GGI accurately infers gene-gene interactions despite the confounding effects in real biological environments.
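For readers unfamiliar with the inverse-variance weighted (IVW) MR model named in the abstract above, the following is a minimal sketch of the standard fixed-effect IVW estimator computed from per-variant summary statistics. It is not the MR-GGI implementation: the function name `ivw_estimate`, its arguments, and the toy numbers are assumptions made purely for illustration.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate (illustrative sketch).

    beta_exposure: per-variant effects on the exposure (e.g. expression of gene A)
    beta_outcome:  per-variant effects on the outcome (e.g. expression of gene B)
    se_outcome:    standard errors of the outcome effects
    Returns (causal_estimate, standard_error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")

    # Weight of each variant: beta_exposure^2 / se_outcome^2
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("instruments have no effect on the exposure")

    # Weighted regression of outcome effects on exposure effects through the origin
    numerator = sum(
        bx * by / (se * se)
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    estimate = numerator / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    return estimate, std_error


# Hypothetical summary statistics for three cis-SNP instruments
estimate, std_error = ivw_estimate(
    beta_exposure=[0.5, 0.4, 0.6],
    beta_outcome=[0.25, 0.2, 0.3],
    se_outcome=[0.1, 0.1, 0.1],
)
print(estimate, std_error)
```

In this toy input every variant's outcome effect is exactly half of its exposure effect, so the pooled estimate comes out at roughly 0.5. Real instruments would disagree with one another, and the weights decide how much each one contributes.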
Affiliation(s)
- Wonseok Oh
  - Department of Industrial Pharmacy, Dongguk University-Seoul, Seoul, 04620, South Korea
- Junghyun Jung
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Hollywood, CA, USA
- Jong Wha J Joo
  - Department of Computer Science and Engineering, Dongguk University-Seoul, Seoul, 04620, South Korea.
  - Division of AI Software Convergence, Dongguk University-Seoul, Seoul, 04620, South Korea.
4
Li Z, Melograna F, Hoskens H, Duroux D, Marazita ML, Walsh S, Weinberg SM, Shriver MD, Müller-Myhsok B, Claes P, Van Steen K. netMUG: a novel network-guided multi-view clustering workflow for dissecting genetic and facial heterogeneity. Front Genet 2023; 14:1286800. [PMID: 38125750 PMCID: PMC10731261 DOI: 10.3389/fgene.2023.1286800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 11/14/2023] [Indexed: 12/23/2023] [Open Access]
Abstract
Introduction: Multi-view data offer advantages over single-view data for characterizing individuals, which is crucial in precision medicine toward personalized prevention, diagnosis, or treatment follow-up. Methods: Here, we develop a network-guided multi-view clustering framework named netMUG to identify actionable subgroups of individuals. This pipeline first adopts sparse multiple canonical correlation analysis to select multi-view features possibly informed by extraneous data, which are then used to construct individual-specific networks (ISNs). Finally, the individual subtypes are automatically derived by hierarchical clustering on these network representations. Results: We applied netMUG to a dataset containing genomic data and facial images to obtain BMI-informed multi-view strata and showed how it could be used for a refined obesity characterization. Benchmark analysis of netMUG on synthetic data with known strata of individuals indicated its superior performance compared with both baseline and benchmark methods for multi-view clustering. The clustering derived from netMUG achieved an adjusted Rand index of 1 with respect to the synthesized true labels. In addition, the real-data analysis revealed subgroups strongly linked to BMI and genetic and facial determinants of these subgroups. Discussion: netMUG provides a powerful strategy, exploiting individual-specific networks to identify meaningful and actionable strata. Moreover, the implementation is easy to generalize to accommodate heterogeneous data sources or highlight data structures.
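The adjusted Rand index (ARI) of 1 reported in the abstract above means the derived clustering matched the synthesized labels exactly, up to a renaming of clusters. The sketch below computes the ARI from its standard pair-counting definition. It is not taken from the netMUG code: the function name `adjusted_rand_index` and the toy labels are assumptions for illustration only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same items (sketch)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to disagree on

    # Contingency counts and their marginals
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs placed together in both, in true, and in predicted
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    maximum = 0.5 * (sum_rows + sum_cols)         # largest attainable agreement
    if maximum == expected:
        # Both partitions are trivial (all singletons or one single cluster)
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


# Same grouping with different cluster names
print(adjusted_rand_index(["a", "a", "b", "b", "c"], [2, 2, 0, 0, 1]))  # → 1.0
```

Because the index is corrected for chance, unrelated partitions score near 0 and can even go negative, which is why a value of exactly 1 is a strong statement about recovery of the true strata.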
Affiliation(s)
- Zuqi Li
  - BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Federico Melograna
  - BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
- Hanne Hoskens
  - BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Diane Duroux
  - BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
- Mary L. Marazita
  - Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA, United States
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, United States
- Susan Walsh
  - Department of Biology, Indiana University Indianapolis, Indianapolis, IN, United States
- Seth M. Weinberg
  - Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA, United States
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, United States
- Mark D. Shriver
  - Department of Anthropology, Pennsylvania State University, State College, PA, United States
- Peter Claes
  - BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
  - Department of Electrical Engineering, KU Leuven, Leuven, Belgium
  - Murdoch Children’s Research Institute, Melbourne, VIC, Australia
- Kristel Van Steen
  - BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
  - BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
5
Li Z, Melograna F, Hoskens H, Duroux D, Marazita ML, Walsh S, Weinberg SM, Shriver MD, Müller-Myhsok B, Claes P, Van Steen K. netMUG: a novel network-guided multi-view clustering workflow for dissecting genetic and facial heterogeneity. bioRxiv 2023:2023.05.04.539350. [PMID: 37205363 PMCID: PMC10187283 DOI: 10.1101/2023.05.04.539350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Multi-view data offer advantages over single-view data for characterizing individuals, which is crucial in precision medicine toward personalized prevention, diagnosis, or treatment follow-up. Here, we develop a network-guided multi-view clustering framework named netMUG to identify actionable subgroups of individuals. This pipeline first adopts sparse multiple canonical correlation analysis to select multi-view features possibly informed by extraneous data, which are then used to construct individual-specific networks (ISNs). Finally, the individual subtypes are automatically derived by hierarchical clustering on these network representations. We applied netMUG to a dataset containing genomic data and facial images to obtain BMI-informed multi-view strata and showed how it could be used for a refined obesity characterization. Benchmark analysis of netMUG on synthetic data with known strata of individuals indicated its superior performance compared with both baseline and benchmark methods for multi-view clustering. In addition, the real-data analysis revealed subgroups strongly linked to BMI and genetic and facial determinants of these classes. netMUG provides a powerful strategy, exploiting individual-specific networks to identify meaningful and actionable strata. Moreover, the implementation is easy to generalize to accommodate heterogeneous data sources or highlight data structures.
Affiliation(s)
- Zuqi Li
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Hanne Hoskens
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Diane Duroux
  - GIGA-R Medical Genomics, University of Liège, Liège, Belgium
- Mary L. Marazita
  - Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA 15219, USA
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Susan Walsh
  - Department of Biology, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
- Seth M. Weinberg
  - Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA 15219, USA
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Mark D. Shriver
  - Department of Anthropology, Pennsylvania State University, State College, PA 16801, USA
- Peter Claes
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
  - Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium
  - Murdoch Children’s Research Institute, Melbourne, Victoria, Australia
- Kristel Van Steen
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
  - GIGA-R Medical Genomics, University of Liège, Liège, Belgium