1
Hou L, Geng Z, Yuan Z, Shi X, Wang C, Chen F, Li H, Xue F. MRSL: a causal network pruning algorithm based on GWAS summary data. Brief Bioinform 2024; 25:bbae086. [PMID: 38487847 PMCID: PMC10940843 DOI: 10.1093/bib/bbae086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Revised: 02/01/2024] [Accepted: 02/15/2024] [Indexed: 03/18/2024]
Abstract
Causal discovery is a powerful tool for uncovering underlying structures from purely observational data. Genetic variants can provide useful complementary information for structure learning. Recently, Mendelian randomization (MR) studies have provided abundant marginal causal relationships between traits. Here, we propose MRSL (MR-based structure learning algorithm), a causal network pruning algorithm based on these marginal causal relationships. MRSL combines graph theory with multivariable MR to learn the conditional causal structure using only genome-wide association study (GWAS) summary statistics. Specifically, MRSL uses topological sorting to improve the precision of structure learning. It proposes MR-separation in place of d-separation, together with three candidates for a sufficient separating set for MR-separation. In simulations, MRSL achieved up to 2-fold higher F1 scores and 100-fold faster computation than eight other competing methods. Furthermore, we applied MRSL to 26 biomarkers and 44 International Classification of Diseases 10 (ICD10)-defined diseases using GWAS summary data from UK Biobank. The results cover most of the expected causal links that have biological interpretations, as well as several new links supported by clinical case reports or previous observational studies.
Affiliation(s)
- Lei Hou
- Beijing International Center for Mathematical Research, Peking University, Beijing, People’s Republic of China, 100871
- Zhi Geng
- School of Mathematics and Statistics, Beijing Technology and Business University, Beijing, People’s Republic of China, 100048
- Zhongshang Yuan
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
- Xu Shi
- Department of Biostatistics, University of Michigan, Ann Arbor, USA
- Chuan Wang
- Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
- Feng Chen
- School of Public Health, Nanjing Medical University, Nanjing, China, 211166
- Hongkai Li
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
- Fuzhong Xue
- Department of Epidemiology and Health Statistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
- Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China, 250000
2
Robson B, Boray S, Weisman J. Mining real-world high dimensional structured data in medicine and its use in decision support. Some different perspectives on unknowns, interdependency, and distinguishability. Comput Biol Med 2021; 141:105118. [PMID: 34971979 DOI: 10.1016/j.compbiomed.2021.105118] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/24/2021] [Revised: 11/18/2021] [Accepted: 12/02/2021] [Indexed: 11/03/2022]
Abstract
There are many difficulties in extracting and using knowledge for medical analytic and predictive purposes from Real-World Data, even when the data is already well structured in the manner of a large spreadsheet. Preparative curation and standardization or "normalization" of such data involves a variety of chores, but underlying them is an interrelated set of fundamental problems that can in part be dealt with automatically during the datamining and inference processes. These fundamental problems are reviewed here and illustrated and investigated with examples. They concern the treatment of unknowns, the need to avoid independency assumptions, and the appearance of entries that may not be fully distinguished from each other. Unknowns include errors detected as implausible (e.g., out-of-range) values that are subsequently converted to unknowns. These problems are further compounded by high dimensionality and by the sparse data that inevitably arise from high-dimensional datamining, even when the data is extensive. All these considerations are different aspects of incomplete information, though they also relate to problems that arise if care is not taken to avoid or ameliorate the consequences of including the same information two or more times, or if misleading or inconsistent information is combined. This paper addresses these aspects from a slightly different perspective, using the Q-UEL language and inference methods based on it, and borrowing some ideas from the mathematics of quantum mechanics and information theory. It takes the view that detecting and correcting probabilistic elements of knowledge that are subsequently used in inference need only involve testing and correcting them so that they satisfy certain extended notions of coherence between probabilities. This is by no means the only possible view; it is explored here and later compared with a related notion of consistency.
Affiliation(s)
- Barry Robson
- Ingine Inc, Ohio, USA; The Dirac Foundation, Oxfordshire, UK.
- J Weisman
- The Dirac Foundation, Oxfordshire, UK.
3
Robson B. Bidirectional General Graphs for inference. Principles and implications for medicine. Comput Biol Med 2019; 108:382-399. [PMID: 31075569 DOI: 10.1016/j.compbiomed.2019.04.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 02/18/2019] [Revised: 04/03/2019] [Accepted: 04/04/2019] [Indexed: 12/17/2022]
Abstract
Probabilistic inference methods require a more general and realistic description of the world as a Bidirectional General Graph (BGG). Although the Bayes Net (BN) in its original form has been promoted as a predictive tool, it is more immediately a way of testing a hypothesis or model about interactions in a system, usually considered on a causal basis. Once established, the model can be used predictively, but the problem is that for a traditional BN the hypotheses or models that can be formed are limited by definition to the Directed Acyclic Graph (DAG). Three interrelated deficiencies of the DAG are highlighted, each corrected by conversion to a method based on a BGG: (i) the lack of intrinsic representation of coherence by Bayes' rule, (ii) relatedly, the need to consider interdependence in parent nodes, and (iii) the need to manage a property called recurrence. These deficiencies can produce large errors in absolute estimates of probabilities, and while relative and renormalized probabilities ameliorate this, they can often make much of a net superfluous through cancellations by division. The Hyperbolic Dirac Net (HDN), based on Dirac's quantum mechanics, is a solution that naturally avoids these deficiencies. It encodes bidirectional probabilities in an h-complex value rediscovered by Dirac, i.e., with the imaginary number h such that hh = +1. Properties of the HDN described previously are reviewed (with emphasis on descriptions in familiar probability terms), the issue of recurrence is introduced, methods of construction are simplified, and the severity of the quantitative differences between BNs and analogous HDNs is exemplified. There is also discussion of how the results compare with other approaches in practice.
Affiliation(s)
- Barry Robson
- Ingine Inc, Virginia, USA; The Dirac Foundation, Oxfordshire, UK.
4
Pandey TN, Jagadev AK, Dehuri S, Cho SB. A review and empirical analysis of neural networks based exchange rate prediction. Intelligent Decision Technologies 2019. [DOI: 10.3233/idt-180346] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Affiliation(s)
- Trilok Nath Pandey
- Department of Computer Science and Engineering, S’O’A Deemed to be University, Bhubaneswar, Odisha, India
- Alok Kumar Jagadev
- School of Computer Engineering, KIIT University, Bhubaneswar, Odisha, India
- Satchidananda Dehuri
- Department of Information and Communication, Fakir Mohan University, Balasore, Odisha, India
- Sung-Bae Cho
- Department of Computer Science, Yonsei University, Seoul, Korea
5
Pittoli F, Vianna HD, Victória Barbosa JL, Butzen E, Gaedke MÂ, Dias da Costa JS, Scherer dos Santos RB. An intelligent system for prognosis of noncommunicable diseases’ risk factors. Telematics and Informatics 2018. [DOI: 10.1016/j.tele.2018.02.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/29/2022]