1
Malec SA, Taneja SB, Albert SM, Elizabeth Shaaban C, Karim HT, Levine AS, Munro P, Callahan TJ, Boyce RD. Causal feature selection using a knowledge graph combining structured knowledge from the biomedical literature and ontologies: A use case studying depression as a risk factor for Alzheimer's disease. J Biomed Inform 2023; 142:104368. [PMID: 37086959 PMCID: PMC10355339 DOI: 10.1016/j.jbi.2023.104368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 03/03/2023] [Accepted: 04/17/2023] [Indexed: 04/24/2023]
Abstract
BACKGROUND Causal feature selection is essential for estimating effects from observational data. Identifying confounders is a crucial step in this process. Traditionally, researchers employ content-matter expertise and literature review to identify confounders. Uncontrolled confounding from unidentified confounders threatens validity, conditioning on intermediate variables (mediators) weakens estimates, and conditioning on common effects (colliders) induces bias. Additionally, without special treatment, erroneous conditioning on variables combining roles introduces bias. However, the vast literature is growing exponentially, making it infeasible to assimilate this knowledge. To address these challenges, we introduce a novel knowledge graph (KG) application enabling causal feature selection by combining computable literature-derived knowledge with biomedical ontologies. We present a use case of our approach specifying a causal model for estimating the total causal effect of depression on the risk of developing Alzheimer's disease (AD) from observational data.
METHODS We extracted computable knowledge from a literature corpus using three machine reading systems and inferred missing knowledge using logical closure operations. Using a KG framework, we mapped the output to target terminologies and combined it with ontology-grounded resources. We translated epidemiological definitions of confounder, collider, and mediator into queries for searching the KG and summarized the roles played by the identified variables. We compared the results with output from a complementary method and published observational studies and examined a selection of confounding and combined role variables in-depth.
RESULTS Our search identified 128 confounders, including 58 phenotypes, 47 drugs, 35 genes, 23 collider, and 16 mediator phenotypes. However, only 31 of the 58 confounder phenotypes were found to behave exclusively as confounders, while the remaining 27 phenotypes played other roles. Obstructive sleep apnea emerged as a potential novel confounder for depression and AD. Anemia exemplified a variable playing combined roles.
CONCLUSION Our findings suggest combining machine reading and KG could augment human expertise for causal feature selection. However, the complexity of causal feature selection for depression with AD highlights the need for standardized field-specific databases of causal variables. Further work is needed to optimize KG search and transform the output for human consumption.
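The role definitions mentioned in the abstract (confounder, collider, mediator) can be illustrated with a minimal sketch. This is not the authors' KG query implementation: it assumes a plain list of directed "causes" edges, considers only direct edges, and uses made-up placeholder variables (`Z1` to `Z4`). The function name `classify_roles` is hypothetical.

```python
from collections import defaultdict


def classify_roles(edges, exposure, outcome):
    """Assign causal roles to each variable relative to an exposure/outcome pair.

    Simplifying assumption: only direct edges are inspected, so a
    confounder is a direct common cause, a collider a direct common
    effect, and a mediator sits on a two-step path exposure -> Z -> outcome.
    """
    causes = defaultdict(set)  # node -> set of nodes it directly causes
    for src, dst in edges:
        causes[src].add(dst)

    candidates = {n for edge in edges for n in edge} - {exposure, outcome}
    roles = {}
    for z in sorted(candidates):
        found = set()
        if exposure in causes[z] and outcome in causes[z]:
            found.add("confounder")  # Z -> exposure and Z -> outcome
        if z in causes[exposure] and z in causes[outcome]:
            found.add("collider")    # exposure -> Z and outcome -> Z
        if z in causes[exposure] and outcome in causes[z]:
            found.add("mediator")    # exposure -> Z -> outcome
        if found:
            roles[z] = found
    return roles


# Toy edges, invented for illustration only (not results from the paper).
edges = [
    ("Z1", "depression"), ("Z1", "AD"),                           # common cause
    ("depression", "Z2"), ("AD", "Z2"),                           # common effect
    ("depression", "Z3"), ("Z3", "AD"),                           # intermediate
    ("Z4", "depression"), ("Z4", "AD"), ("depression", "Z4"),     # combined roles
]
roles = classify_roles(edges, "depression", "AD")
exclusive_confounders = sorted(z for z, r in roles.items() if r == {"confounder"})
combined_role_variables = sorted(z for z, r in roles.items() if len(r) > 1)
```

In this toy graph `Z4` is both a confounder and a mediator, which mirrors the abstract's point that a variable playing combined roles needs special treatment before it is conditioned on.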
Affiliation(s)
- Scott A Malec
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Steven M Albert
  - Department of Behavioral and Community Health Sciences, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- C Elizabeth Shaaban
  - Department of Epidemiology, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Helmet T Karim
  - Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA; Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
- Arthur S Levine
  - Department of Neurobiology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA; The Brain Institute, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Paul Munro
  - School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, USA
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Richard D Boyce
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
2
Ontological Representation of Causal Relations for a Deep Understanding of Associations Between Variables in Epidemiology. Artif Intell Med 2022. [DOI: 10.1007/978-3-031-09342-5_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
3
Callahan A, Polony V, Posada JD, Banda JM, Gombar S, Shah NH. ACE: the Advanced Cohort Engine for searching longitudinal patient records. J Am Med Inform Assoc 2021; 28:1468-1479. [PMID: 33712854 PMCID: PMC8279796 DOI: 10.1093/jamia/ocab027] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/08/2020] [Accepted: 02/23/2021] [Indexed: 01/02/2023]
Abstract
OBJECTIVE To propose a paradigm for a scalable time-aware clinical data search, and to describe the design, implementation and use of a search engine realizing this paradigm.
MATERIALS AND METHODS The Advanced Cohort Engine (ACE) uses a temporal query language and in-memory datastore of patient objects to provide a fast, scalable, and expressive time-aware search. ACE accepts data in the Observational Medical Outcomes Partnership Common Data Model, and is configurable to balance performance with compute cost. ACE's temporal query language supports automatic query expansion using clinical knowledge graphs. The ACE API can be used with R, Python, Java, HTTP, and a Web UI.
RESULTS ACE offers an expressive query language for complex temporal search across many clinical data types with multiple output options. ACE enables electronic phenotyping and cohort-building with subsecond response times in searching the data of millions of patients for a variety of use cases.
DISCUSSION ACE enables fast, time-aware search using a patient object-centric datastore, thereby overcoming many technical and design shortcomings of relational algebra-based querying. Integrating electronic phenotype development with cohort-building enables a variety of high-value uses for a learning health system. Tradeoffs include the need to learn a new query language and the technical setup burden.
CONCLUSION ACE is a tool that combines a unique query language for time-aware search of longitudinal patient records with a patient object datastore for rapid electronic phenotyping, cohort extraction, and exploratory data analyses.
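To make the idea of a time-aware search over in-memory patient objects concrete, here is a small sketch. It does not use ACE's query language or API, which are not shown in the abstract; the data layout (a dict of patient ID to `(day, code)` events), the function name `followed_within`, and the codes `"A"` and `"B"` are all assumptions made for illustration.

```python
def followed_within(patients, first_code, second_code, max_days):
    """Return IDs of patients with `second_code` recorded on or after
    `first_code`, at most `max_days` later.

    `patients` maps a patient ID to a list of (day, code) events, where
    `day` is an integer offset on that patient's timeline.
    """
    hits = []
    for patient_id, events in patients.items():
        first_days = [day for day, code in events if code == first_code]
        second_days = [day for day, code in events if code == second_code]
        # A patient matches if any pair of events satisfies the time window.
        if any(0 <= s - f <= max_days for f in first_days for s in second_days):
            hits.append(patient_id)
    return sorted(hits)


# Toy patient timelines, invented for illustration only.
patients = {
    "p1": [(0, "A"), (10, "B")],    # B follows A after 10 days
    "p2": [(0, "A"), (100, "B")],   # B follows A, but too late
    "p3": [(5, "B"), (9, "A")],     # B precedes A
}
cohort = followed_within(patients, "A", "B", max_days=30)
```

Keeping each patient's events together in one object means the temporal condition is checked per patient without joins, which is the general design point the abstract contrasts with relational querying.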
Affiliation(s)
- Alison Callahan
  - Center for Biomedical Informatics Research, School of Medicine, Stanford University, Stanford, California, USA
- Vladimir Polony
  - Center for Biomedical Informatics Research, School of Medicine, Stanford University, Stanford, California, USA
- José D Posada
  - Center for Biomedical Informatics Research, School of Medicine, Stanford University, Stanford, California, USA
- Juan M Banda
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
- Saurabh Gombar
  - Department of Pathology, School of Medicine, Stanford University, Stanford, California, USA
- Nigam H Shah
  - Center for Biomedical Informatics Research, School of Medicine, Stanford University, Stanford, California, USA
4
Filice RW, Kahn CE. Integrating an Ontology of Radiology Differential Diagnosis with ICD-10-CM, RadLex, and SNOMED CT. J Digit Imaging 2020; 32:206-210. [PMID: 30706210 DOI: 10.1007/s10278-019-00186-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/25/2022]
Abstract
An ontology offers a human-readable and machine-computable representation of the concepts in a domain and the relationships among them. Mappings between ontologies enable the reuse and interoperability of biomedical knowledge. We sought to map concepts of the Radiology Gamuts Ontology (RGO), an ontology that links diseases and imaging findings to support differential diagnosis in radiology, to terms in three key vocabularies for clinical radiology: the International Classification of Diseases, version 10, Clinical Modification (ICD-10-CM), the Radiological Society of North America's radiology lexicon (RadLex), and the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). RGO (version 0.7; Jan 2018) incorporated 16,918 terms (classes) for diseases, interventions, and imaging observations linked by 1782 subsumption (class-subclass) relations and 55,569 causal ("may cause") relations. RGO classes were mapped to RadLex (46,656 classes, version 3.15), SNOMED CT (347,358 classes, version 2018AA), and ICD-10-CM (94,645 classes, version 2018AA) using the National Center for Biomedical Ontology (NCBO) Annotator web service. We identified 1275 exact mappings from RGO to RadLex, 5302 to SNOMED CT, and 941 to ICD-10-CM. RGO terms mapped to one ontology (n = 3401), two ontologies (n = 1515), or all three ontologies (n = 198). The mapped ontologies provide additional terms to support data mining from textual information in the electronic health record. The current work builds on efforts to map RGO to ontologies of diseases and phenotypes. Mappings between ontologies can support automated knowledge discovery, diagnostic reasoning, and data mining.
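The breakdown of terms mapped to one, two, or all three vocabularies is a simple coverage count. The sketch below shows one way such a tally could be computed; it is not the authors' pipeline and does not call the NCBO Annotator. The term identifiers (`t1` to `t4`) and the function name `coverage_histogram` are made up for illustration.

```python
from collections import Counter


def coverage_histogram(mappings):
    """Count how many source terms map to exactly 1, 2, ... target vocabularies.

    `mappings` maps a vocabulary name to the set of source-term IDs that
    have at least one exact match in that vocabulary.
    """
    vocabularies_per_term = Counter()
    for terms in mappings.values():
        for term in terms:
            vocabularies_per_term[term] += 1
    # Histogram: number of vocabularies -> number of terms with that coverage.
    return Counter(vocabularies_per_term.values())


# Toy mapping sets, invented for illustration only.
mappings = {
    "RadLex": {"t1", "t2"},
    "SNOMED CT": {"t1", "t2", "t3"},
    "ICD-10-CM": {"t1", "t4"},
}
histogram = coverage_histogram(mappings)
mapped_to_any = sum(histogram.values())
```

Here `histogram[3]` counts terms found in all three vocabularies, and `mapped_to_any` is the number of distinct terms with at least one mapping.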
Affiliation(s)
- Ross W Filice
  - Department of Radiology, MedStar Georgetown University Hospital, Washington, DC, USA
- Charles E Kahn
  - Department of Radiology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, USA
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
5
Finke MT, Filice RW, Kahn CE. Integrating ontologies of human diseases, phenotypes, and radiological diagnosis. J Am Med Inform Assoc 2019; 26:149-154. [PMID: 30624645 DOI: 10.1093/jamia/ocy161] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/21/2018] [Accepted: 11/13/2018] [Indexed: 11/12/2022]
Abstract
Mappings between ontologies enable reuse and interoperability of biomedical knowledge. The Radiology Gamuts Ontology (RGO), an ontology of 16,918 diseases, interventions, and imaging observations, provides a resource for differential diagnosis and automated textual report understanding in radiology. An automated process with subsequent manual review was used to identify exact and partial matches of RGO entities to the Disease Ontology (DO) and the Human Phenotype Ontology (HPO). Exact mappings identified equivalent concepts; partial mappings identified subclass and superclass relationships. A total of 7913 distinct RGO entities (46.8%) were mapped to one or both of the two target ontologies. Integration of RGO's causal knowledge resulted in 9605 axioms that expressed direct causal relationships between DO diseases and HPO phenotypic abnormalities, and allowed one to formulate queries about causal relations using the abstraction properties in those two ontologies. The mappings can be used to support automated diagnostic reasoning, data mining, and knowledge discovery.
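The step of carrying RGO's causal relations over to DO and HPO can be sketched as a join between a set of causal pairs and two mapping tables. This is an illustrative simplification under stated assumptions, not the authors' method: it uses exact mappings only, represents axioms as plain pairs, and all identifiers (`RGO:a`, `DO:d1`, `HPO:p1`, and so on) and the function name `lift_causal_relations` are placeholders.

```python
def lift_causal_relations(may_cause, to_do, to_hpo):
    """Translate source-ontology causal pairs into (DO disease, HPO phenotype) pairs.

    `may_cause` is an iterable of (cause, effect) source-term pairs;
    `to_do` and `to_hpo` map source terms to their exact-match targets.
    Pairs where either side lacks a mapping are skipped.
    """
    axioms = set()
    for cause, effect in may_cause:
        if cause in to_do and effect in to_hpo:
            axioms.add((to_do[cause], to_hpo[effect]))
    return axioms


# Placeholder identifiers, invented for illustration only.
may_cause = [("RGO:a", "RGO:x"), ("RGO:a", "RGO:y"), ("RGO:b", "RGO:x")]
to_do = {"RGO:a": "DO:d1"}                     # RGO:b has no DO mapping
to_hpo = {"RGO:x": "HPO:p1", "RGO:y": "HPO:p2"}
axioms = lift_causal_relations(may_cause, to_do, to_hpo)
```

Once the pairs are expressed in DO and HPO terms, the class hierarchies of those ontologies can be used to ask more general questions, which is the benefit the abstract attributes to their abstraction properties.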
Affiliation(s)
- Michael T Finke
  - Pacific Northwest University of Health Sciences, Yakima, WA, USA
- Ross W Filice
  - Department of Radiology, MedStar Georgetown University Hospital, Washington, DC, USA
- Charles E Kahn
  - Department of Radiology and Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA