1
|
Prakash SJ, Van Auken KM, Hill DP, Sternberg PW. Semantic representation of neural circuit knowledge in Caenorhabditis elegans. Brain Inform 2023; 10:30. [PMID: 37947958 PMCID: PMC10638142 DOI: 10.1186/s40708-023-00208-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Accepted: 09/22/2023] [Indexed: 11/12/2023] Open
Abstract
In modern biology, new knowledge is generated quickly, making it challenging for researchers to efficiently acquire and synthesise new information from the large volume of primary publications. To address this problem, computational approaches that generate machine-readable representations of scientific findings in the form of knowledge graphs have been developed. These representations can integrate different types of experimental data from multiple papers and biological knowledge bases in a unifying data model, providing a complementary method to manual review for interacting with published knowledge. The Gene Ontology Consortium (GOC) has created a semantic modelling framework that extends individual functional gene annotations to structured descriptions of causal networks representing biological processes (Gene Ontology-Causal Activity Modelling, or GO-CAM). In this study, we explored whether the GO-CAM framework could represent knowledge of the causal relationships between environmental inputs, neural circuits and behavior in the model nematode C. elegans [C. elegans Neural-Circuit Causal Activity Modelling (CeN-CAM)]. We found that, given extensions to several relevant ontologies, a wide variety of author statements from the literature about the neural circuit basis of egg-laying and carbon dioxide (CO2) avoidance behaviors could be faithfully represented with CeN-CAM. Through this process, we were able to generate generic data models for several categories of experimental results. We also discuss how semantic modelling may be used to functionally annotate the C. elegans connectome. Thus, Gene Ontology-based semantic modelling has the potential to support various machine-readable representations of neurobiological knowledge.
Collapse
Affiliation(s)
- Sharan J Prakash
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
| | - Kimberly M Van Auken
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
| | - David P Hill
- The Jackson Laboratory, Bar Harbor, ME, 04609, USA
| | - Paul W Sternberg
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA.
| |
Collapse
|
2
|
Prakash SJ, Van Auken KM, Hill DP, Sternberg PW. Semantic Representation of Neural Circuit Knowledge in Caenorhabditis elegans. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.28.538760. [PMID: 37162850 PMCID: PMC10168330 DOI: 10.1101/2023.04.28.538760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/11/2023]
Abstract
In modern biology, new knowledge is generated quickly, making it challenging for researchers to efficiently acquire and synthesise new information from the large volume of primary publications. To address this problem, computational approaches that generate machine-readable representations of scientific findings in the form of knowledge graphs have been developed. These representations can integrate different types of experimental data from multiple papers and biological knowledge bases in a unifying data model, providing a complementary method to manual review for interacting with published knowledge. The Gene Ontology Consortium (GOC) has created a semantic modelling framework that extends individual functional gene annotations to structured descriptions of causal networks representing biological processes (Gene Ontology Causal Activity Modelling, or GO-CAM). In this study, we explored whether the GO-CAM framework could represent knowledge of the causal relationships between environmental inputs, neural circuits and behavior in the model nematode C. elegans (C. elegans Neural Circuit Causal Activity Modelling (CeN-CAM)). We found that, given extensions to several relevant ontologies, a wide variety of author statements from the literature about the neural circuit basis of egg-laying and carbon dioxide (CO2) avoidance behaviors could be faithfully represented with CeN-CAM. Through this process, we were able to generate generic data models for several categories of experimental results. We also discuss how semantic modelling may be used to functionally annotate the C. elegans connectome. Thus, Gene Ontology-based semantic modelling has the potential to support various machine-readable representations of neurobiological knowledge.
Collapse
Affiliation(s)
- Sharan J Prakash
- 1. Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - Kimberly M Van Auken
- 1. Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - David P Hill
- 2. The Jackson Laboratory, Bar Harbor, ME, 04609 USA
| | - Paul W Sternberg
- 1. Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| |
Collapse
|
3
|
He Z, Pfaff E, Guo SJ, Guo Y, Wu Y, Tao C, Stiglic G, Bian J. Enriching Real-world Data with Social Determinants of Health for Health Outcomes and Health Equity: Successes, Challenges, and Opportunities. Yearb Med Inform 2023; 32:253-263. [PMID: 38147867 PMCID: PMC10751148 DOI: 10.1055/s-0043-1768732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2023] Open
Abstract
OBJECTIVE To summarize the recent methods and applications that leverage real-world data such as electronic health records (EHRs) with social determinants of health (SDoH) for public and population health and health equity and identify successes, challenges, and possible solutions. METHODS In this opinion review, grounded on a social-ecological-model-based conceptual framework, we surveyed data sources and recent informatics approaches that enable leveraging SDoH along with real-world data to support public health and clinical health applications including helping design public health intervention, enhancing risk stratification, and enabling the prediction of unmet social needs. RESULTS Besides summarizing data sources, we identified gaps in capturing SDoH data in existing EHR systems and opportunities to leverage informatics approaches to collect SDoH information either from structured and unstructured EHR data or through linking with public surveys and environmental data. We also surveyed recently developed ontologies for standardizing SDoH information and approaches that incorporate SDoH for disease risk stratification, public health crisis prediction, and development of tailored interventions. CONCLUSIONS To enable effective public health and clinical applications using real-world data with SDoH, it is necessary to develop both non-technical solutions involving incentives, policies, and training as well as technical solutions such as novel social risk management tools that are integrated into clinical workflow. Ultimately, SDoH-powered social risk management, disease risk prediction, and development of SDoH tailored interventions for disease prevention and management have the potential to improve population health, reduce disparities, and improve health equity.
Collapse
Affiliation(s)
- Zhe He
- School of Information, Florida State University, United States
- Department of Behavioral Sciences and Social Medicine, College of Medicine, Florida State University, United States
| | - Emily Pfaff
- Department of Medicine, University of North Carolina at Chapel Hill School of Medicine, United States
| | - Serena Jingchuan Guo
- Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, United States
| | - Yi Guo
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, United States
| | - Yonghui Wu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, United States
| | - Cui Tao
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, United States
| | - Gregor Stiglic
- Faculty of Health Science, University of Maribor, Slovenia
- Faculty of Electrical Engineering and Computer Science, University of Maribor, Slovenia
- Usher Institute, University of Edinburgh, UK
| | - Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, United States
| |
Collapse
|
4
|
Chan LE, Casiraghi E, Putman T, Reese J, Harmon QE, Schaper K, Hedge H, Valentini G, Schmitt C, Motsinger-Reif A, Hall JE, Mungall CJ, Robinson PN, Haendel MA. Predicting nutrition and environmental factors associated with female reproductive disorders using a knowledge graph and random forests. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.07.14.23292679. [PMID: 37502882 PMCID: PMC10371183 DOI: 10.1101/2023.07.14.23292679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/29/2023]
Abstract
Objective Female reproductive disorders (FRDs) are common health conditions that may present with significant symptoms. Diet and environment are potential areas for FRD interventions. We utilized a knowledge graph (KG) method to predict factors associated with common FRDs (e.g., endometriosis, ovarian cyst, and uterine fibroids). Materials and Methods We harmonized survey data from the Personalized Environment and Genes Study on internal and external environmental exposures and health conditions with biomedical ontology content. We merged the harmonized data and ontologies with supplemental nutrient and agricultural chemical data to create a KG. We analyzed the KG by embedding edges and applying a random forest for edge prediction to identify variables potentially associated with FRDs. We also conducted logistic regression analysis for comparison. Results Across 9765 PEGS respondents, the KG analysis resulted in 8535 significant predicted links between FRDs and chemicals, phenotypes, and diseases. Amongst these links, 32 were exact matches when compared with the logistic regression results, including comorbidities, medications, foods, and occupational exposures. Discussion Mechanistic underpinnings of predicted links documented in the literature may support some of our findings. Our KG methods are useful for predicting possible associations in large, survey-based datasets with added information on directionality and magnitude of effect from logistic regression. These results should not be construed as causal, but can support hypothesis generation. Conclusion This investigation enabled the generation of hypotheses on a variety of potential links between FRDs and exposures. Future investigations should prospectively evaluate the variables hypothesized to impact FRDs.
Collapse
Affiliation(s)
- Lauren E Chan
- Oregon State University, College of Public Health and Human Sciences, Corvallis, OR, USA
| | - Elena Casiraghi
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Italy
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Tim Putman
- University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Justin Reese
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Quaker E Harmon
- National Institute of Environmental Health Sciences, Epidemiology Branch, Durham, NC, USA
| | - Kevin Schaper
- University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Harshad Hedge
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Giorgio Valentini
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Italy
| | - Charles Schmitt
- National Institute of Environmental Health Sciences, Office of Data Science, Durham, NC, USA
| | - Alison Motsinger-Reif
- National Institute of Environmental Health Sciences, Biostatistics & Computational Biology Branch, Durham, NC, USA
| | - Janet E Hall
- National Institute of Environmental Health Sciences, Clinical Research Branch, Durham, NC, USA
| | - Christopher J Mungall
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
| | | |
Collapse
|
5
|
Thessen AE, Cooper L, Swetnam TL, Hegde H, Reese J, Elser J, Jaiswal P. Using knowledge graphs to infer gene expression in plants. Front Artif Intell 2023; 6:1201002. [PMID: 37384147 PMCID: PMC10298150 DOI: 10.3389/frai.2023.1201002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2023] [Accepted: 05/23/2023] [Indexed: 06/30/2023] Open
Abstract
Introduction Climate change is already affecting ecosystems around the world and forcing us to adapt to meet societal needs. The speed with which climate change is progressing necessitates a massive scaling up of the number of species with understood genotype-environment-phenotype (G×E×P) dynamics in order to increase ecosystem and agriculture resilience. An important part of predicting phenotype is understanding the complex gene regulatory networks present in organisms. Previous work has demonstrated that knowledge about one species can be applied to another using ontologically-supported knowledge bases that exploit homologous structures and homologous genes. These types of structures that can apply knowledge about one species to another have the potential to enable the massive scaling up that is needed through in silico experimentation. Methods We developed one such structure, a knowledge graph (KG) using information from Planteome and the EMBL-EBI Expression Atlas that connects gene expression, molecular interactions, functions, and pathways to homology-based gene annotations. Our preliminary analysis uses data from gene expression studies in Arabidopsis thaliana and Populus trichocarpa plants exposed to drought conditions. Results A graph query identified 16 pairs of homologous genes in these two taxa, some of which show opposite patterns of gene expression in response to drought. As expected, analysis of the upstream cis-regulatory region of these genes revealed that homologs with similar expression behavior had conserved cis-regulatory regions and potential interaction with similar trans-elements, unlike homologs that changed their expression in opposite ways. Discussion This suggests that even though the homologous pairs share common ancestry and functional roles, predicting expression and phenotype through homology inference needs careful consideration of integrating cis and trans-regulatory components in the curated and inferred knowledge graph.
Collapse
Affiliation(s)
- Anne E. Thessen
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
| | - Tyson L. Swetnam
- BIO5 Institute, University of Arizona, Tucson, AZ, United States
| | - Harshad Hegde
- Environmental Genomics and Systems Biology Division, Berkeley Lab (DOE), Berkeley, CA, United States
| | - Justin Reese
- Environmental Genomics and Systems Biology Division, Berkeley Lab (DOE), Berkeley, CA, United States
| | - Justin Elser
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
| | - Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
| |
Collapse
|