1
Kejriwal M, Santos H, Shen K, Mulvehill AM, McGuinness DL. A noise audit of human-labeled benchmarks for machine commonsense reasoning. Sci Rep 2024; 14:8609. [PMID: 38615039 PMCID: PMC11016068 DOI: 10.1038/s41598-024-58937-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Accepted: 04/04/2024] [Indexed: 04/15/2024]
Abstract
With the advent of large language models, evaluating and benchmarking these systems on important AI problems has taken on newfound importance. Such benchmarking typically involves comparing the predictions of a system against human labels (or a single 'ground-truth'). However, much recent work in psychology has suggested that most tasks involving significant human judgment can have non-trivial degrees of noise. In his book, Kahneman suggests that noise may be a much more significant component of inaccuracy compared to bias, which has been studied more extensively in the AI community. This article proposes a detailed noise audit of human-labeled benchmarks in machine commonsense reasoning, an important current area of AI research. We conduct noise audits under two important experimental conditions: one in a smaller-scale but higher-quality labeling setting, and another in a larger-scale, more realistic online crowdsourced setting. Using Kahneman's framework of noise, our results consistently show non-trivial amounts of level, pattern, and system noise, even in the higher-quality setting, with comparable results in the crowdsourced setting. We find that noise can significantly influence the performance estimates that we obtain of commonsense reasoning systems, even if the 'system' is a human; in some cases, by almost 10 percent. Labeling noise also affects performance estimates of systems like ChatGPT by more than 4 percent. Our results suggest that the default practice in the AI community of assuming and using a 'single' ground-truth, even on problems requiring seemingly straightforward human judgment, may warrant empirical and methodological re-visiting.
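The level, pattern, and system noise named in the abstract can be illustrated with a small decomposition. What follows is a minimal sketch, assuming numeric labels arranged as a complete rater-by-item matrix and population (divide-by-N) variances. The function name `noise_audit` and the toy labels are hypothetical; the article's own estimation procedure is not described in this listing and may differ.

```python
from statistics import fmean


def noise_audit(labels):
    """Decompose labeling noise for labels[r][c] (rater r, case c).

    Assumes every rater labeled every case with a numeric value.
    Returns standard deviations for level, pattern, and system noise.
    """
    n_raters = len(labels)
    n_cases = len(labels[0])
    pairs = [(r, c) for r in range(n_raters) for c in range(n_cases)]

    grand = fmean(labels[r][c] for r, c in pairs)
    rater_means = [fmean(row) for row in labels]
    case_means = [
        fmean(labels[r][c] for r in range(n_raters)) for c in range(n_cases)
    ]

    # Level noise: how much raters differ in their average label.
    level_var = fmean((m - grand) ** 2 for m in rater_means)

    # Pattern noise: rater-by-case interaction left after removing
    # each rater's average and each case's average.
    pattern_var = fmean(
        (labels[r][c] - rater_means[r] - case_means[c] + grand) ** 2
        for r, c in pairs
    )

    # System noise: spread of labels around each case's mean label.
    system_var = fmean((labels[r][c] - case_means[c]) ** 2 for r, c in pairs)

    return {
        "level": level_var ** 0.5,
        "pattern": pattern_var ** 0.5,
        "system": system_var ** 0.5,
    }


if __name__ == "__main__":
    # Hypothetical ratings: three raters, three cases.
    toy = [[1, 2, 3], [2, 3, 4], [1, 3, 2]]
    print(noise_audit(toy))
```

Under these assumptions the squared system noise equals the sum of the squared level and pattern noise, which gives a quick consistency check on any implementation.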
Affiliation(s)
- Mayank Kejriwal
  - Information Sciences Institute, University of Southern California, Marina del Rey, 90292, USA
- Henrique Santos
  - Rensselaer Polytechnic Institute, Tetherless World Constellation, Troy, New York, USA
- Ke Shen
  - Information Sciences Institute, University of Southern California, Marina del Rey, 90292, USA
- Alice M Mulvehill
  - Rensselaer Polytechnic Institute, Tetherless World Constellation, Troy, New York, USA
- Deborah L McGuinness
  - Rensselaer Polytechnic Institute, Tetherless World Constellation, Troy, New York, USA

2
Santos H, Mulvehill AM, Shen K, Kejriwal M, McGuinness DL. TG-CSR: A human-labeled dataset grounded in nine formal commonsense categories. Data Brief 2023; 51:109666. [PMID: 37876745 PMCID: PMC10590714 DOI: 10.1016/j.dib.2023.109666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 10/05/2023] [Accepted: 10/06/2023] [Indexed: 10/26/2023]
Abstract
Machine Common Sense Reasoning is the subfield of Artificial Intelligence that aims to enable machines to behave or make decisions similarly to humans in everyday and ordinary situations. To measure progress, benchmarks in the form of question-answering datasets have been developed and published in the community to evaluate machine commonsense models, including large language models. We describe the individual label data produced by six human annotators originally used in computing ground truth for the Theoretically-Grounded Commonsense Reasoning (TG-CSR) benchmark's composing datasets. According to a set of instructions, annotators were provided with spreadsheets containing the original TG-CSR prompts and asked to insert labels in specific spreadsheet cells during annotation sessions. TG-CSR data is organized in JSON files, individual raw label data in a spreadsheet file, and individual normalized label data in JSONL files. The release of individual labels can enable the analysis of the labeling process itself, including studies of noise and consistency across annotators.
Affiliation(s)
- Henrique Santos
  - Rensselaer Polytechnic Institute, 110 8th St., Troy, NY 12180, USA
- Ke Shen
  - University of Southern California, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292, USA
- Mayank Kejriwal
  - University of Southern California, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292, USA

3
Qi M, Santos H, Pinheiro P, McGuinness DL, Bennett KP. Demographic and socioeconomic determinants of access to care: A subgroup disparity analysis using new equity-focused measurements. PLoS One 2023; 18:e0290692. [PMID: 37972008 PMCID: PMC10653411 DOI: 10.1371/journal.pone.0290692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 08/15/2023] [Indexed: 11/19/2023]
Abstract
Disparities in healthcare access and utilization associated with demographic and socioeconomic status hinder advancement of health equity. Thus, we designed a novel equity-focused approach to quantify variations of healthcare access/utilization from the expectation in national target populations. We additionally applied survey-weighted logistic regression models to identify factors associated with usage of a particular type of health care. To facilitate generation of analysis datasets, we built a National Health and Nutrition Examination Survey (NHANES) knowledge graph to help automate source-level dynamic analyses across different survey years and subjects' characteristics. We performed a cross-sectional subgroup disparity analysis of 2013-2018 NHANES on U.S. adults for receipt of diabetes treatments and vaccines against Hepatitis A (HAV), Hepatitis B (HBV), and Human Papillomavirus (HPV). Results show that in populations with hemoglobin A1c level ≥6%, patients with non-private insurance were less likely to receive newer and more beneficial antidiabetic medications; being Asian further exacerbated these disparities. For widely used drugs such as insulin, Asians experienced insignificant disparities in odds of prescription compared to White patients but received highly inadequate treatments with regard to their distribution in the U.S. diabetic population. Vaccination rates were associated with some demographic/socioeconomic factors but not others, to different degrees for different diseases. For instance, while equity scores increase with rising education levels for HBV, they decrease with rising wealth levels for HPV. Among women vaccinated against HPV, minorities and poor communities usually received Cervarix while non-Hispanic White and higher-income groups received the more comprehensive Gardasil vaccine. Our study identified and quantified the impact of determinants of healthcare utilization for antidiabetic medications and vaccinations.
Our new methods for semantics-aware disparity analysis of NHANES data could be readily generalized to other public health goals to support more rapid identification of disparities and development of policies, thus advancing health equity.
Affiliation(s)
- Miao Qi
  - Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York, United States of America
- Henrique Santos
  - Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York, United States of America
- Paulo Pinheiro
  - Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York, United States of America
  - Parcela Semântica Lda, Madeira, Portugal
- Deborah L. McGuinness
  - Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York, United States of America
- Kristin P. Bennett
  - Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York, United States of America
  - Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, New York, United States of America

4
Seneviratne O, Das AK, Chari S, Agu NN, Rashid SM, McCusker J, Franklin JS, Qi M, Bennett KP, Chen CH, Hendler JA, McGuinness DL. Semantically enabling clinical decision support recommendations. J Biomed Semantics 2023; 14:8. [PMID: 37464259 DOI: 10.1186/s13326-023-00285-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2021] [Accepted: 03/28/2023] [Indexed: 07/20/2023]
Abstract
BACKGROUND Clinical decision support systems have been widely deployed to guide healthcare decisions on patient diagnosis, treatment choices, and patient management through evidence-based recommendations. These recommendations are typically derived from clinical practice guidelines created by clinical specialties or healthcare organizations. Although there have been many different technical approaches to encoding guideline recommendations into decision support systems, much of the previous work has not focused on enabling system generated recommendations through the formalization of changes in a guideline, the provenance of a recommendation, and applicability of the evidence. Prior work indicates that healthcare providers may not find that guideline-derived recommendations always meet their needs for reasons such as lack of relevance, transparency, time pressure, and applicability to their clinical practice. RESULTS We introduce several semantic techniques that model diseases based on clinical practice guidelines, provenance of the guidelines, and the study cohorts they are based on to enhance the capabilities of clinical decision support systems. We have explored ways to enable clinical decision support systems with semantic technologies that can represent and link to details in related items from the scientific literature and quickly adapt to changing information from the guidelines, identifying gaps, and supporting personalized explanations. Previous semantics-driven clinical decision systems have limited support in all these aspects, and we present the ontologies and semantic web based software tools in three distinct areas that are unified using a standard set of ontologies and a custom-built knowledge graph framework: (i) guideline modeling to characterize diseases, (ii) guideline provenance to attach evidence to treatment decisions from authoritative sources, and (iii) study cohort modeling to identify relevant research publications for complicated patients. 
CONCLUSIONS We have enhanced existing, evidence-based knowledge by developing ontologies and software that enable clinicians to conveniently access updates to and provenance of guidelines, as well as gather additional information from research studies applicable to their patients' unique circumstances. Our software solutions leverage many well-used existing biomedical ontologies and build upon decades of knowledge representation and reasoning work, leading to explainable results.
Affiliation(s)
- Shruthi Chari
  - Rensselaer Polytechnic Institute, 110 8th St, 12180, Troy, NY, USA
- Sabbir M Rashid
  - Rensselaer Polytechnic Institute, 110 8th St, 12180, Troy, NY, USA
- Jamie McCusker
  - Rensselaer Polytechnic Institute, 110 8th St, 12180, Troy, NY, USA
- Jade S Franklin
  - Rensselaer Polytechnic Institute, 110 8th St, 12180, Troy, NY, USA
- Miao Qi
  - Rensselaer Polytechnic Institute, 110 8th St, 12180, Troy, NY, USA
- James A Hendler
  - Rensselaer Polytechnic Institute, 110 8th St, 12180, Troy, NY, USA

5
Chari S, Acharya P, Gruen DM, Zhang O, Eyigoz EK, Ghalwash M, Seneviratne O, Saiz FS, Meyer P, Chakraborty P, McGuinness DL. Informing clinical assessment by contextualizing post-hoc explanations of risk prediction models in type-2 diabetes. Artif Intell Med 2023; 137:102498. [PMID: 36868690 DOI: 10.1016/j.artmed.2023.102498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2021] [Revised: 11/21/2022] [Accepted: 01/18/2023] [Indexed: 02/05/2023]
Abstract
Medical experts may use Artificial Intelligence (AI) systems with greater trust if these are supported by 'contextual explanations' that let the practitioner connect system inferences to their context of use. However, their importance in improving model usage and understanding has not been extensively studied. Hence, we consider a comorbidity risk prediction scenario and focus on contexts regarding the patients' clinical state, AI predictions about their risk of complications, and algorithmic explanations supporting the predictions. We explore how relevant information for such dimensions can be extracted from medical guidelines to answer typical questions from clinical practitioners. We identify this as a question answering (QA) task and employ several state-of-the-art Large Language Models (LLMs) to present contexts around risk prediction model inferences and evaluate their acceptability. Finally, we study the benefits of contextual explanations by building an end-to-end AI pipeline including data cohorting, AI risk modeling, and post-hoc model explanations, and by prototyping a visual dashboard to present the combined insights from different context dimensions and data sources, while predicting and identifying the drivers of risk of Chronic Kidney Disease (CKD), a common type-2 diabetes (T2DM) comorbidity. All of these steps were performed in deep engagement with medical experts, including a final evaluation of the dashboard results by an expert medical panel. We show that LLMs, in particular BERT and SciBERT, can be readily deployed to extract some relevant explanations to support clinical usage. To understand the value-add of the contextual explanations, the expert panel evaluated these regarding actionable insights in the relevant clinical setting. Overall, our paper is one of the first end-to-end analyses identifying the feasibility and benefits of contextual explanations in a real-world clinical use case. Our findings can help improve clinicians' usage of AI models.
Affiliation(s)
- Shruthi Chari
  - Rensselaer Polytechnic Institute, 110 8th St, Troy, 12180, NY, USA
- Prasant Acharya
  - Rensselaer Polytechnic Institute, 110 8th St, Troy, 12180, NY, USA
- Daniel M Gruen
  - Rensselaer Polytechnic Institute, 110 8th St, Troy, 12180, NY, USA
- Olivia Zhang
  - Center for Computational Health, IBM Research, 1101 Kitchawan Rd, Yorktown Heights, 10598, NY, USA
- Elif K Eyigoz
  - Center for Computational Health, IBM Research, 1101 Kitchawan Rd, Yorktown Heights, 10598, NY, USA
- Mohamed Ghalwash
  - Center for Computational Health, IBM Research, 1101 Kitchawan Rd, Yorktown Heights, 10598, NY, USA
- Pablo Meyer
  - Center for Computational Health, IBM Research, 1101 Kitchawan Rd, Yorktown Heights, 10598, NY, USA
- Prithwish Chakraborty
  - Center for Computational Health, IBM Research, 1101 Kitchawan Rd, Yorktown Heights, 10598, NY, USA

6
Bychkovsky BL, Agaoglu NB, Horton C, Zhou J, Yussuf A, Hemyari P, Richardson ME, Young C, LaDuca H, McGuinness DL, Scheib R, Garber JE, Rana HQ. Differences in Cancer Phenotypes Among Frequent CHEK2 Variants and Implications for Clinical Care-Checking CHEK2. JAMA Oncol 2022; 8:1598-1606. [PMID: 36136322 PMCID: PMC9501803 DOI: 10.1001/jamaoncol.2022.4071] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Indexed: 11/14/2022]
Abstract
Importance: Germline CHEK2 pathogenic variants (PVs) are frequently detected by multigene cancer panel testing (MGPT), but our understanding of PVs beyond c.1100del has been limited. Objective: To compare cancer phenotypes of frequent CHEK2 PVs individually and collectively by variant type. Design, Setting, and Participants: This retrospective cohort study was carried out in a single diagnostic testing laboratory from 2012 to 2019. Overall, 3783 participants with CHEK2 PVs identified via MGPT were included. Medical histories of cancer in participants with frequent PVs, negative MGPT (wild type), loss-of-function (LOF), and missense were compared. Main Outcomes and Measures: Participants were stratified by CHEK2 PV type. Descriptive statistics were summarized including median (IQR) for continuous variables and proportions for categorical characteristics. Differences in age and proportions were assessed with Wilcoxon rank sum and Fisher exact tests, respectively. Frequencies, odds ratios (ORs), and 95% confidence intervals were calculated, and P values were corrected for multiple comparisons where appropriate. Results: Of the 3783 participants with CHEK2 PVs, 3473 (92%) were female and most reported White race. Breast cancer was less frequent in participants with p.I157T (OR, 0.66; 95% CI, 0.56-0.78; P < .001), p.S428F (OR, 0.59; 95% CI, 0.46-0.76; P < .001), and p.T476M (OR, 0.74; 95% CI, 0.56-0.98; P = .04) PVs compared with other PVs and an association with nonbreast cancers was not found. Following the exclusion of p.I157T, p.S428F, and p.T476M, participants with monoallelic CHEK2 PV had a younger age at first cancer diagnosis (P < .001) and were more likely to have breast (OR, 1.83; 95% CI, 1.66-2.02; P < .001), thyroid (OR, 1.63; 95% CI, 1.26-2.08; P < .001), and kidney cancer (OR, 2.57; 95% CI, 1.75-3.68; P < .001) than the wild-type cohort. Participants with a CHEK2 PV were less likely to have a diagnosis of colorectal cancer (OR, 0.62; 95% CI, 0.51-0.76; P < .001) compared with those in the wild-type cohort. There were no significant differences between frequent CHEK2 PVs and c.1100del and no differences between CHEK2 missense and LOF PVs. Conclusions and Relevance: CHEK2 PVs, with few exceptions (p.I157T, p.S428F, and p.T476M), were associated with similar cancer phenotypes irrespective of variant type. CHEK2 PVs were not associated with colorectal cancer, but were associated with breast, kidney, and thyroid cancers. Compared with other CHEK2 PVs, the frequent p.I157T, p.S428F, and p.T476M alleles have an attenuated association with breast cancer and were not associated with nonbreast cancers. These data may inform the genetic counseling and care of individuals with CHEK2 PVs.
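For readers unfamiliar with the reported statistics, an odds ratio and an approximate 95% confidence interval can be computed from a 2x2 table of counts. The sketch below uses the standard Wald interval on the log odds ratio. This is an assumption for illustration: the abstract does not say which interval method the authors used. The function name `odds_ratio_ci` and the counts in the usage example are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    All four counts must be positive. z=1.96 gives roughly 95% coverage.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the call.
    print(odds_ratio_ci(10, 20, 20, 10))
```

An interval that excludes 1 corresponds to an association that is statistically significant at roughly the 5% level, before any correction for multiple comparisons.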
Affiliation(s)
- Brittany L. Bychkovsky
  - Division of Cancer Genetics and Prevention, Dana-Farber Cancer Institute, Boston, Massachusetts; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts
- Nihat B. Agaoglu
  - Division of Cancer Genetics and Prevention, Dana-Farber Cancer Institute, Boston, Massachusetts; Department of Medical Genetics, Umraniye Training and Research Hospital, İstanbul, Turkey
- Jing Zhou
  - Ambry Genetics, Aliso Viejo, California
- Rochelle Scheib
  - Division of Cancer Genetics and Prevention, Dana-Farber Cancer Institute, Boston, Massachusetts; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts
- Judy E. Garber
  - Division of Cancer Genetics and Prevention, Dana-Farber Cancer Institute, Boston, Massachusetts; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts
- Huma Q. Rana
  - Division of Cancer Genetics and Prevention, Dana-Farber Cancer Institute, Boston, Massachusetts; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts

7
Kejriwal M, Santos H, Mulvehill AM, McGuinness DL. Designing a strong test for measuring true common-sense reasoning. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00478-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]

8
Shirai SS, Seneviratne O, Gordon ME, Chen CH, McGuinness DL. Identifying Ingredient Substitutions Using a Knowledge Graph of Food. Front Artif Intell 2021; 3:621766. [PMID: 33733228 PMCID: PMC7861309 DOI: 10.3389/frai.2020.621766] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/27/2020] [Accepted: 12/03/2020] [Indexed: 11/13/2022]
Abstract
People can effect change in their eating patterns by substituting ingredients in recipes. Such substitutions may be motivated by specific goals, like modifying the intake of a specific nutrient or avoiding a particular category of ingredients. Determining how to modify a recipe can be difficult because people need to 1) identify which ingredients can act as valid replacements for the original and 2) figure out whether the substitution is “good” for their particular context, which may consider factors such as allergies, nutritional contents of individual ingredients, and other dietary restrictions. We propose an approach to leverage both explicit semantic information about ingredients, encapsulated in a knowledge graph of food, and implicit semantics, captured through word embeddings, to develop a substitutability heuristic to rank plausible substitute options automatically. Our proposed system also helps determine which ingredient substitution options are “healthy” using nutritional information and food classification constraints. We evaluate our substitutability heuristic, diet-improvement ingredient substitutability heuristic (DIISH), using a dataset of ground-truth substitutions scraped from ingredient substitution guides and user reviews of recipes, demonstrating that our approach can help reduce the human effort required to make recipes more suitable for specific dietary needs.
Affiliation(s)
- Sola S Shirai
  - Rensselaer Polytechnic Institute, Troy, NY, United States
- Minor E Gordon
  - Rensselaer Polytechnic Institute, Troy, NY, United States
- Ching-Hua Chen
  - IBM T. J. Watson Research Center, Yorktown Heights, NY, United States

9
Franklin JDS, Chari S, Foreman MA, Seneviratne O, Gruen DM, McCusker JP, Das AK, McGuinness DL. Knowledge Extraction of Cohort Characteristics in Research Publications. AMIA Annu Symp Proc 2021; 2020:462-471. [PMID: 33936419 PMCID: PMC8075436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
When healthcare providers review the results of a clinical trial study to understand its applicability to their practice, they typically analyze how well the characteristics of the study cohort correspond to those of the patients they see. We have previously created a study cohort ontology to standardize this information and make it accessible for knowledge-based decision support. The extraction of this information from research publications is challenging, however, given the wide variance in reporting cohort characteristics in a tabular representation. To address this issue, we have developed an ontology-enabled knowledge extraction pipeline for automatically constructing knowledge graphs from the cohort characteristics found in PDF-formatted research papers. We evaluated our approach using a training and test set of 41 research publications and found an overall accuracy of 83.3% in correctly assembling the knowledge graphs. Our research provides a promising approach for extracting knowledge more broadly from tabular information in research publications.
10
Thessen AE, Walls RL, Vogt L, Singer J, Warren R, Buttigieg PL, Balhoff JP, Mungall CJ, McGuinness DL, Stucky BJ, Yoder MJ, Haendel MA. Transforming the study of organisms: Phenomic data models and knowledge bases. PLoS Comput Biol 2020; 16:e1008376. [PMID: 33232313 PMCID: PMC7685442 DOI: 10.1371/journal.pcbi.1008376] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/07/2023]
Abstract
The rapidly decreasing cost of gene sequencing has resulted in a deluge of genomic data from across the tree of life; however, outside a few model organism databases, genomic data are limited in their scientific impact because they are not accompanied by computable phenomic data. The majority of phenomic data are contained in countless small, heterogeneous phenotypic data sets that are very difficult or impossible to integrate at scale because of variable formats, lack of digitization, and linguistic problems. One powerful solution is to represent phenotypic data using data models with precise, computable semantics, but adoption of semantic standards for representing phenotypic data has been slow, especially in biodiversity and ecology. Some phenotypic and trait data are available in a semantic language from knowledge bases, but these are often not interoperable. In this review, we will compare and contrast existing ontology and data models, focusing on nonhuman phenotypes and traits. We discuss barriers to integration of phenotypic data and make recommendations for developing an operationally useful, semantically interoperable phenotypic data ecosystem.
Affiliation(s)
- Anne E. Thessen
  - Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
  - Ronin Institute for Independent Scholarship, Montclair, New Jersey, United States of America
- Ramona L. Walls
  - Bio5 Institute, University of Arizona, Tucson, Arizona, United States of America
- Lars Vogt
  - TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
- Pier Luigi Buttigieg
  - Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar- und Meeresforschung, Bremerhaven, Germany
- James P. Balhoff
  - Renaissance Computing Institute, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Christopher J. Mungall
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Brian J. Stucky
  - Florida Museum of Natural History, University of Florida, Gainesville, Florida, United States of America
- Matthew J. Yoder
  - Illinois Natural History Survey, Champaign, Illinois, United States of America
- Melissa A. Haendel
  - Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America

11
Rashid SM, McCusker JP, Pinheiro P, Bax MP, Santos H, Stingone JA, Das AK, McGuinness DL. The Semantic Data Dictionary - An Approach for Describing and Annotating Data. Data Intell 2020; 2:443-486. [PMID: 33103120 DOI: 10.1162/dint_a_00058] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/04/2022]
Abstract
It is common practice for data providers to include text descriptions for each column when publishing datasets in the form of data dictionaries. While these documents are useful in helping an end-user properly interpret the meaning of a column in a dataset, existing data dictionaries typically are not machine-readable and do not follow a common specification standard. We introduce the Semantic Data Dictionary, a specification that formalizes the assignment of a semantic representation of data, enabling standardization and harmonization across diverse datasets. In this paper, we present our Semantic Data Dictionary work in the context of our work with biomedical data; however, the approach can be, and has been, used in a wide range of domains. The rendition of data in this form helps promote improved discovery, interoperability, reuse, traceability, and reproducibility. We present the associated research and describe how the Semantic Data Dictionary can help address existing limitations in the related literature. We discuss our approach, present an example by annotating portions of the publicly available National Health and Nutrition Examination Survey dataset, present modeling challenges, and describe the use of this approach in sponsored research, including our work on a large NIH-funded exposure and health data portal and in the RPI-IBM collaborative Health Empowerment by Analytics, Learning, and Semantics project. We evaluate this work in comparison with traditional data dictionaries, mapping languages, and data integration tools.
Affiliation(s)
- Marcello P Bax
  - Universidade Federal de Minas Gerais, Belo Horizonte, MG, 31270-901, BR
- Jeanette A Stingone
  - Columbia University, Mailman School of Public Health, New York, NY, 10032, USA

12
Brinson LC, Deagen M, Chen W, McCusker J, McGuinness DL, Schadler LS, Palmeri M, Ghumman U, Lin A, Hu B. Polymer Nanocomposite Data: Curation, Frameworks, Access, and Potential for Discovery and Design. ACS Macro Lett 2020; 9:1086-1094. [PMID: 35653211 DOI: 10.1021/acsmacrolett.0c00264] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/22/2022]
Abstract
With the advent of the materials genome initiative (MGI) in the United States and a similar focus on materials data around the world, a number of materials data resources and associated vocabularies, tools, and repositories have been developed. While the majority of systems focus on slices of computational data with an emphasis on metallic alloys, NanoMine is an open source platform with the goal of curating and storing widely varying experimental data on polymer nanocomposites (polymers doped with nanoparticles) and providing access to characterization and analysis tools with the long-term objective of promoting facile nanocomposite design. Data on over 2500 samples from the literature and individual laboratories has been curated to date into NanoMine, including 230 samples from the papers bound in this virtual issue. This virtual issue represents an experiment of the flexibility of the data repository to capture the unique experimental metadata requirements of many data sets at one time and to challenge the authors to participate in the curation of their research data associated with a given publication. In principle, NanoMine offers a FAIR platform in which data published in papers becomes directly Findable and Accessible via simple search tools, with open metadata standards that are Interoperable with larger materials data registries, and allows easy Reuse of data, e.g. benchmarking against new results. Our hope is that with time, platforms such as this one could capture much of the newly published data on materials and form nodes in an interconnected materials data ecosystem which would allow researchers to robustly archive their data, add to the growing body of readily accessible data, and enable new forms of discovery by application of data analysis and design tools.
Affiliation(s)
- L Catherine Brinson
  - Department of Mechanical Engineering and Materials Science, Duke University, Durham, North Carolina 27708, United States
- Michael Deagen
  - Department of Mechanical Engineering, University of Vermont, Burlington, Vermont 05405, United States
- Wei Chen
  - Department of Mechanical Engineering, Northwestern University, Evanston, Illinois 60208, United States
- James McCusker
  - Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
- Deborah L McGuinness
  - Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
- Linda S Schadler
  - Department of Mechanical Engineering, University of Vermont, Burlington, Vermont 05405, United States
- Marc Palmeri
  - Department of Mechanical Engineering and Materials Science, Duke University, Durham, North Carolina 27708, United States
- Umar Ghumman
  - Department of Mechanical Engineering, Northwestern University, Evanston, Illinois 60208, United States
- Anqi Lin
  - Department of Mechanical Engineering and Materials Science, Duke University, Durham, North Carolina 27708, United States
- Bingyin Hu
  - Department of Mechanical Engineering and Materials Science, Duke University, Durham, North Carolina 27708, United States
|
13
|
Vargason T, Frye RE, McGuinness DL, Hahn J. Clustering of co-occurring conditions in autism spectrum disorder during early childhood: A retrospective analysis of medical claims data. Autism Res 2019; 12:1272-1285. [PMID: 31149786 DOI: 10.1002/aur.2128] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 02/01/2019] [Revised: 03/20/2019] [Accepted: 05/05/2019] [Indexed: 12/18/2022]
Abstract
Individuals with autism spectrum disorder (ASD) are frequently affected by co-occurring medical conditions (COCs), which vary in severity, age of onset, and pathophysiological characteristics. The presence of COCs contributes to significant heterogeneity in the clinical presentation of ASD between individuals, and a better understanding of COCs may offer greater insight into the etiology of ASD in specific subgroups while also providing guidance for diagnostic and treatment protocols. This study retrospectively analyzed medical claims data from a private United States health plan between years 2000 and 2015 to investigate patterns of COC diagnoses in a cohort of 3,278 children with ASD throughout their first 5 years of enrollment compared to 279,693 children from the general population without ASD diagnoses (POP cohort). Three subgroups of children with ASD were identified by k-means clustering using these COC patterns. The first cluster was characterized by generally high rates of COC diagnosis and comprised 23.7% (n = 776) of the cohort. Diagnoses of developmental delays were dominant in the second cluster containing 26.5% (n = 870) of the cohort. Children in the third cluster, making up 49.8% (n = 1,632) of the cohort, had the lowest rates of COC diagnosis, which were slightly higher than rates observed in the POP cohort. A secondary analysis using these data found that gastrointestinal and immune disorders showed similar longitudinal patterns of prevalence, as did seizure and sleep disorders. These findings may help to better inform the development of diagnostic workup and treatment protocols for COCs in children with ASD.
LAY SUMMARY: Medical conditions that co-occur with autism spectrum disorder (ASD) vary significantly from person to person. This study analyzed patterns in diagnosis of co-occurring conditions from medical claims data and observed three subtypes of children with ASD. These results may aid with screening for co-occurring conditions in children with ASD and with understanding ASD subtypes.
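As a quick consistency check on the figures quoted in this abstract, the short sketch below recomputes each cluster's share of the ASD cohort from the reported counts. Only the counts (776, 870, 1,632) and the cohort size (3,278) come from the abstract; the cluster labels and the function name `cluster_shares` are illustrative shorthand, not the authors' terminology or code.

```python
# Cluster sizes as reported in the abstract; the label strings are
# informal shorthand chosen for this example (an assumption).
cluster_sizes = {
    "high COC rates": 776,
    "developmental delays": 870,
    "low COC rates": 1632,
}
cohort_size = 3278  # children with ASD in the study cohort


def cluster_shares(sizes, total):
    """Return each cluster's share of the cohort as a percentage, to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in sizes.items()}


shares = cluster_shares(cluster_sizes, cohort_size)

# The three clusters should partition the whole cohort.
assert sum(cluster_sizes.values()) == cohort_size

for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The recomputed shares match the 23.7%, 26.5%, and 49.8% stated in the abstract, and the three counts sum to the full cohort.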
Affiliation(s)
- Troy Vargason
  - Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, New York
  - Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York
  - OptumLabs Visiting Fellow, Cambridge, Massachusetts
- Richard E Frye
  - Department of Child Health, University of Arizona College of Medicine, Phoenix, Arizona
  - Phoenix Children's Hospital, Phoenix, Arizona
- Deborah L McGuinness
  - Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York
  - Department of Cognitive Science, Rensselaer Polytechnic Institute, Troy, New York
- Juergen Hahn
  - Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, New York
  - Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York
  - Department of Chemical and Biological Engineering, Rensselaer Polytechnic Institute, Troy, New York
|
14
|
Vargason T, McGuinness DL, Hahn J. Gastrointestinal Symptoms and Oral Antibiotic Use in Children with Autism Spectrum Disorder: Retrospective Analysis of a Privately Insured U.S. Population. J Autism Dev Disord 2019; 49:647-659. [PMID: 30178105 DOI: 10.1007/s10803-018-3743-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 12/20/2022]
Abstract
A retrospective analysis of administrative claims data from a large U.S. health insurer was performed to study a potential association between oral antibiotic use during early childhood and occurrence of later gastrointestinal (GI) symptoms in children with autism spectrum disorder (ASD). Among 3253 children with ASD, 37.0% had a GI-related diagnosis during the last 2 years of their 5-year health coverage enrollment period, compared to 20.0% of 278,370 children from the general population without an ASD diagnosis. Greater numbers of oral antibiotic fills during the first 3 years of enrollment were found to significantly increase the hazard rate of having a later GI-related diagnosis (adjusted hazard ratio 1.48; 95% confidence interval 1.34, 1.63) in children both with and without ASD.
Affiliation(s)
- Troy Vargason
  - Department of Biomedical Engineering, Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, NY, 12180, USA
  - OptumLabs Visiting Fellow, Rensselaer Polytechnic Institute, Troy, NY, USA
- Juergen Hahn
  - Department of Biomedical Engineering, Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, NY, 12180, USA
  - Department of Chemical and Biological Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA
|
15
|
Vargason T, Kruger U, McGuinness DL, Adams JB, Geis E, Gehn E, Coleman D, Hahn J. Investigating Plasma Amino Acids for Differentiating Individuals with Autism Spectrum Disorder and Typically Developing Peers. Res Autism Spectr Disord 2018; 50:60-72. [PMID: 29682004 PMCID: PMC5903290 DOI: 10.1016/j.rasd.2018.03.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 05/02/2023]
Abstract
BACKGROUND Plasma amino acid measurements have been extensively investigated in individuals with autism spectrum disorder (ASD). Results thus far have been inconclusive as studies generally disagree on which amino acids are different in individuals with ASD versus their typically developing (TD) peers, due in part to methodological limitations of several studies. METHOD This paper investigates plasma amino acids in children and adults with ASD using data from Arizona State University's Comprehensive Nutritional and Dietary Intervention Study. Measurements from 64 individuals with ASD and 49 TD controls were analyzed using univariate and multivariate statistical techniques. RESULTS Univariate analysis indicated increased median levels of glutamate (+21%, p=0.014) and serine (+8%, p=0.043), and increased mean levels of hydroxyproline (+17%, p=0.018) for the ASD cohort, although these differences were insignificant after correcting for multiple comparisons. A multivariate approach was used to classify study participants into ASD/TD cohorts using Fisher discriminant analysis (FDA) and its nonlinear extension, kernel Fisher discriminant analysis (KFDA). Model fitting with FDA using all available measurements produced Type I and Type II errors of 27.0% and 27.8%, respectively. KFDA was most effective when using hydroxyproline, leucine, and threonine as inputs; however, leave-one-out cross-validation with this nonlinear model only resulted in 70.3% sensitivity and 77.6% specificity. CONCLUSIONS The finding of elevated glutamate in ASD is in agreement with several other studies. Overall, however, these results suggest that plasma amino acid measurements are of limited use for purposes of ASD classification, which may explain some of the inconsistencies in results presented in the literature.
Affiliation(s)
- Troy Vargason
  - Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA
- Uwe Kruger
  - Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA
- James B. Adams
  - Autism/Asperger’s Research Program, Arizona State University, Tempe, AZ, USA
- Elizabeth Geis
  - Autism/Asperger’s Research Program, Arizona State University, Tempe, AZ, USA
- Eva Gehn
  - Autism/Asperger’s Research Program, Arizona State University, Tempe, AZ, USA
- Devon Coleman
  - Autism/Asperger’s Research Program, Arizona State University, Tempe, AZ, USA
- Juergen Hahn
  - Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA
  - Department of Chemical and Biological Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA
  - Corresponding author at: Department of Biomedical Engineering, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180, United States of America
|
16
|
McCusker JP, Dumontier M, Yan R, He S, Dordick JS, McGuinness DL. Finding melanoma drugs through a probabilistic knowledge graph. PeerJ Comput Sci 2017; 3:e106. [PMID: 37133296 PMCID: PMC10151034 DOI: 10.7717/peerj-cs.106] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/27/2016] [Accepted: 12/27/2016] [Indexed: 05/04/2023]
Abstract
Metastatic cutaneous melanoma is an aggressive skin cancer with some progression-slowing treatments but no known cure. The omics data explosion has created many possible drug candidates; however, filtering criteria remain challenging, and systems biology approaches have become fragmented with many disconnected databases. Using drug, protein and disease interactions, we built an evidence-weighted knowledge graph of integrated interactions. Our knowledge graph-based system, ReDrugS, can be used via an application programming interface or web interface, and has generated 25 high-quality melanoma drug candidates. We show that probabilistic analysis of systems biology graphs increases drug candidate quality compared to non-probabilistic methods. Four of the 25 candidates are novel therapies, three of which have been tested with other cancers. All other candidates have current or completed clinical trials, or have been studied in vivo or in vitro. This approach can be used to identify candidate therapies for use in research or personalized medicine.
Affiliation(s)
- Michel Dumontier
  - Stanford Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA
- Rui Yan
  - Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
- Sylvia He
  - Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
- Jonathan S. Dordick
  - Department of Chemical & Biological Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA
  - Center for Biotechnology & Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY, USA
- Deborah L. McGuinness
  - Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
  - Center for Biotechnology & Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY, USA
|
17
|
Hussain S, Sun H, Sinaci A, Erturkmen GBL, Mead C, Gray AJG, McGuinness DL, Prud'Hommeaux E, Daniel C, Forsberg K. A framework for evaluating and utilizing medical terminology mappings. Stud Health Technol Inform 2014; 205:594-598. [PMID: 25160255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
The use of medical terminologies and of mappings across them is considered a crucial prerequisite for achieving interoperable eHealth applications. Building on the outcomes of several research projects, we introduce a framework for evaluating and utilizing terminology mappings that offers a platform for i) performing various mapping strategies, ii) representing terminology mappings together with their provenance information, and iii) enabling terminology reasoning for inferring both new and erroneous mappings. We present the results of applying the framework in the SALUS project, where we evaluated the quality of both existing and inferred terminology mappings among standard terminologies.
Affiliation(s)
- Hong Sun
  - Advanced Clinical Applications Research Group, Agfa HealthCare, Gent, Belgium
- Anil Sinaci
  - Software Research, Development and Consultancy Ltd., Turkey
- Alasdair J G Gray
  - School of Mathematical and Computer Sciences, Heriot-Watt University, UK
|
18
|
Zhang XS, Shrestha B, Yoon S, Kambhampati S, DiBona P, Guo JK, McFarlane D, Hofmann MO, Whitebread K, Appling DS, Whitaker ET, Trewhitt EB, Ding L, Michaelis JR, McGuinness DL, Hendler JA, Doppa JR, Parker C, Dietterich TG, Tadepalli P, Wong WK, Green D, Rebguns A, Spears D, Kuter U, Levine G, DeJong G, MacTavish RL, Ontañón S, Radhakrishnan J, Ram A, Mostafa H, Zafar H, Zhang C, Corkill D, Lesser V, Song Z. An Ensemble Architecture for Learning Complex Problem-Solving Techniques from Demonstration. ACM T INTEL SYST TEC 2012. [DOI: 10.1145/2337542.2337560] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/27/2022]
Abstract
We present a novel ensemble architecture for learning problem-solving techniques from a very small number of expert solutions and demonstrate its effectiveness in a complex real-world domain. The key feature of our “Generalized Integrated Learning Architecture” (GILA) is a set of heterogeneous independent learning and reasoning (ILR) components, coordinated by a central meta-reasoning executive (MRE). The ILRs are weakly coupled in the sense that all coordination during learning and performance happens through the MRE. Each ILR learns independently from a small number of expert demonstrations of a complex task. During performance, each ILR proposes partial solutions to subproblems posed by the MRE, which are then selected from and pieced together by the MRE to produce a complete solution. The heterogeneity of the learner-reasoners allows both learning and problem solving to be more effective because their abilities and biases are complementary and synergistic. We describe the application of this novel learning and problem solving architecture to the domain of airspace management, where multiple requests for the use of airspaces need to be deconflicted, reconciled, and managed automatically. Formal evaluations show that our system performs as well as or better than humans after learning from the same training data. Furthermore, GILA outperforms any individual ILR run in isolation, thus demonstrating the power of the ensemble architecture for learning and problem solving.
Affiliation(s)
- Li Ding
  - Rensselaer Polytechnic Institute
|
19
|
Blisard S, Carmichael T, Ding L, Finin T, Frost W, Graesser A, Hadzikadic M, Kagal L, Kruijff GJM, Langley P, Lester J, McGuinness DL, Mostow J, Papadakis P, Pirri F, Prasad R, Stoyanchev S, Varakantham P. Reports of the AAAI 2011 Fall Symposia. AI MAG 2012. [DOI: 10.1609/aimag.v33i1.2391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
The Association for the Advancement of Artificial Intelligence was pleased to present the 2011 Fall Symposium Series, held Friday through Sunday, November 4–6, at the Westin Arlington Gateway in Arlington, Virginia. The titles of the seven symposia are as follows: (1) Advances in Cognitive Systems; (2) Building Representations of Common Ground with Intelligent Agents; (3) Complex Adaptive Systems: Energy, Information and Intelligence; (4) Multiagent Coordination under Uncertainty; (5) Open Government Knowledge: AI Opportunities and Challenges; (6) Question Generation; and (7) Robot-Human Teamwork in Dynamic Adverse Environment. The highlights of each symposium are presented in this report.
|
20
|
Luciano JS, Andersson B, Batchelor C, Bodenreider O, Clark T, Denney CK, Domarew C, Gambet T, Harland L, Jentzsch A, Kashyap V, Kos P, Kozlovsky J, Lebo T, Marshall SM, McCusker JP, McGuinness DL, Ogbuji C, Pichler E, Powers RL, Prud'hommeaux E, Samwald M, Schriml L, Tonellato PJ, Whetzel PL, Zhao J, Stephens S, Dumontier M. The Translational Medicine Ontology and Knowledge Base: driving personalized medicine by bridging the gap between bench and bedside. J Biomed Semantics 2011; 2 Suppl 2:S1. [PMID: 21624155 DOI: 10.1186/2041-1480-2-s2-s1] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Translational medicine requires the integration of knowledge using heterogeneous data from health care to the life sciences. Here, we describe a collaborative effort to produce a prototype Translational Medicine Knowledge Base (TMKB) capable of answering questions relating to clinical practice and pharmaceutical drug discovery. RESULTS We developed the Translational Medicine Ontology (TMO) as a unifying ontology to integrate chemical, genomic and proteomic data with disease, treatment, and electronic health records. We demonstrate the use of Semantic Web technologies in the integration of patient and biomedical data, and reveal how such a knowledge base can aid physicians in providing tailored patient care and facilitate the recruitment of patients into active clinical trials. Thus, patients, physicians and researchers may explore the knowledge base to better understand therapeutic options, efficacy, and mechanisms of action. CONCLUSIONS This work takes an important step in using Semantic Web technologies to facilitate integration of relevant, distributed, external sources and progress towards a computational platform to support personalized medicine. AVAILABILITY TMO can be downloaded from http://code.google.com/p/translationalmedicineontology and TMKB can be accessed at http://tm.semanticscience.org/sparql.
|
21
|
Barkowsky T, Bertel S, Broz F, Chaudhri VK, Eagle N, Genesereth M, Halpin H, Hamner E, Hoffmann G, Hölscher C, Horvitz E, Lauwers T, McGuinness DL, Michalowski M, Mower E, Shipley TF, Stubbs K, Vogl R, Williams MA. Reports of the AAAI 2010 Spring Symposia. AI MAG 2010. [DOI: 10.1609/aimag.v31i3.2304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
The Association for the Advancement of Artificial Intelligence, in cooperation with Stanford University’s Department of Computer Science, is pleased to present the 2010 Spring Symposium Series, to be held Monday through Wednesday, March 22–24, 2010 at Stanford University. The titles of the seven symposia are Artificial Intelligence for Development; Cognitive Shape Processing; Educational Robotics and Beyond: Design and Evaluation; Embedded Reasoning: Intelligence in Embedded Systems; Intelligent Information Privacy Management; It’s All in the Timing: Representing and Reasoning about Time in Interactive Behavior; and Linked Data Meets Artificial Intelligence.
|
22
|
Halpin H, Hayes PJ, McCusker JP, McGuinness DL, Thompson HS. When owl:sameAs Isn’t the Same: An Analysis of Identity in Linked Data. Lecture Notes in Computer Science 2010. [DOI: 10.1007/978-3-642-17746-0_20] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.6] [Indexed: 12/03/2022]
|
23
|
Balduccini M, Baral C, Brodaric B, Colton S, Fox P, Gutelius D, Hinkelmann K, Horswill I, Huberman B, Hudlicka E, Lerman K, Lisetti C, McGuinness DL, Maher ML, Musen MA, Sahami M, Sleeman D, Thönssen B, Velasquez JD, Ventura D. AAAI 2008 Spring Symposia Reports. AI MAG 2008. [DOI: 10.1609/aimag.v29i3.2148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
The Association for the Advancement of Artificial Intelligence (AAAI) was pleased to present the AAAI 2008 Spring Symposium Series, held Wednesday through Friday, March 26–28, 2008 at Stanford University, California. The titles of the eight symposia were as follows: (1) AI Meets Business Rules and Process Management, (2) Architectures for Intelligent Theory-Based Agents, (3) Creative Intelligent Systems, (4) Emotion, Personality, and Social Behavior, (5) Semantic Scientific Knowledge Integration, (6) Social Information Processing, (7) Symbiotic Relationships between Semantic Web and Knowledge Engineering, and (8) Using AI to Motivate Greater Participation in Computer Science. The goal of the AI Meets Business Rules and Process Management symposium was to investigate the various approaches and standards to represent business rules, business process management and the semantic web with respect to expressiveness and reasoning capabilities. The focus of the Architectures for Intelligent Theory-Based Agents symposium was the definition of architectures for intelligent theory-based agents, comprising languages, knowledge representation methodologies, reasoning algorithms, and control loops. The Creative Intelligent Systems symposium included five major discussion sessions and a general poster session (in which all contributing papers were presented). The purpose of this symposium was to explore the synergies between creative cognition and intelligent systems. The goal of the Emotion, Personality, and Social Behavior symposium was to examine fundamental issues in affect and personality in both biological and artificial agents, focusing on the roles of these factors in mediating social behavior. The Semantic Scientific Knowledge Integration symposium was interested in bringing together the semantic technologies community with the scientific information technology community in an effort to build the general semantic science information community. The goal of the Social Information Processing symposium was to investigate computational and analytic approaches that will enable users to harness the efforts of large numbers of other users to solve a variety of information processing problems, from discovering high-quality content to managing common resources. The goal of the Symbiotic Relationships between Semantic Web and Knowledge Engineering symposium was to explore how the lessons learned by the knowledge-engineering community over the past three decades could be applied to the bold research agenda of current workers in semantic web technologies. The purpose of the Using AI to Motivate Greater Participation in Computer Science symposium was to identify ways that topics in AI may be used to motivate greater student participation in computer science by highlighting fun, engaging, and intellectually challenging developments in AI-related curriculum at a number of educational levels. Technical reports of the symposia were published by AAAI Press.
|
24
|
|
25
|
Sathiamurthy M, Peters B, Bui HH, Sidney J, Mokili J, Wilson SS, Fleri W, McGuinness DL, Bourne PE, Sette A. An ontology for immune epitopes: application to the design of a broad scope database of immune reactivities. Immunome Res 2005; 1:2. [PMID: 16305755 PMCID: PMC1287064 DOI: 10.1186/1745-7580-1-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Received: 06/26/2005] [Accepted: 09/20/2005] [Indexed: 11/25/2022] Open
Abstract
Background Epitopes can be defined as the molecular structures bound by specific receptors, which are recognized during immune responses. The Immune Epitope Database and Analysis Resource (IEDB) project will catalog and organize information regarding antibody and T cell epitopes from infectious pathogens, experimental antigens and self-antigens, with a priority on NIAID Category A-C pathogens () and emerging/re-emerging infectious diseases. Both intrinsic structural and phylogenetic features, as well as information relating to the interactions of the epitopes with the host's immune system will be catalogued. Description To effectively represent and communicate the information related to immune epitopes, a formal ontology was developed. The semantics of the epitope domain and related concepts were captured as a hierarchy of classes, which represent the general and specialized relationships between the various concepts. A complete listing of classes and their properties can be found at . Conclusion The IEDB's ontology is the first ontology specifically designed to capture both intrinsic chemical and biochemical information relating to immune epitopes with information relating to the interaction of these structures with molecules derived from the host immune system. We anticipate that the development of this type of ontology and associated databases will facilitate rigorous description of data related to immune epitopes, and might ultimately lead to completely new methods for describing and modeling immune responses.
Affiliation(s)
- Muthuraman Sathiamurthy
  - La Jolla Institute of Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, 92109, USA
- Bjoern Peters
  - La Jolla Institute of Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, 92109, USA
- Huynh-Hoa Bui
  - La Jolla Institute of Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, 92109, USA
- John Sidney
  - La Jolla Institute of Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, 92109, USA
- John Mokili
  - La Jolla Institute of Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, 92109, USA
- Stephen S Wilson
  - La Jolla Institute of Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, 92109, USA
- Ward Fleri
  - La Jolla Institute of Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, 92109, USA
- Deborah L McGuinness
  - Knowledge Systems, Artificial Intelligence Laboratory, Stanford University and McGuinness Associates, Stanford, CA 94305, USA
- Philip E Bourne
  - San Diego Supercomputer Center, P.O. Box 85608, San Diego, California 92186-5608, USA
- Alessandro Sette
  - La Jolla Institute of Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, 92109, USA
|
26
|
Abstract
Our work on the CLASSIC knowledge representation system covers a broad range from theory to practice. While CLASSIC was implemented primarily to provide a simple, easy to learn and use, locally available tool for a relatively limited set of applications, it has a substantial theoretical foundation, based on a formal "terminological" logic. The logical foundation provides the semantics of a term description language, which is used to define structured concepts and make assertions about individuals in a knowledge base. These concepts and individuals are organized into a generalization hierarchy by classification and subsumption algorithms. The CLASSIC system explores the expressiveness vs. tractability tradeoff, driven by concerns of usefulness and usability in several real applications. Within this context, it embodies our views of what a knowledge representation system should
|