1
Turki H, Jemielniak D, Hadj Taieb MA, Labra Gayo JE, Ben Aouicha M, Banat M, Shafee T, Prud’hommeaux E, Lubiana T, Das D, Mietchen D. Using logical constraints to validate statistical information about disease outbreaks in collaborative knowledge graphs: the case of COVID-19 epidemiology in Wikidata. PeerJ Comput Sci 2022; 8:e1085. [PMID: 36262159 PMCID: PMC9575845 DOI: 10.7717/peerj-cs.1085] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/07/2022] [Accepted: 08/15/2022] [Indexed: 06/16/2023]
Abstract
Urgent global research demands real-time dissemination of precise data. Wikidata, a collaborative and openly licensed knowledge graph available in RDF format, provides an ideal forum for exchanging structured data that can be verified and consolidated using validation schemas and bot edits. In this research article, we catalog an automatable task set necessary to assess and validate the portion of Wikidata relating to COVID-19 epidemiology. These tasks assess statistical data and are implemented in SPARQL, a query language for semantic databases. We demonstrate the efficiency of our methods for evaluating structured non-relational information on COVID-19 in Wikidata, and their applicability in collaborative ontologies and knowledge graphs more broadly. We show the advantages and limitations of our proposed approach by comparing it to the features of other methods for the validation of linked web data as revealed by previous research.
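As a rough illustration of the kind of logical constraint such a validation task can check, the Python sketch below flags records whose statistics are mutually inconsistent. It is not the authors' SPARQL implementation: the record fields (`cases`, `deaths`, `recoveries`), the specific rules, the sample values, and the function name `check_outbreak_record` are all assumptions chosen for illustration.

```python
from typing import Dict, List


def check_outbreak_record(record: Dict[str, int]) -> List[str]:
    """Return the constraints violated by one outbreak record.

    Field names and rules are hypothetical; they are not taken from
    the paper or from Wikidata's data model.
    """
    violations = []

    # Counts can never be negative.
    for name, value in record.items():
        if value < 0:
            violations.append(f"{name} is negative")

    # Deaths and recoveries are both subsets of the confirmed cases.
    if record["deaths"] > record["cases"]:
        violations.append("deaths exceed cases")
    if record["deaths"] + record["recoveries"] > record["cases"]:
        violations.append("deaths plus recoveries exceed cases")

    return violations


# Made-up sample data, for illustration only.
records = {
    "region A": {"cases": 1000, "deaths": 20, "recoveries": 900},
    "region B": {"cases": 50, "deaths": 60, "recoveries": 10},
}
report = {name: check_outbreak_record(r) for name, r in records.items()}
```

Here `report` maps each region to its list of violated constraints, so an empty list means the record passed every check in this sketch.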
Affiliation(s)
- Houcemeddine Turki
  - Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
- Dariusz Jemielniak
  - Department of Management in Networked and Digital Societies, Kozminski University, Warsaw, Masovia, Poland
- Mohamed A. Hadj Taieb
  - Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
- Jose E. Labra Gayo
  - Web Semantics Oviedo (WESO) Research Group, University of Oviedo, Oviedo, Asturias, Spain
- Mohamed Ben Aouicha
  - Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
- Mus’ab Banat
  - Faculty of Medicine, Hashemite University, Zarqa, Jordan
- Thomas Shafee
  - La Trobe University, Melbourne, Victoria, Australia
  - Swinburne University of Technology, Melbourne, Victoria, Australia
- Eric Prud’hommeaux
  - World Wide Web Consortium, Cambridge, Massachusetts, United States of America
- Tiago Lubiana
  - Computational Systems Biology Laboratory, University of São Paulo, São Paulo, Brazil
- Diptanshu Das
  - Institute of Child Health (ICH), Kolkata, West Bengal, India
  - Medica Superspecialty Hospital, Kolkata, West Bengal, India
- Daniel Mietchen
  - Ronin Institute, Montclair, New Jersey, United States of America
  - Department of Evolutionary and Integrative Ecology, Leibniz Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany
  - School of Data Science, University of Virginia, Charlottesville, Virginia, United States
  - Institute for Globally Distributed Open Research and Education (IGDORE), Jena, Germany
2
Ramsey J, McIntosh B, Renfro D, Aleksander SA, LaBonte S, Ross C, Zweifel AE, Liles N, Farrar S, Gill JJ, Erill I, Ades S, Berardini TZ, Bennett JA, Brady S, Britton R, Carbon S, Caruso SM, Clements D, Dalia R, Defelice M, Doyle EL, Friedberg I, Gurney SMR, Hughes L, Johnson A, Kowalski JM, Li D, Lovering RC, Mans TL, McCarthy F, Moore SD, Murphy R, Paustian TD, Perdue S, Peterson CN, Prüß BM, Saha MS, Sheehy RR, Tansey JT, Temple L, Thorman AW, Trevino S, Vollmer AC, Walbot V, Willey J, Siegele DA, Hu JC. Crowdsourcing biocuration: The Community Assessment of Community Annotation with Ontologies (CACAO). PLoS Comput Biol 2021; 17:e1009463. [PMID: 34710081 PMCID: PMC8553046 DOI: 10.1371/journal.pcbi.1009463] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022] Open
Abstract
Experimental data about gene functions curated from the primary literature have enormous value for research scientists in understanding biology. Using the Gene Ontology (GO), manual curation by experts has provided an important resource for studying gene function, especially within model organisms. Unprecedented expansion of the scientific literature and validation of the predicted proteins have increased both data value and the challenges of keeping pace. Capturing literature-based functional annotations is limited by the ability of biocurators to handle the massive and rapidly growing scientific literature. Within the community-oriented wiki framework for GO annotation called the Gene Ontology Normal Usage Tracking System (GONUTS), we describe an approach to expand biocuration through crowdsourcing with undergraduates. This multiplies the number of high-quality annotations in international databases, enriches our coverage of the literature on normal gene function, and pushes the field in new directions. From an intercollegiate competition judged by experienced biocurators, Community Assessment of Community Annotation with Ontologies (CACAO), we have contributed nearly 5,000 literature-based annotations. Many of those annotations are to organisms not currently well-represented within GO. Over a 10-year history, our community contributors have spurred changes to the ontology not traditionally covered by professional biocurators. The CACAO principle of relying on community members to participate in and shape the future of biocuration in GO is a powerful and scalable model used to promote the scientific enterprise. It also provides undergraduate students with a unique and enriching introduction to critical reading of primary literature and acquisition of marketable skills.
Affiliation(s)
- Jolene Ramsey
  - Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
  - Center for Phage Technology, Texas A&M University, College Station, Texas, United States of America
- Brenley McIntosh
  - Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
- Daniel Renfro
  - Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
- Suzanne A. Aleksander
  - Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
- Sandra LaBonte
  - Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
- Curtis Ross
  - Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
  - Center for Phage Technology, Texas A&M University, College Station, Texas, United States of America
- Adrienne E. Zweifel
  - Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
- Nathan Liles
  - Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
- Shabnam Farrar
  - Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
- Jason J. Gill
  - Center for Phage Technology, Texas A&M University, College Station, Texas, United States of America
  - Department of Animal Science, Texas A&M University, College Station, Texas, United States of America
- Ivan Erill
  - Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States of America
  - Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, Maryland, United States of America
- Sarah Ades
  - Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Tanya Z. Berardini
  - The Arabidopsis Information Resource, Phoenix Bioinformatics, Newark, California, United States of America
- Jennifer A. Bennett
  - Department of Biology and Earth Science, Otterbein University, Westerville, Ohio, United States of America
- Siobhan Brady
  - Department of Plant Biology and Genome Center, University of California Davis, Davis, California, United States of America
- Robert Britton
  - Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
- Seth Carbon
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Steven M. Caruso
  - Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States of America
- Dave Clements
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
- Ritu Dalia
  - Department of Biology, Drexel University, Philadelphia, Pennsylvania, United States of America
- Meredith Defelice
  - Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Erin L. Doyle
  - Biology Department, Doane University, Crete, Nebraska, United States of America
- Iddo Friedberg
  - Department of Microbiology, Miami University, Oxford, Ohio, United States of America
- Susan M. R. Gurney
  - Department of Biology, Drexel University, Philadelphia, Pennsylvania, United States of America
- Lee Hughes
  - Department of Biological Sciences, University of North Texas, Denton, Texas, United States of America
- Allison Johnson
  - Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Jason M. Kowalski
  - Biological Sciences Department, University of Wisconsin-Parkside, Kenosha, Wisconsin, United States of America
- Donghui Li
  - The Arabidopsis Information Resource, Phoenix Bioinformatics, Newark, California, United States of America
- Ruth C. Lovering
  - Institute of Cardiovascular Science, University College London, London, United Kingdom
- Tamara L. Mans
  - Department of Biochemistry and Biotechnology, Minnesota State University Moorhead, Brooklyn Park, Minnesota, United States of America
- Fiona McCarthy
  - Department of Basic Science, College of Veterinary Medicine, Mississippi State University, Starkville, Mississippi, United States of America
- Sean D. Moore
  - Burnett School of Biomedical Sciences, University of Central Florida, Orlando, Florida, United States of America
- Rebecca Murphy
  - Department of Biology, Centenary College of Louisiana, Shreveport, Louisiana, United States of America
- Timothy D. Paustian
  - Department of Bacteriology, University of Wisconsin, Madison, Wisconsin, United States of America
- Sarah Perdue
  - Biological Sciences Department, University of Wisconsin-Parkside, Kenosha, Wisconsin, United States of America
- Celeste N. Peterson
  - Biology Department, Suffolk University, Boston, Massachusetts, United States of America
- Birgit M. Prüß
  - Microbiological Sciences Department, North Dakota State University, Fargo, North Dakota, United States of America
- Margaret S. Saha
  - Department of Biology, College of William & Mary, Williamsburg, Virginia, United States of America
- Robert R. Sheehy
  - Biology Department, Radford University, Radford, Virginia, United States of America
- John T. Tansey
  - Department of Biochemistry and Molecular Biology, Otterbein University, Westerville, Ohio, United States of America
- Louise Temple
  - School of Integrated Sciences, James Madison University, Harrisonburg, Virginia, United States of America
- Alexander William Thorman
  - Department of Environmental and Public Health Sciences, University of Cincinnati, Cincinnati, Ohio, United States of America
- Saul Trevino
  - Department of Chemistry, Math, and Physics, Houston Baptist University, Houston, Texas, United States of America
- Amy Cheng Vollmer
  - Department of Biology, Swarthmore College, Swarthmore, Pennsylvania, United States of America
- Virginia Walbot
  - Department of Biology, Stanford University, Stanford, California, United States of America
- Joanne Willey
  - Department of Science Education, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
- Deborah A. Siegele
  - Department of Biology, Texas A&M University, College Station, Texas, United States of America
- James C. Hu
  - Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
  - Center for Phage Technology, Texas A&M University, College Station, Texas, United States of America
3
Healthcare Applications of Artificial Intelligence and Analytics: A Review and Proposed Framework. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10186553] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/14/2022]
Abstract
Healthcare has been considered one of the most promising application areas for artificial intelligence and analytics (AIA) since shortly after the latter emerged. AI combined with analytics technologies is increasingly changing medical practice and healthcare in an impressive way using efficient algorithms from various branches of information technology (IT). Indeed, numerous works are published every year in several universities and innovation centers worldwide, but concerns remain about their practical success. There are growing examples of AIA being implemented in healthcare with promising results. This review paper summarizes the past 5 years of healthcare applications of AIA, across different techniques and medical specialties, and discusses the current issues and challenges related to this revolutionary technology. A total of 24,782 articles were identified. The aim of this paper is to provide the research community with the necessary background to push this field even further and propose a framework that will help integrate diverse AIA technologies around patient needs in various healthcare contexts, especially for chronic care patients, who present the most complex comorbidities and care needs.
4
Hier DB, Brint SU. A Neuro-ontology for the neurological examination. BMC Med Inform Decis Mak 2020; 20:47. [PMID: 32131804 PMCID: PMC7057564 DOI: 10.1186/s12911-020-1066-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/18/2019] [Accepted: 02/25/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The use of clinical data in electronic health records for machine-learning or data analytics depends on the conversion of free text into machine-readable codes. We have examined the feasibility of capturing the neurological examination as machine-readable codes based on UMLS Metathesaurus concepts. METHODS We created a target ontology for capturing the neurological examination using 1100 concepts from the UMLS Metathesaurus. We created a dataset of 2386 test-phrases based on 419 published neurological cases. We then mapped the test-phrases to the target ontology. RESULTS We were able to map all of the 2386 test-phrases to 601 unique UMLS concepts. A neurological examination ontology with 1100 concepts has sufficient breadth and depth of coverage to encode all of the neurologic concepts derived from the 419 test cases. Using only pre-coordinated concepts, component ontologies of the UMLS, such as HPO, SNOMED CT, and OMIM, do not have adequate depth and breadth of coverage to encode the complexity of the neurological examination. CONCLUSION An ontology based on a subset of UMLS has sufficient breadth and depth of coverage to convert deficits from the neurological examination into machine-readable codes using pre-coordinated concepts. The use of a small subset of UMLS concepts for a neurological examination ontology offers the advantage of improved manageability as well as the opportunity to curate the hierarchy and subsumption relationships.
Affiliation(s)
- Daniel B Hier
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, 912 S. Wood Street (MC 796), Chicago, IL, 60612, USA
- Steven U Brint
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, 912 S. Wood Street (MC 796), Chicago, IL, 60612, USA
5
Radovanović S, Delibašić B, Jovanović M, Vukićević M, Suknović M. A Framework for Integrating Domain Knowledge in Logistic Regression with Application to Hospital Readmission Prediction. INT J ARTIF INTELL T 2019. [DOI: 10.1142/s0218213019600066] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
It is commonly understood that machine learning algorithms discover and extract knowledge based on the data at hand. However, a huge amount of knowledge is available in machine-readable format and is ready for inclusion in machine learning algorithms and models. In this paper, we propose a framework that integrates domain knowledge in the form of ontologies/hierarchies into logistic regression using stacked generalization. Namely, relations from the ontology/hierarchy are used in a stacking manner in order to obtain higher, more abstract concepts. The obtained concepts are further used for prediction. The problem we address is unplanned 30-day hospital readmission, which is considered one of the major problems in healthcare. The proposed framework yields better results compared to Ridge, Lasso, and Tree Lasso Logistic Regression. Results suggest that the proposed framework improves AUC by up to 9.5% on pediatric datasets and up to 4% on morbidly obese patients’ datasets and also improves AUPRC by up to 5.7% on pediatric datasets and up to 2.6% on morbidly obese patients’ datasets on average. This indicates that the inclusion of domain knowledge improves the predictive performance of Logistic Regression.
Affiliation(s)
- Sandro Radovanović
  - University of Belgrade, Faculty of Organizational Sciences, Jove Ilića 154, Belgrade, Serbia
- Boris Delibašić
  - University of Belgrade, Faculty of Organizational Sciences, Jove Ilića 154, Belgrade, Serbia
- Miloš Jovanović
  - University of Belgrade, Faculty of Organizational Sciences, Jove Ilića 154, Belgrade, Serbia
- Milan Vukićević
  - University of Belgrade, Faculty of Organizational Sciences, Jove Ilića 154, Belgrade, Serbia
- Milija Suknović
  - University of Belgrade, Faculty of Organizational Sciences, Jove Ilića 154, Belgrade, Serbia
6
Lalor JP, Woolf B, Yu H. Improving Electronic Health Record Note Comprehension With NoteAid: Randomized Trial of Electronic Health Record Note Comprehension Interventions With Crowdsourced Workers. J Med Internet Res 2019; 21:e10793. [PMID: 30664453 PMCID: PMC6351990 DOI: 10.2196/10793] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/27/2018] [Revised: 09/28/2018] [Accepted: 10/26/2018] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND Patient portals are becoming more common, and with them, the ability of patients to access their personal electronic health records (EHRs). EHRs, in particular the free-text EHR notes, often contain medical jargon and terms that are difficult for laypersons to understand. There are many Web-based resources for learning more about particular diseases or conditions, including systems that directly link to lay definitions or educational materials for medical concepts. OBJECTIVE Our goal is to determine whether use of one such tool, NoteAid, leads to higher EHR note comprehension ability. We use a new EHR note comprehension assessment tool instead of patient self-reported scores. METHODS In this work, we compare a passive, self-service educational resource (MedlinePlus) with an active resource (NoteAid) where definitions are provided to the user for medical concepts that the system identifies. We use Amazon Mechanical Turk (AMT) to recruit individuals to complete ComprehENotes, a new test of EHR note comprehension. RESULTS Mean scores for individuals with access to NoteAid are significantly higher than the mean baseline scores, both for raw scores (P=.008) and estimated ability (P=.02). CONCLUSIONS In our experiments, we show that the active intervention leads to significantly higher scores on the comprehension test as compared with a baseline group with no resources provided. In contrast, there is no significant difference between the group that was provided with the passive intervention and the baseline group. Finally, we analyze the demographics of the individuals who participated in our AMT task and show differences between groups that align with the current understanding of health literacy between populations. This is the first work to show improvements in comprehension using tools such as NoteAid as measured by an EHR note comprehension assessment tool as opposed to patient self-reported scores.
Affiliation(s)
- John P Lalor
  - College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, United States
- Beverly Woolf
  - College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, United States
- Hong Yu
  - College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, United States
  - Department of Computer Science, University of Massachusetts Lowell, Lowell, MA, United States
  - Department of Medicine, University of Massachusetts Medical School, Worcester, MA, United States
  - Bedford Veterans Affairs Medical Center, Center for Healthcare Organization and Implementation Research, Bedford, MA, United States
7
Quality assurance of biomedical terminologies and ontologies. J Biomed Inform 2018; 86:106-108. [PMID: 30205171 DOI: 10.1016/j.jbi.2018.09.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/04/2018] [Accepted: 09/07/2018] [Indexed: 11/22/2022]
8
Lossio-Ventura JA, Hogan W, Modave F, Guo Y, He Z, Yang X, Zhang H, Bian J. OC-2-KB: integrating crowdsourcing into an obesity and cancer knowledge base curation system. BMC Med Inform Decis Mak 2018; 18:55. [PMID: 30066655 PMCID: PMC6069686 DOI: 10.1186/s12911-018-0635-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND There is strong scientific evidence linking obesity and overweight to the risk of various cancers and to cancer survivorship. Nevertheless, the existing online information about the relationship between obesity and cancer is poorly organized, not evidence-based, of poor quality, and confusing to health information consumers. A formal knowledge representation such as a Semantic Web knowledge base (KB) can help better organize and deliver quality health information. We previously presented the OC-2-KB (Obesity and Cancer to Knowledge Base), a software pipeline that can automatically build an obesity and cancer KB from scientific literature. In this work, we investigated crowdsourcing strategies to increase the number of ground truth annotations and improve the quality of the KB. METHODS We developed a new release of the OC-2-KB system addressing key challenges in automatic KB construction. OC-2-KB automatically extracts semantic triples in the form of subject-predicate-object expressions from PubMed abstracts related to the obesity and cancer literature. The accuracy of the facts extracted from scientific literature heavily relies on both the quantity and quality of the available ground truth triples. Thus, we incorporated a crowdsourcing process to improve the quality of the KB. RESULTS We conducted two rounds of crowdsourcing experiments using a new corpus with 82 obesity and cancer-related PubMed abstracts. We demonstrated that crowdsourcing is indeed a low-cost mechanism to collect labeled data from non-expert laypeople. Even though an individual layperson might not offer reliable answers, the collective wisdom of the crowd is comparable to expert opinions. We also retrained the relation detection machine learning models in OC-2-KB using the crowd-annotated data and evaluated the content of the curated KB with a set of competency questions. Our evaluation showed improved performance of the underlying relation detection model in comparison to the baseline OC-2-KB. CONCLUSIONS We presented a new version of OC-2-KB, a system that automatically builds an evidence-based obesity and cancer KB from scientific literature. Our KB construction framework integrated automatic information extraction with crowdsourcing techniques to verify the extracted knowledge. Our ultimate goal is a paradigm shift in how the general public access, read, digest, and use online health information.
Affiliation(s)
- Juan Antonio Lossio-Ventura
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, FL, 32610, USA
- William Hogan
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, FL, 32610, USA
- François Modave
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, FL, 32610, USA
- Yi Guo
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, FL, 32610, USA
- Zhe He
  - School of Information, Florida State University, 142 Collegiate Loop, Tallahassee, FL, 32306, USA
- Xi Yang
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, FL, 32610, USA
- Hansi Zhang
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, FL, 32610, USA
- Jiang Bian
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, FL, 32610, USA
9
Créquit P, Mansouri G, Benchoufi M, Vivot A, Ravaud P. Mapping of Crowdsourcing in Health: Systematic Review. J Med Internet Res 2018; 20:e187. [PMID: 29764795 PMCID: PMC5974463 DOI: 10.2196/jmir.9330] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 11/02/2017] [Revised: 02/10/2018] [Accepted: 03/14/2018] [Indexed: 11/22/2022] Open
Abstract
Background Crowdsourcing involves obtaining ideas, needed services, or content by soliciting Web-based contributions from a crowd. The 4 types of crowdsourced tasks (problem solving, data processing, surveillance or monitoring, and surveying) can be applied in the 3 categories of health (promotion, research, and care). Objective This study aimed to map the different applications of crowdsourcing in health to assess the fields of health that are using crowdsourcing and the crowdsourced tasks used. We also describe the logistics of crowdsourcing and the characteristics of crowd workers. Methods MEDLINE, EMBASE, and ClinicalTrials.gov were searched for available reports from inception to March 30, 2016, with no restriction on language or publication status. Results We identified 202 relevant studies that used crowdsourcing, including 9 randomized controlled trials, of which only one had posted results at ClinicalTrials.gov. Crowdsourcing was used in health promotion (91/202, 45.0%), research (73/202, 36.1%), and care (38/202, 18.8%). The 4 most frequent areas of application were public health (67/202, 33.2%), psychiatry (32/202, 15.8%), surgery (22/202, 10.9%), and oncology (14/202, 6.9%). Half of the reports (99/202, 49.0%) referred to data processing, 34.6% (70/202) referred to surveying, 10.4% (21/202) referred to surveillance or monitoring, and 5.9% (12/202) referred to problem-solving. Labor market platforms (eg, Amazon Mechanical Turk) were used in most studies (190/202, 94%). The crowd workers’ characteristics were poorly reported, and crowdsourcing logistics were missing from two-thirds of the reports. When reported, the median size of the crowd was 424 (first and third quartiles: 167-802); crowd workers’ median age was 34 years (32-36). Crowd workers were mainly recruited nationally, particularly in the United States. For many studies (58.9%, 119/202), previous experience in crowdsourcing was required, and passing a qualification test or training was seldom needed (11.9% of studies; 24/202). For half of the studies, monetary incentives were mentioned, with mainly less than US $1 to perform the task. The time needed to perform the task was mostly less than 10 min (58.9% of studies; 119/202). Data quality validation was used in 54/202 studies (26.7%), mainly by attention check questions or by replicating the task with several crowd workers. Conclusions The use of crowdsourcing, which allows access to a large pool of participants as well as saving time in data collection, lowering costs, and speeding up innovations, is increasing in health promotion, research, and care. However, the description of crowdsourcing logistics and crowd workers’ characteristics is frequently missing in study reports and needs to be precisely reported to better interpret the study findings and replicate them.
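The proportions reported in this abstract can be recomputed from the stated counts. The short Python sketch below does so for the three health categories and the four task types; the counts and the total of 202 studies come from the abstract, while the helper name `percent` and the dictionary layout are illustrative choices.

```python
def percent(numerator: int, denominator: int) -> float:
    """Percentage of numerator in denominator, rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


TOTAL_STUDIES = 202

# Counts as stated in the abstract.
by_category = {"promotion": 91, "research": 73, "care": 38}
by_task = {
    "data processing": 99,
    "surveying": 70,
    "surveillance or monitoring": 21,
    "problem solving": 12,
}

category_shares = {k: percent(v, TOTAL_STUDIES) for k, v in by_category.items()}
task_shares = {k: percent(v, TOTAL_STUDIES) for k, v in by_task.items()}

# Each breakdown should account for every study exactly once.
categories_cover_all = sum(by_category.values()) == TOTAL_STUDIES
tasks_cover_all = sum(by_task.values()) == TOTAL_STUDIES
```

Both breakdowns sum to 202. With conventional rounding, the surveying share comes out as 34.7 where the abstract reports 34.6%, which would be consistent with truncation rather than rounding; the other recomputed shares match the reported figures.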
Affiliation(s)
- Perrine Créquit
  - INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France
  - Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France
  - Cochrane France, Paris, France
- Ghizlène Mansouri
  - INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France
- Mehdi Benchoufi
  - Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France
- Alexandre Vivot
  - INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France
  - Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France
- Philippe Ravaud
  - INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France
  - Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France
  - Cochrane France, Paris, France
  - Department of Epidemiology, Columbia University, Mailman School of Public Health, New York, NY, United States
10
Lalor JP, Wu H, Chen L, Mazor KM, Yu H. ComprehENotes, an Instrument to Assess Patient Reading Comprehension of Electronic Health Record Notes: Development and Validation. J Med Internet Res 2018; 20:e139. [PMID: 29695372 PMCID: PMC5943623 DOI: 10.2196/jmir.9380] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/10/2017] [Revised: 02/06/2018] [Accepted: 02/20/2018] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Patient portals are widely adopted in the United States and allow millions of patients access to their electronic health records (EHRs), including their EHR clinical notes. A patient's ability to understand the information in the EHR is dependent on their overall health literacy. Although many tests of health literacy exist, none specifically focuses on EHR note comprehension. OBJECTIVE The aim of this paper was to develop an instrument to assess patients' EHR note comprehension. METHODS We identified 6 common diseases or conditions (heart failure, diabetes, cancer, hypertension, chronic obstructive pulmonary disease, and liver failure) and selected 5 representative EHR notes for each disease or condition. One note that did not contain natural language text was removed. Questions were generated from these notes using Sentence Verification Technique and were analyzed using item response theory (IRT) to identify a set of questions that represent a good test of ability for EHR note comprehension. RESULTS Using Sentence Verification Technique, 154 questions were generated from the 29 EHR notes initially obtained. Of these, 83 were manually selected for inclusion in the Amazon Mechanical Turk crowdsourcing tasks and 55 were ultimately retained following IRT analysis. A follow-up validation with a second Amazon Mechanical Turk task and IRT analysis confirmed that the 55 questions test a latent ability dimension for EHR note comprehension. A short test of 14 items was created along with the 55-item test. CONCLUSIONS We developed ComprehENotes, an instrument for assessing EHR note comprehension from existing EHR notes, gathered responses using crowdsourcing, and used IRT to analyze those responses, thus resulting in a set of questions to measure EHR note comprehension. Crowdsourced responses from Amazon Mechanical Turk can be used to estimate item parameters and select a subset of items for inclusion in the test set using IRT. 
The final set of questions is the first test of EHR note comprehension.
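As a rough illustration of the IRT step described in this abstract, the sketch below evaluates a two-parameter logistic (2PL) item response function and filters items by discrimination. The abstract does not say which IRT model or selection rule the authors used, so the 2PL form, the names `p_correct`, `items`, and `MIN_DISCRIMINATION`, and all parameter values are assumptions made for illustration only.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function.

    theta: respondent ability, a: item discrimination, b: item difficulty.
    Returns the modeled probability of answering the item correctly.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item parameters (discrimination, difficulty); not from the study.
items = {"q1": (1.5, -0.5), "q2": (0.3, 0.0), "q3": (2.0, 1.0)}

# Illustrative selection rule: keep items that discriminate well between
# low-ability and high-ability respondents.
MIN_DISCRIMINATION = 0.5
retained = [name for name, (a, _b) in items.items() if a >= MIN_DISCRIMINATION]

print(retained)
```

In practice the item parameters would be estimated from the crowdsourced responses with an IRT fitting package rather than written by hand.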
Affiliation(s)
- John P Lalor
- College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, United States
- Hao Wu
- Psychology Department, Boston College, Chestnut Hill, MA, United States
- Li Chen
- Psychology Department, Boston College, Chestnut Hill, MA, United States
- Kathleen M Mazor
- Meyers Primary Care Institute, University of Massachusetts Medical School / Reliant Medical Group / Fallon Health, Worcester, MA, United States
- Hong Yu
- College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, United States; Department of Computer Science, University of Massachusetts, Lowell, MA, United States; Department of Medicine, University of Massachusetts Medical School, Worcester, MA, United States; Bedford Veterans Affairs Medical Center, Center for Healthcare Organization and Implementation Research, Bedford, MA, United States
|
11
|
Abstract
Background Crowdsourcing is a nascent phenomenon that has grown exponentially since it was coined in 2006. It involves a large group of people solving a problem or completing a task for an individual or, more commonly, for an organisation. While the field of crowdsourcing has developed more quickly in information technology, it has great promise in health applications. This review examines uses of crowdsourcing in global health and health, broadly. Methods Semantic searches were run in Google Scholar for “crowdsourcing,” “crowdsourcing and health,” and similar terms. 996 articles were retrieved and all abstracts were scanned. 285 articles related to health. This review provides a narrative overview of the articles identified. Results Eight areas where crowdsourcing has been used in health were identified: diagnosis; surveillance; nutrition; public health and environment; education; genetics; psychology; and, general medicine/other. Many studies reported crowdsourcing being used in a diagnostic or surveillance capacity. Crowdsourcing has been widely used across medical disciplines; however, it is important for future work using crowdsourcing to consider the appropriateness of the crowd being used to ensure the crowd is capable and has the adequate knowledge for the task at hand. Gamification of tasks seems to improve accuracy; other innovative methods of analysis including introducing thresholds and measures of trustworthiness should be considered. Conclusion Crowdsourcing is a new field that has been widely used and is innovative and adaptable. With the exception of surveillance applications that are used in emergency and disaster situations, most uses of crowdsourcing have only been used as pilots. These exceptions demonstrate that it is possible to take crowdsourcing applications to scale. Crowdsourcing has the potential to provide more accessible health care to more communities and individuals rapidly and to lower costs of care.
Affiliation(s)
- Kerri Wazny
- Centre for Global Health Research, Usher Institute of Informatics and Population Sciences, University of Edinburgh, Edinburgh, UK
|
12
|
Amith M, He Z, Bian J, Lossio-Ventura JA, Tao C. Assessing the practice of biomedical ontology evaluation: Gaps and opportunities. J Biomed Inform 2018; 80:1-13. [PMID: 29462669 PMCID: PMC5882531 DOI: 10.1016/j.jbi.2018.02.010] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 08/31/2017] [Revised: 02/12/2018] [Accepted: 02/16/2018] [Indexed: 11/26/2022]
Abstract
With the proliferation of heterogeneous health care data in the last three decades, biomedical ontologies and controlled biomedical terminologies play an increasingly important role in knowledge representation and management, data integration, natural language processing, as well as decision support for health information systems and biomedical research. Biomedical ontologies and controlled terminologies are intended to assure interoperability. Nevertheless, the quality of biomedical ontologies has hindered their applicability and subsequent adoption in real-world applications. Ontology evaluation is an integral part of ontology development and maintenance. In the biomedicine domain, ontology evaluation is often conducted by third parties as a quality assurance (or auditing) effort that focuses on identifying modeling errors and inconsistencies. In this work, we first organized four categorical schemes of ontology evaluation methods in the existing literature to create an integrated taxonomy. Further, to understand the ontology evaluation practice in the biomedicine domain, we reviewed a sample of 200 ontologies from the National Center for Biomedical Ontology (NCBO) BioPortal (the largest repository for biomedical ontologies) and observed that only 15 of these ontologies have documented evaluation in their corresponding inception papers. We then surveyed the recent quality assurance approaches for biomedical ontologies and their use. We also mapped these quality assurance approaches to the ontology evaluation criteria. We anticipate that ontology evaluation and quality assurance approaches will be more widely adopted in the development life cycle of biomedical ontologies.
Affiliation(s)
- Muhammad Amith
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Zhe He
- School of Information, Florida State University, Tallahassee, FL, USA
- Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
- Cui Tao
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
|
13
|
Staffieri SE, Kearns LS, Sanfilippo PG, Craig JE, Mackey DA, Hewitt AW. Crowd-sourced Ontology for Photoleukocoria: Identifying Common Internet Search Terms for a Potentially Important Pediatric Ophthalmic Sign. Transl Vis Sci Technol 2018; 7:18. [PMID: 29464132 PMCID: PMC5815559 DOI: 10.1167/tvst.7.1.18] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/04/2017] [Accepted: 12/14/2017] [Indexed: 12/23/2022] Open
Abstract
Purpose Leukocoria is the most common presenting sign for pediatric eye disease including retinoblastoma and cataract, with worse outcomes if diagnosis is delayed. We investigated whether individuals could identify leukocoria in photographs (photoleukocoria) and examined their subsequent Internet search behavior. Methods Using a web-based questionnaire, in this cross-sectional study we invited adults aged over 18 years to view two photographs of a child with photoleukocoria, and then search the Internet to determine a possible diagnosis and action plan. The most commonly used search terms and websites accessed were recorded. Results The questionnaire was completed by 1639 individuals. Facebook advertisement was the most effective recruitment strategy. The mean age of all respondents was 38.95 ± 14.59 years (range, 18-83), 94% were female, and 59.3% had children. An abnormality in the images presented was identified by 1613 (98.4%) participants. The most commonly used search terms were: "white," "pupil," "photo," and "eye" reaching a variety of appropriate websites or links to print or social media articles. Conclusions Different words or phrases were used to describe the same observation of photoleukocoria leading to a range of websites. Variations in the description of observed signs and search words influenced the sites reached, information obtained, and subsequent help-seeking intentions. Translational Relevance Identifying the most commonly used search terms for photoleukocoria is an important step for search engine optimization. Being directed to the most appropriate websites informing of the significance of photoleukocoria and the appropriate actions to take could improve delays in diagnosis of important pediatric eye disease such as retinoblastoma or cataract.
Affiliation(s)
- Sandra E Staffieri
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, Melbourne, VIC, Australia; Ophthalmology, University of Melbourne, Department of Surgery, Melbourne, VIC, Australia
- Lisa S Kearns
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, Melbourne, VIC, Australia; Ophthalmology, University of Melbourne, Department of Surgery, Melbourne, VIC, Australia
- Paul G Sanfilippo
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, Melbourne, VIC, Australia; Ophthalmology, University of Melbourne, Department of Surgery, Melbourne, VIC, Australia
- Jamie E Craig
- Department of Ophthalmology, Flinders University, Flinders Medical Centre, Adelaide, SA, Australia
- David A Mackey
- Menzies Institute for Medical Research, School of Medicine, University of Tasmania, Hobart, TAS, Australia; Lion's Eye Institute, Centre for Ophthalmology and Visual Sciences, University of Western Australia, Perth, WA, Australia
- Alex W Hewitt
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, Melbourne, VIC, Australia; Ophthalmology, University of Melbourne, Department of Surgery, Melbourne, VIC, Australia; Menzies Institute for Medical Research, School of Medicine, University of Tasmania, Hobart, TAS, Australia
|
14
|
OmniPHR: A distributed architecture model to integrate personal health records. J Biomed Inform 2017; 71:70-81. [PMID: 28545835 DOI: 10.1016/j.jbi.2017.05.012] [Citation(s) in RCA: 201] [Impact Index Per Article: 28.7] [Received: 02/20/2017] [Revised: 04/07/2017] [Accepted: 05/15/2017] [Indexed: 02/08/2023]
Abstract
Advances in Information and Communications Technology (ICT) have brought many benefits to healthcare, especially to the digital storage of patients' health records. However, it is still a challenge to have a unified viewpoint of patients' health history, because health data is typically scattered among different health organizations. Furthermore, there are several standards for these records, some of them open and others proprietary. Health records are usually stored in databases within health organizations and rarely have external access. This situation applies mainly to cases where patients' data are maintained by healthcare providers, known as EHRs (Electronic Health Records). In the case of PHRs (Personal Health Records), which patients by definition can manage themselves, patients usually have no control over their data stored in healthcare providers' databases. We therefore envision two main challenges regarding the PHR context: first, how patients could have a unified view of their scattered health records, and second, how healthcare providers can access up-to-date data regarding their patients, even when changes occurred elsewhere. To address these issues, this work proposes OmniPHR, a distributed model to integrate PHRs for use by patients and healthcare providers. The scientific contribution is an architecture model to support a distributed PHR, in which patients can maintain their health history in a unified viewpoint, from any device anywhere. Likewise, healthcare providers gain the possibility of having their patients' data interconnected among health organizations. The evaluation demonstrates the feasibility of the model in maintaining health records distributed in an architecture model that promotes a unified view of PHR with elasticity and scalability of the solution.
|
15
|
Abstract
The generation of large-scale biomedical data is creating unprecedented opportunities for basic and translational science. Typically, the data producers perform initial analyses, but it is very likely that the most informative methods may reside with other groups. Crowdsourcing the analysis of complex and massive data has emerged as a framework to find robust methodologies. When the crowdsourcing is done in the form of collaborative scientific competitions, known as Challenges, the validation of the methods is inherently addressed. Challenges also encourage open innovation, create collaborative communities to solve diverse and important biomedical problems, and foster the creation and dissemination of well-curated data repositories.
|
16
|
Ochs C, Case JT, Perl Y. Analyzing structural changes in SNOMED CT's Bacterial infectious diseases using a visual semantic delta. J Biomed Inform 2017; 67:101-116. [PMID: 28215561 DOI: 10.1016/j.jbi.2017.02.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/29/2016] [Revised: 02/08/2017] [Accepted: 02/09/2017] [Indexed: 12/23/2022]
Abstract
Thousands of changes are applied to SNOMED CT's concepts during each release cycle. These changes are the result of efforts to improve or expand the coverage of health domains in the terminology. Understanding which concepts changed, how they changed, and the overall impact of a set of changes is important for editors and end users. Each SNOMED CT release comes with delta files, which identify all of the individual additions and removals of concepts and relationships. These files typically contain tens of thousands of individual entries, overwhelming users. They also do not identify the editorial processes that were applied to individual concepts and they do not capture the overall impact of a set of changes on a subhierarchy of concepts. In this paper we introduce a methodology and accompanying software tool called a SNOMED CT Visual Semantic Delta ("semantic delta" for short) to enable a comprehensive review of changes in SNOMED CT. The semantic delta displays a graphical list of editing operations that provides semantics and context to the additions and removals in the delta files. However, there may still be thousands of editing operations applied to a set of concepts. To address this issue, a semantic delta includes a visual summary of changes that affected sets of structurally and semantically similar concepts. The software tool for creating semantic deltas offers views of various granularities, allowing a user to control how much change information they view. In this tool a user can select a set of structurally and semantically similar concepts and review the editing operations that affected their modeling. The semantic delta methodology is demonstrated on SNOMED CT's Bacterial infectious disease subhierarchy, which has undergone a significant remodeling effort over the last two years.
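To make the idea of a raw release delta concrete, the following minimal sketch computes additions and removals between two releases as plain set differences. It does not reproduce the actual SNOMED CT delta file format or the authors' tool; the function `release_delta` and the relationship triples are hypothetical.

```python
def release_delta(old_release: set, new_release: set) -> dict:
    """Return the entries added to and removed from a release."""
    return {
        "added": new_release - old_release,
        "removed": old_release - new_release,
    }


# Hypothetical (source, relationship, target) triples; not real SNOMED CT content.
previous = {("A", "is_a", "B"), ("C", "is_a", "B")}
current = {("A", "is_a", "B"), ("C", "is_a", "D"), ("E", "is_a", "D")}

delta = release_delta(previous, current)
print(sorted(delta["added"]))
print(sorted(delta["removed"]))
```

Note that moving concept `C` from parent `B` to parent `D` shows up only as one unrelated removal and one unrelated addition. This is the limitation the abstract describes: a raw delta lists individual entries without identifying the editing operation behind them.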
Affiliation(s)
- Christopher Ochs
- Computer Science Department, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
- James T Case
- National Library of Medicine/National Institutes of Health, Bethesda, MD 20894, USA
- Yehoshua Perl
- Computer Science Department, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
|
17
|
Ochs C, Case JT, Perl Y. Tracking the Remodeling of SNOMED CT's Bacterial Infectious Diseases. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2017; 2016:974-983. [PMID: 28269894 PMCID: PMC5333319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
SNOMED CT's content undergoes many changes from one release to the next. Over the last year SNOMED CT's Bacterial infectious disease subhierarchy has undergone significant editing to bring consistent modeling to its concepts. In this paper we analyze the stated and inferred structural modifications that affected the Bacterial infectious disease subhierarchy between the Jan 2015 and Jan 2016 SNOMED CT releases using a two-phased approach. First, we introduce a methodology for creating a human readable list of changes. Next, we utilize partial-area taxonomies, which are compact summaries of SNOMED CT's content and structure, to identify the "big picture" changes that occurred in the subhierarchy. We illustrate how partial-area taxonomies can be used to help identify groups of concepts that were affected by these editing operations and the nature of these changes. Modeling issues identified using our two-phase methodology are discussed.
|
18
|
Soualmia LF, Charlet J. Efficient Results in Semantic Interoperability for Health Care. Findings from the Section on Knowledge Representation and Management. Yearb Med Inform 2016:184-187. [PMID: 27830249 DOI: 10.15265/iy-2016-051] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/24/2022] Open
Abstract
OBJECTIVES To summarize excellent current research in the field of Knowledge Representation and Management (KRM) within the health and medical care domain. METHOD We provide a synopsis of the 2016 IMIA selected articles as well as a related synthetic overview of the current and future field activities. A first step of the selection was performed through MEDLINE querying with a list of MeSH descriptors completed by a list of terms adapted to the KRM section. The second step of the selection was completed by the two section editors who separately evaluated the set of 1,432 articles. The third step of the selection consisted of a collective work that merged the evaluation results to retain 15 articles for peer-review. RESULTS The selection and evaluation process of this Yearbook's section on Knowledge Representation and Management has yielded four excellent and interesting articles regarding semantic interoperability for health care by gathering heterogeneous sources (knowledge and data) and auditing ontologies. In the first article, the authors present a solution based on standards and Semantic Web technologies to access distributed and heterogeneous datasets in the domain of breast cancer clinical trials. The second article describes a knowledge-based recommendation system that relies on ontologies and Semantic Web rules in the context of diets for chronic diseases. The third article uses concept recognition and text mining to derive a model of common human diseases and a phenotypic network of common diseases. In the fourth article, the authors highlight the need for auditing SNOMED CT. They propose to use a crowd-based method for ontology engineering. CONCLUSIONS The current research activities further illustrate the continuous convergence of Knowledge Representation and Medical Informatics, with a focus this year on dedicated tools and methods to advance clinical care by proposing solutions to cope with the problem of semantic interoperability. Indeed, there is a need for powerful tools able to manage and interpret complex, large-scale and distributed datasets and knowledge bases, but also a need for user-friendly tools developed for clinicians in their daily practice.
Affiliation(s)
- L F Soualmia
- Dr Lina F. Soualmia, Normandie Universités, Rouen University and Hospital, D2IM, LITIS EA 4108, Information Processing in Biology & Health, 1, rue de Germont, Cour Leschevin porte 21, 76031 Rouen Cedex, France, Tel: +33 232 885 869
|
19
|
Homer ML, Palmer NP, Bodenreider O, Cami A, Chadwick L, Mandl KD. The Drug Data to Knowledge Pipeline: Large-Scale Claims Data Classification for Pharmacologic Insight. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2016; 2016:105-11. [PMID: 27570659 PMCID: PMC5001754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
Abstract
In biomedical informatics, assigning drug codes to categories is a common step in the analysis pipeline. Unfortunately, incomplete mappings are the norm rather than the exception, and coverage values below 85% are not uncommon. Here, we perform this linking task on a nationwide insurance claims database with over 13 million members who were dispensed, according to National Drug Codes (NDCs), over 50,000 unique product forms of medication. The chosen approach employs Cerner Multum's VantageRx and the U.S. National Library of Medicine's RxMix. As a result, 94.0% of the NDCs were successfully mapped to categories used by common drug terminologies, e.g., Anatomical Therapeutic Chemical (ATC). Implemented as an SQL database and scripts, the approach is generic and can be set up for a new data set in a few hours. Thus, the method is a viable option for large-scale drug classification.
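The coverage figure quoted in this abstract is the share of distinct drug codes for which a category was found. A minimal sketch of that calculation follows; the authors implemented their pipeline in SQL, so this Python version, the function `mapping_coverage`, and the placeholder codes and categories are illustrative assumptions rather than their method.

```python
def mapping_coverage(codes: list, code_to_category: dict) -> float:
    """Fraction of distinct codes that have a category in the mapping."""
    distinct = set(codes)
    mapped = sum(1 for code in distinct if code in code_to_category)
    return mapped / len(distinct)


# Placeholder identifiers; these are not real NDCs or ATC categories.
dispensed_codes = ["code-1", "code-2", "code-2", "code-3", "code-4"]
code_to_category = {
    "code-1": "category-A",
    "code-2": "category-B",
    "code-3": "category-A",
}

coverage = mapping_coverage(dispensed_codes, code_to_category)
print(f"{coverage:.1%}")
```

Deduplicating before counting matters here: the abstract reports coverage over unique codes, not over individual dispensing events.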
Affiliation(s)
- Mark L. Homer
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Nathan P. Palmer
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Olivier Bodenreider
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Aurel Cami
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Laura Chadwick
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; MCPHS University, Boston, MA, USA
- Kenneth D. Mandl
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
|
20
|
Zhitomirsky-Geffet M, Bar-Ilan J, Levene M. Testing the stability of “wisdom of crowds” judgments of search results over time and their similarity with the search engine rankings. ASLIB J INFORM MANAG 2016. [DOI: 10.1108/ajim-10-2015-0165] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 02/06/2023]
Abstract
Purpose
One of the under-explored aspects of user information seeking behaviour is the influence of time on relevance evaluation. It has been shown in previous studies that individual users might change their assessment of search results over time. It is also known that aggregated judgements of multiple individual users can lead to correct and reliable decisions; this phenomenon is known as the “wisdom of crowds”. The purpose of this paper is to examine whether aggregated judgements will be more stable and thus more reliable over time than individual user judgements.
Design/methodology/approach
In this study two simple measures are proposed to calculate the aggregated judgements of search results and compare their reliability and stability to individual user judgements. In addition, the aggregated “wisdom of crowds” judgements were used as a means to compare the differences between human assessments of search results and search engine rankings. A large-scale user study was conducted with 87 participants who evaluated two different queries and four diverse result sets twice, with an interval of two months. Two types of judgements were considered in this study: relevance on a four-point scale, and ranking on a ten-point scale without ties.
Findings
It was found that aggregated judgements are much more stable than individual user judgements, yet they are quite different from search engine rankings.
Practical implications
The proposed “wisdom of crowds”-based approach provides a reliable reference point for the evaluation of search engines. This is also important for exploring the need for personalisation and for adapting search engine rankings over time to changes in users' preferences.
Originality/value
This is the first study that applies the notion of “wisdom of crowds” to examine the phenomenon of “change in time” in user evaluation of relevance, which is under-explored in the literature.
|
21
|
Ochs C, Geller J, Perl Y, Musen MA. A unified software framework for deriving, visualizing, and exploring abstraction networks for ontologies. J Biomed Inform 2016; 62:90-105. [PMID: 27345947 DOI: 10.1016/j.jbi.2016.06.008] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 04/07/2016] [Revised: 06/02/2016] [Accepted: 06/22/2016] [Indexed: 11/27/2022]
Abstract
Software tools play a critical role in the development and maintenance of biomedical ontologies. One important task that is difficult without software tools is ontology quality assurance. In previous work, we have introduced different kinds of abstraction networks to provide a theoretical foundation for ontology quality assurance tools. Abstraction networks summarize the structure and content of ontologies. One kind of abstraction network that we have used repeatedly to support ontology quality assurance is the partial-area taxonomy. It summarizes structurally and semantically similar concepts within an ontology. However, the use of partial-area taxonomies was ad hoc and not generalizable. In this paper, we describe the Ontology Abstraction Framework (OAF), a unified framework and software system for deriving, visualizing, and exploring partial-area taxonomy abstraction networks. The OAF includes support for various ontology representations (e.g., OWL and SNOMED CT's relational format). A Protégé plugin for deriving "live partial-area taxonomies" is demonstrated.
Affiliation(s)
- Christopher Ochs
- Computer Science Department, New Jersey Institute of Technology, Newark, NJ 07102, USA
- James Geller
- Computer Science Department, New Jersey Institute of Technology, Newark, NJ 07102, USA
- Yehoshua Perl
- Computer Science Department, New Jersey Institute of Technology, Newark, NJ 07102, USA
- Mark A Musen
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305, USA
|
22
|
Zhitomirsky-Geffet M, Erez ES, Bar-Ilan J. Toward multiviewpoint ontology construction by collaboration of non-experts and crowdsourcing: The case of the effect of diet on health. J Assoc Inf Sci Technol 2016. [DOI: 10.1002/asi.23686] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 12/21/2022]
Affiliation(s)
- Eden S. Erez
- Department of Computer Science; Bar-Ilan University; Ramat-Gan 5290002 Israel
- Judit Bar-Ilan
- Department of Information Science; Bar-Ilan University; Ramat-Gan 5290002 Israel
|
23
|
Li TS, Bravo À, Furlong LI, Good BM, Su AI. A crowdsourcing workflow for extracting chemical-induced disease relations from free text. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016; 2016:baw051. [PMID: 27087308 PMCID: PMC4834205 DOI: 10.1093/database/baw051] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 12/05/2015] [Accepted: 03/17/2016] [Indexed: 01/05/2023]
Abstract
Relations between chemicals and diseases are among the most queried biomedical interactions. Although expert manual curation is the standard method for extracting these relations from the literature, it is expensive and impractical to apply to large numbers of documents, and therefore alternative methods are required. We describe here a crowdsourcing workflow for extracting chemical-induced disease relations from free text as part of the BioCreative V Chemical Disease Relation challenge. Five non-expert workers on the CrowdFlower platform were shown each potential chemical-induced disease relation highlighted in the original source text and asked to make binary judgments about whether the text supported the relation. Worker responses were aggregated through voting, and relations receiving four or more votes were predicted as true. On the official evaluation dataset of 500 PubMed abstracts, the crowd attained a 0.505 F-score (0.475 precision, 0.540 recall), with a maximum theoretical recall of 0.751 due to errors with named entity recognition. The total crowdsourcing cost was $1290.67 ($2.58 per abstract) and took a total of 7 h. A qualitative error analysis revealed that 46.66% of sampled errors were due to task limitations and gold standard errors, indicating that performance can still be improved. All code and results are publicly available at https://github.com/SuLab/crowd_cid_relex. Database URL: https://github.com/SuLab/crowd_cid_relex
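The voting rule and the reported figures in this abstract can be checked with a few lines of code. The sketch below is not taken from the authors' repository: the function names `predict_relation` and `f_score` are hypothetical, and the F-score is assumed to be the usual harmonic mean of precision and recall, which reproduces the reported value.

```python
def predict_relation(votes: list, threshold: int = 4) -> bool:
    """Predict a relation as true when at least `threshold` workers voted yes."""
    return sum(votes) >= threshold


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


# Five binary worker judgments for one candidate relation (made-up example).
print(predict_relation([True, True, True, True, False]))   # True
print(predict_relation([True, True, True, False, False]))  # False

# Figures reported in the abstract.
print(round(f_score(0.475, 0.540), 3))  # 0.505
print(round(1290.67 / 500, 2))          # 2.58 (cost per abstract in US dollars)
```

Requiring four of five votes is a stricter rule than a simple majority, which trades recall for precision.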
Affiliation(s)
- Tong Shu Li
- Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA 92037, USA
- Àlex Bravo
- Research Programme on Biomedical Informatics (GRIB), IMIM, UPF, Barcelona, Spain
- Laura I Furlong
- Research Programme on Biomedical Informatics (GRIB), IMIM, UPF, Barcelona, Spain
- Benjamin M Good
- Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA 92037, USA
- Andrew I Su
- Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA 92037, USA
|
24
|
Utilizing a structural meta-ontology for family-based quality assurance of the BioPortal ontologies. J Biomed Inform 2016; 61:63-76. [PMID: 26988001 DOI: 10.1016/j.jbi.2016.03.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 12/17/2015] [Revised: 02/05/2016] [Accepted: 03/04/2016] [Indexed: 11/22/2022]
Abstract
An Abstraction Network is a compact summary of an ontology's structure and content. In previous research, we showed that Abstraction Networks support quality assurance (QA) of biomedical ontologies. The development of an Abstraction Network and its associated QA methodologies, however, is a labor-intensive process that previously was applicable only to one ontology at a time. To improve the efficiency of the Abstraction-Network-based QA methodology, we introduced a QA framework that uses uniform Abstraction Network derivation techniques and QA methodologies that are applicable to whole families of structurally similar ontologies. For the family-based framework to be successful, it is necessary to develop a method for classifying ontologies into structurally similar families. We now describe a structural meta-ontology that classifies ontologies according to certain structural features that are commonly used in the modeling of ontologies (e.g., object properties) and that are important for Abstraction Network derivation. Each class of the structural meta-ontology represents a family of ontologies with identical structural features, indicating which types of Abstraction Networks and QA methodologies are potentially applicable to all of the ontologies in the family. We derive a collection of 81 families, corresponding to classes of the structural meta-ontology, that enable a flexible, streamlined family-based QA methodology, offering multiple choices for classifying an ontology. The structure of 373 ontologies from the NCBO BioPortal is analyzed and each ontology is classified into multiple families modeled by the structural meta-ontology.
|
25
|
Is the crowd better as an assistant or a replacement in ontology engineering? An exploration through the lens of the Gene Ontology. J Biomed Inform 2016; 60:199-209. [PMID: 26873781 DOI: 10.1016/j.jbi.2016.02.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 08/25/2015] [Revised: 12/06/2015] [Accepted: 02/03/2016] [Indexed: 11/20/2022]
Abstract
Biomedical ontologies contain errors. Crowdsourcing, defined as taking a job traditionally performed by a designated agent and outsourcing it to an undefined large group of people, provides scalable access to humans. Therefore, the crowd has the potential to overcome the limited accuracy and scalability found in current ontology quality assurance approaches. Crowd-based methods have identified errors in SNOMED CT, a large, clinical ontology, with an accuracy similar to that of experts, suggesting that crowdsourcing is indeed a feasible approach for identifying ontology errors. This work uses that same crowd-based methodology, as well as a panel of experts, to verify a subset of the Gene Ontology (200 relationships). Experts identified 16 errors, generally in relationships referencing acids and metals. The crowd performed poorly in identifying those errors, with an area under the receiver operating characteristic curve ranging from 0.44 to 0.73, depending on the methods configuration. However, when the crowd verified what experts considered to be easy relationships with useful definitions, they performed reasonably well. Notably, there are significantly fewer Google search results for Gene Ontology concepts than SNOMED CT concepts. This disparity may account for the difference in performance - fewer search results indicate a more difficult task for the worker. The number of Internet search results could serve as a method to assess which tasks are appropriate for the crowd. These results suggest that the crowd fits better as an expert assistant, helping experts with their verification by completing the easy tasks and allowing experts to focus on the difficult tasks, rather than an expert replacement.
|
26
|
Cui L. COHeRE: Cross-Ontology Hierarchical Relation Examination for Ontology Quality Assurance. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2015; 2015:456-465. [PMID: 26958178 PMCID: PMC4765676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Biomedical ontologies play a vital role in healthcare information management, data integration, and decision support. Ontology quality assurance (OQA) is an indispensable part of the ontology engineering cycle. Most existing OQA methods are based on the knowledge provided within the targeted ontology. This paper proposes a novel cross-ontology analysis method, Cross-Ontology Hierarchical Relation Examination (COHeRE), to detect inconsistencies and possible errors in hierarchical relations across multiple ontologies. COHeRE leverages the Unified Medical Language System (UMLS) knowledge source and the MapReduce cloud computing technique for systematic, large-scale ontology quality assurance work. COHeRE consists of three main steps with the UMLS concepts and relations as the input. First, the relations claimed in source vocabularies are filtered and aggregated for each pair of concepts. Second, inconsistent relations are detected if a concept pair is related by different types of relations in different source vocabularies. Finally, the uncovered inconsistent relations are voted on according to their number of occurrences across different source vocabularies. The voting result, together with the inconsistent relations, serves as the output of COHeRE for possible ontological change. The highest votes provide an initial suggestion on how such inconsistencies might be fixed. In UMLS, 138,987 concept pairs were found to have inconsistent relationships across multiple source vocabularies. Forty inconsistent concept pairs involving hierarchical relationships were randomly selected and manually reviewed by a human expert. Of the inconsistent relations involved in these concept pairs, 95.8% indeed exist in their source vocabularies rather than being introduced by mistake in the UMLS integration process. For 73.7% of the concept pairs with a suggested relationship, the human expert agreed with the suggestion. The effectiveness of COHeRE indicates that UMLS provides a promising environment to enhance the quality of biomedical ontologies by performing cross-ontology examination.
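The three steps described in the abstract can be sketched in plain Python. This is an in-memory illustration only, not the MapReduce implementation the paper describes; the tuple layout, the function name `cohere_sketch`, and the toy concept, relation, and vocabulary labels are all assumptions made for the example. Concept pairs are treated as ordered here, which may differ from the original method.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Claim = Tuple[str, str, str, str]  # (concept_a, concept_b, relation, source)


def cohere_sketch(claims: Iterable[Claim]) -> Dict[Tuple[str, str], List[Tuple[str, int]]]:
    """Return inconsistent concept pairs with relations ranked by vote count."""
    # Step 1: aggregate the claimed relations for each concept pair,
    # remembering which source vocabularies assert each relation.
    per_pair = defaultdict(lambda: defaultdict(set))
    for a, b, relation, source in claims:
        per_pair[(a, b)][relation].add(source)

    ranked = {}
    for pair, relations in per_pair.items():
        # Step 2: a pair is inconsistent if different relation types
        # are asserted for it across sources.
        if len(relations) > 1:
            # Step 3: vote by the number of sources asserting each relation.
            votes = Counter({rel: len(srcs) for rel, srcs in relations.items()})
            ranked[pair] = votes.most_common()
    return ranked


# Hypothetical toy claims, not real UMLS content.
claims = [
    ("C1", "C2", "is_a", "V1"),
    ("C1", "C2", "is_a", "V2"),
    ("C1", "C2", "part_of", "V3"),
    ("C3", "C4", "is_a", "V1"),
    ("C3", "C4", "is_a", "V2"),
]
print(cohere_sketch(claims))
```

In this toy input only the pair (C1, C2) is reported, and the relation with the most supporting vocabularies is listed first as the initial suggestion.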
Affiliation(s)
- Licong Cui
  - Department of EECS, Case Western Reserve University, Cleveland, OH; Division of Medical Informatics, Case Western Reserve University, Cleveland, OH
|
27
|
Da Silveira M, Dos Reis JC, Pruski C. Management of Dynamic Biomedical Terminologies: Current Status and Future Challenges. Yearb Med Inform 2015; 10:125-33. [PMID: 26293859 DOI: 10.15265/iy-2015-002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/24/2022]
Abstract
OBJECTIVES Controlled terminologies and their dependent artefacts provide a consensual understanding of a domain while reducing ambiguities and enabling reasoning. However, the evolution of a domain's knowledge directly impacts these terminologies and generates inconsistencies in the underlying biomedical information systems. In this article, we review existing work addressing the dynamic aspect of terminologies as well as their effects on mappings and semantic annotations. METHODS We investigate approaches related to the identification, characterization and propagation of changes in terminologies, mappings and semantic annotations including techniques to update their content. RESULTS AND CONCLUSION Based on the explored issues and existing methods, we outline open research challenges requiring investigation in the near future.
Affiliation(s)
- M Da Silveira
- Dr. Marcos Da Silveira, Luxembourg Institute of Science and Technology (LIST), 5, avenue des Hauts-Fourneaux, 4362 Esch/Alzette, Luxembourg, E-mail:
|
28
|
|
29
|
Gu H, Chen Y, He Z, Halper M, Chen L. Quality Assurance of UMLS Semantic Type Assignments Using SNOMED CT Hierarchies. Methods Inf Med 2015; 55:158-65. [PMID: 25925776 DOI: 10.3414/me14-01-0104] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 10/20/2014] [Accepted: 03/25/2015] [Indexed: 11/09/2022]
Abstract
BACKGROUND The Unified Medical Language System (UMLS) is one of the largest biomedical terminological systems, with over 2.5 million concepts in its Metathesaurus repository. The UMLS's Semantic Network (SN) with its collection of 133 high-level semantic types serves as an abstraction layer on top of the Metathesaurus. In particular, the SN elaborates an aspect of the Metathesaurus's concepts via the assignment of one or more types to each concept. Due to the scope and complexity of the Metathesaurus, errors are all but inevitable in this semantic-type assignment process. OBJECTIVES To develop a semi-automated methodology to help assure the quality of semantic-type assignments within the UMLS. METHODS The methodology uses a cross-validation strategy involving SNOMED CT's hierarchies in combination with UMLS semantic types. Semantically uniform, disjoint concept groups are generated programmatically by partitioning the collection of all concepts in the same SNOMED CT hierarchy according to their respective semantic-type assignments in the UMLS. Domain experts are then called upon to review the concepts in any group having a small number of concepts. It is our hypothesis that a semantic-type assignment combination applicable only to a very small number of concepts in a SNOMED CT hierarchy is an indicator of potential problems. RESULTS The methodology was applied to the UMLS 2013AA release along with the SNOMED CT from January 2013. An overall error rate of 33% was found for concepts proposed by the quality-assurance methodology. Supporting our hypothesis, that number was four times higher than the error rate found in control samples. CONCLUSION The results show that the quality-assurance methodology can aid in effective and efficient identification of UMLS semantic-type assignment errors.
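The partitioning step in the methods can be illustrated with a short sketch: concepts in the same hierarchy are grouped by their exact combination of semantic types, and groups below a size cutoff are flagged for expert review. The cutoff `max_size`, the function name, and the toy hierarchy and type labels are hypothetical; the abstract does not state what counts as a small group.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

GroupKey = Tuple[str, FrozenSet[str]]  # (hierarchy, semantic-type combination)


def small_groups(assignments: Dict[str, Tuple[str, Set[str]]],
                 max_size: int = 2) -> Dict[GroupKey, List[str]]:
    """Group concepts by (hierarchy, type combination); keep the small groups."""
    groups = defaultdict(list)
    for concept, (hierarchy, types) in assignments.items():
        # frozenset makes the type combination hashable and order-independent,
        # so the groups are disjoint and semantically uniform.
        groups[(hierarchy, frozenset(types))].append(concept)
    return {key: sorted(members)
            for key, members in groups.items()
            if len(members) <= max_size}


# Hypothetical toy data: concept -> (hierarchy, assigned semantic types).
assignments = {
    "c1": ("Procedure", {"type_A"}),
    "c2": ("Procedure", {"type_A"}),
    "c3": ("Procedure", {"type_A"}),
    "c4": ("Procedure", {"type_A", "type_B"}),
    "c5": ("Substance", {"type_C"}),
}
print(small_groups(assignments))
```

With these inputs the three concepts sharing `type_A` form a large group and are not flagged, while the single concept with the rarer combination in the same hierarchy is proposed for review.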
Affiliation(s)
- H Gu
- Dr. Huanying (Helen) Gu, Computer Science Department, New York Institute of Technology, 1855 Broadway New York, NY 10023-7692, USA, E-mail:
|
30
|
Khare R, Good BM, Leaman R, Su AI, Lu Z. Crowdsourcing in biomedicine: challenges and opportunities. Brief Bioinform 2015; 17:23-32. [PMID: 25888696 DOI: 10.1093/bib/bbv021] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Indexed: 01/06/2023]
Abstract
The use of crowdsourcing to solve important but complex problems in biomedical and clinical sciences is growing and encompasses a wide variety of approaches. The crowd is diverse and includes online marketplace workers, health information seekers, science enthusiasts and domain experts. In this article, we review and highlight recent studies that use crowdsourcing to advance biomedicine. We classify these studies into two broad categories: (i) mining big data generated from a crowd (e.g. search logs) and (ii) active crowdsourcing via specific technical platforms, e.g. labor markets, wikis, scientific games and community challenges. Through describing each study in detail, we demonstrate the applicability of different methods in a variety of domains in biomedical research, including genomics, biocuration and clinical research. Furthermore, we discuss and highlight the strengths and limitations of different crowdsourcing platforms. Finally, we identify important emerging trends, opportunities and remaining challenges for future crowdsourcing research in biomedicine.
|