1
Sedlakova J, Stanikić M, Gille F, Bernard J, Horn AB, Wolf M, Haag C, Floris J, Morgenshtern G, Schneider G, Zumbrunn Wojczyńska A, Mouton Dorey C, Ettlin DA, Gero D, Friemel T, Lu Z, Papadopoulos K, Schläpfer S, Wang N, von Wyl V. Refining Established Practices for Research Question Definition to Foster Interdisciplinary Research Skills in a Digital Age: Consensus Study With Nominal Group Technique. JMIR MEDICAL EDUCATION 2025; 11:e56369. [PMID: 39847774 PMCID: PMC11803332 DOI: 10.2196/56369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 05/16/2024] [Accepted: 11/23/2024] [Indexed: 01/25/2025]
Abstract
BACKGROUND The increased use of digital data in health research demands interdisciplinary collaborations to address its methodological complexities and challenges. This often entails merging the linear deductive approach of health research with the explorative iterative approach of data science. However, there is a lack of structured teaching courses and guidance on how to effectively and constructively bridge different disciplines and research approaches.
OBJECTIVE This study aimed to provide a set of tools and recommendations designed to facilitate interdisciplinary education and collaboration. Target groups are lecturers who can use these tools to design interdisciplinary courses, supervisors who guide PhD and master's students in their interdisciplinary projects, and principal investigators who design and organize workshops to initiate and guide interdisciplinary projects.
METHODS Our study was conducted in 3 steps: (1) developing a common terminology, (2) identifying established workflows for research question formulation, and (3) examining adaptations of existing study workflows combining methods from health research and data science. We also formulated recommendations for a pragmatic implementation of our findings. We conducted a literature search and organized 3 interdisciplinary expert workshops with researchers at the University of Zurich. For the workshops and the subsequent manuscript writing process, we adopted a consensus study methodology.
RESULTS We developed a set of tools to facilitate interdisciplinary education and collaboration. These tools focused on 2 key dimensions (content and curriculum, and methods and teaching style) and can be applied in various educational and research settings. We developed a glossary to establish a shared understanding of common terminologies and concepts. We delineated the established study workflow for research question formulation, emphasizing the "what" and the "how," while summarizing the necessary tools to facilitate the process. We propose 3 clusters of contextual and methodological adaptations to this workflow to better integrate data science practices: (1) acknowledging real-life constraints and limitations in research scope; (2) allowing more iterative, data-driven approaches to research question formulation; and (3) strengthening research quality through reproducibility principles and adherence to the findable, accessible, interoperable, and reusable (FAIR) data principles.
CONCLUSIONS Research question formulation remains a relevant and useful research step in projects using digital data. We recommend initiating new interdisciplinary collaborations by establishing terminologies as well as using the concepts of research tasks to foster a shared understanding. Our tools and recommendations can support academic educators in training health professionals and researchers for interdisciplinary digital health projects.
Affiliation(s)
- Jana Sedlakova
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Implementation Science in Healthcare, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Mina Stanikić
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Implementation Science in Healthcare, Faculty of Medicine, University of Zurich, Zurich, Switzerland
  - Epidemiology, Biostatistics and Prevention Institute, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Felix Gille
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Implementation Science in Healthcare, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Jürgen Bernard
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Department of Informatics, Faculty of Business, Economics and Informatics, University of Zurich, Zurich, Switzerland
- Andrea B Horn
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Center for Gerontology, University of Zurich, Zurich, Switzerland
  - Department of Psychology, Faculty of Arts and Social Sciences, University of Zurich, Zurich, Switzerland
- Markus Wolf
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Department of Psychology, Faculty of Arts and Social Sciences, University of Zurich, Zurich, Switzerland
- Christina Haag
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Implementation Science in Healthcare, Faculty of Medicine, University of Zurich, Zurich, Switzerland
  - Epidemiology, Biostatistics and Prevention Institute, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Joel Floris
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Evolutionary Medicine, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Gabriela Morgenshtern
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Department of Informatics, Faculty of Business, Economics and Informatics, University of Zurich, Zurich, Switzerland
- Gerold Schneider
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Department of Computational Linguistics, Faculty of Business, Economics and Informatics, University of Zurich, Zurich, Switzerland
- Aleksandra Zumbrunn Wojczyńska
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Center of Dental Medicine, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Corine Mouton Dorey
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Biomedical Ethics and History of Medicine, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Dominik Alois Ettlin
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Center of Dental Medicine, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Daniel Gero
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Department of Surgery and Transplantation, University Hospital of Zurich, Zurich, Switzerland
- Thomas Friemel
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Department of Communication and Media Research, Faculty of Arts and Social Sciences, University of Zurich, Zurich, Switzerland
- Ziyuan Lu
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Evolutionary Medicine, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Kimon Papadopoulos
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Implementation Science in Healthcare, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Sonja Schläpfer
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute for Complementary and Integrative Medicine, University Hospital of Zurich, Zurich, Switzerland
- Ning Wang
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
- Viktor von Wyl
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Implementation Science in Healthcare, Faculty of Medicine, University of Zurich, Zurich, Switzerland
  - Epidemiology, Biostatistics and Prevention Institute, Faculty of Medicine, University of Zurich, Zurich, Switzerland
3
Xiong K, Xu X, Fu S, Weng D, Wang Y, Wu Y. JsonCurer: Data Quality Management for JSON Based on an Aggregated Schema. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:3008-3021. [PMID: 38625779 DOI: 10.1109/tvcg.2024.3388556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2024]
Abstract
High-quality data is critical to deriving useful and reliable information. However, real-world data often contains quality issues undermining the value of the derived information. Most existing research on data quality management focuses on tabular data, leaving semi-structured data under-exploited. Due to the schema-less and hierarchical features of semi-structured data, discovering and fixing quality issues is challenging and time-consuming. To address the challenge, this paper presents JsonCurer, an interactive visualization system to assist with data quality management in the context of JSON data. To have an overview of quality issues, we first construct a taxonomy based on interviews with data practitioners and a review of 119 real-world JSON files. Then we highlight a schema visualization that presents structural information, statistical features, and quality issues of JSON data. Based on a similarity-based aggregation technique, the visualization depicts the entire JSON data with a concise tree, where summary visualizations are given above each node, and quality issues are illustrated using Bubble Sets across nodes. We evaluate the effectiveness and usability of JsonCurer with two case studies. One is in the domain of data analysis while the other concerns quality assurance in MongoDB documents.
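To make the idea of summarizing schema-less JSON concrete, here is a minimal Python sketch. It is not JsonCurer's similarity-based aggregation; it swaps in a much simpler technique that records which value types occur at each path and flags paths with more than one type. The function names, the path notation (`$`, `.key`, `[]`), and the sample documents are assumptions made for illustration.

```python
import json
from collections import defaultdict


def collect_types(value, path, schema):
    """Record the Python type seen at `path`, then recurse into children."""
    schema[path].add(type(value).__name__)
    if isinstance(value, dict):
        for key, child in value.items():
            collect_types(child, f"{path}.{key}", schema)
    elif isinstance(value, list):
        # All list elements share one path so that their types are merged.
        for child in value:
            collect_types(child, f"{path}[]", schema)


def summarize(documents):
    """Merge the path-to-types map over every document in the collection."""
    schema = defaultdict(set)
    for doc in documents:
        collect_types(doc, "$", schema)
    return dict(schema)


# Toy documents (hypothetical): "id" and "tags" are typed inconsistently.
docs = [
    json.loads('{"id": 1, "tags": ["a"]}'),
    json.loads('{"id": "2", "tags": null}'),
]
schema = summarize(docs)
inconsistent = sorted(p for p, types in schema.items() if len(types) > 1)
```

Paths that collect several types are candidates for a quality issue such as an identifier stored as both a number and a string.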
5
Ruddle RA, Adnan M, Hall M. Using set visualisation to find and explain patterns of missing values: a case study with NHS hospital episode statistics data. BMJ Open 2022; 12:e064887. [PMID: 36410820 PMCID: PMC9680176 DOI: 10.1136/bmjopen-2022-064887] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/23/2022] Open
Abstract
OBJECTIVES Missing data is the most common data quality issue in electronic health records (EHRs). Missing data checks implemented in common analytical software are typically limited to counting the number of missing values in individual fields, but researchers and organisations also need to understand multifield missing data patterns to better inform advanced missing data strategies for which counts or numerical summaries are poorly suited. This study shows how set-based visualisation enables multifield missing data patterns to be discovered and investigated.
DESIGN Development and evaluation of interactive set visualisation techniques to find patterns of missing data and generate actionable insights. The visualisations comprised easily interpretable bar charts for sets, heatmaps for set intersections and histograms for distributions of both sets and intersections.
SETTING AND PARTICIPANTS Anonymised admitted patient care health records for National Health Service (NHS) hospitals and independent sector providers in England. The visualisation and data mining software was run over 16 million records and 86 fields in the dataset.
RESULTS The dataset contained 960 million missing values. Set visualisation bar charts showed how those values were distributed across the fields, including several fields that, unexpectedly, were not complete. Set intersection heatmaps revealed unexpected gaps in diagnosis, operation and date fields because diagnosis and operation fields were not filled up sequentially and some operations did not have corresponding dates. Information gain ratio and entropy calculations allowed us to identify the origin of each unexpected pattern, in terms of the values of other fields.
CONCLUSIONS Our findings show how set visualisation reveals important insights about multifield missing data patterns in large EHR datasets. The study revealed both rare and widespread data quality issues that were previously unknown, and allowed a particular part of a specific hospital to be pinpointed as the origin of rare issues that NHS Digital did not know existed.
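The difference between per-field missing counts and multifield missing patterns can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's software: the field names, the toy records, and the use of `None` to mark a missing value are all hypothetical.

```python
from collections import Counter


def missing_per_field(records, fields):
    """Per-field count of missing values (the single-field check)."""
    return Counter(f for rec in records for f in fields if rec.get(f) is None)


def missing_patterns(records, fields):
    """Count how often each combination of missing fields occurs together."""
    patterns = Counter()
    for rec in records:
        # The set of fields missing in this record is one "pattern".
        patterns[frozenset(f for f in fields if rec.get(f) is None)] += 1
    return patterns


# Hypothetical field names and toy values, for illustration only.
fields = ["diag_1", "diag_2", "op_1", "op_date_1"]
records = [
    {"diag_1": "A", "diag_2": None, "op_1": "X", "op_date_1": None},
    {"diag_1": "A", "diag_2": "B", "op_1": None, "op_date_1": None},
    {"diag_1": None, "diag_2": "B", "op_1": "X", "op_date_1": None},
]

per_field = missing_per_field(records, fields)
patterns = missing_patterns(records, fields)
```

Here the per-field count only says that `op_date_1` is missing three times, while the pattern counts show that in two of those records an operation is present without a date, which is the kind of intersection a set-based view is meant to surface.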
Affiliation(s)
- Roy A Ruddle
  - School of Computing and Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
- Muhammad Adnan
  - Computer Science, Higher Colleges of Technology, Sharjah, UAE
- Marlous Hall
  - Leeds Institute of Cardiovascular & Metabolic Medicine and Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
11
Rosen P, Quadri GJ. LineSmooth: An Analytical Framework for Evaluating the Effectiveness of Smoothing Techniques on Line Charts. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1536-1546. [PMID: 33048725 DOI: 10.1109/tvcg.2020.3030421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
We present a comprehensive framework for evaluating line chart smoothing methods under a variety of visual analytics tasks. Line charts are commonly used to visualize a series of data samples. When the number of samples is large, or the data are noisy, smoothing can be applied to make the signal more apparent. However, there are a wide variety of smoothing techniques available, and the effectiveness of each depends upon both the nature of the data and the visual analytics task at hand. To date, the visualization community lacks a summary work for analyzing and classifying the various smoothing methods available. In this paper, we establish a framework based on 8 measures of line smoothing effectiveness tied to 8 low-level visual analytics tasks. We then analyze 12 methods coming from 4 commonly used classes of line chart smoothing: rank filters, convolutional filters, frequency domain filters, and subsampling. The results show that while no method is ideal for all situations, certain methods, such as Gaussian filters and TOPOLOGY-based subsampling, perform well in general. Other methods, such as low-pass CUTOFF filters and Douglas-Peucker subsampling, perform well for specific visual analytics tasks. Almost as importantly, our framework demonstrates that several methods, including the commonly used UNIFORM subsampling, produce low-quality results, and should, therefore, be avoided, if possible.
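As a rough illustration of two of the method classes named above, the following Python sketch implements a basic Gaussian convolution filter and uniform subsampling. It does not reproduce the paper's implementations or its 8 effectiveness measures; the function names, the 3-sigma kernel radius, and the edge clamping are assumptions chosen for brevity.

```python
import math


def gaussian_kernel(sigma, radius):
    """Discrete Gaussian weights on [-radius, radius], normalized to sum to 1."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_smooth(values, sigma=1.0):
    """Convolve the series with a Gaussian kernel, clamping indices at the ends."""
    radius = max(1, int(3 * sigma))  # assumed cutoff at about 3 sigma
    kernel = gaussian_kernel(sigma, radius)
    n = len(values)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # repeat the edge samples
            acc += w * values[j]
        smoothed.append(acc)
    return smoothed


def uniform_subsample(values, step):
    """Keep every `step`-th sample, regardless of what the signal does."""
    return values[::step]


# Toy series with a single spike at index 3.
series = [0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0]
smoothed = gaussian_smooth(series, sigma=1.0)
subsampled = uniform_subsample(series, 2)
```

On this toy series the Gaussian filter lowers and spreads the spike but keeps it visible, whereas uniform subsampling with a step of 2 drops it entirely, which gives some intuition for why fixed-stride subsampling can lose features.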