Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 30 (from Reference Citation Analysis)

Searched Name: Data validation

Citation Analysis

van Ditshuizen JC, van Voorden TAJ, Haddo N, Sewalt CA, Den Hartog D, Van Lieshout EMM, Verhofstad MHJ. Missing patient registrations in the Dutch National Trauma Registry of Southwest Netherlands: Prevalence and epidemiology. Int J Med Inform 2024;186:105437. [PMID: 38552267 DOI: 10.1016/j.ijmedinf.2024.105437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Revised: 03/18/2024] [Accepted: 03/23/2024] [Indexed: 04/22/2024]

Abstract

INTRODUCTION

Health care patient records have been digitalised over the past twenty years, and registries have been automated. Missing registrations are common and can result in selection bias.

OBJECTIVE

To assess the prevalence and characteristics of missed registrations in a Dutch regional trauma registry.

METHODS

First, an automatically generated trauma registry export was done for ten out of eleven hospitals in trauma region Southwest Netherlands, between June 1 and August 31, 2020. Second, lists were checked for patients falsely flagged as 'non-trauma'. Finally, a list was generated of patients whose trauma tick box was flagged as 'trauma' but who were not automatically included in the export due to administrative errors. Automated and missed registration datasets were compared on patient characteristics, and logistic regression models were run with random intercepts and missed registration as outcome variable on the complete dataset.

RESULTS

A total of 2,230 automated registrations and 175 (7.3 %) missed registrations were included for the Dutch National Trauma Registry, ranging from 1 to 14 % between participating hospitals. Patients of the missed registration dataset had characteristics of a higher level of care, compared with patients of automated registrations. Level of trauma care (level II OR 0.464 95 % CI 0.328-0.666, p < 0.001; level III OR 0.179 95 % CI 0.092-0.325, p < 0.001), major trauma (OR 2.928 95 % CI 1.792-4.65, p < 0.001), ICU admission (OR 2.337 95 % CI 1.792-4.650, p < 0.001), and surgery (OR 1.871 95 % CI 1.371-2.570, p < 0.001) were potential predictors for missed registrations in multivariate logistic regression analysis.

CONCLUSION

Missed registrations occur frequently and the rate of missed registrations differs greatly between hospitals. Automated and missed registration datasets display differences related to patients requiring more intensive care, which held for the major trauma subset. Checking for missed registrations is time consuming, but automated registration lists need a human touch for validation and completeness.
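
The 7.3 % prevalence reported in the results is consistent with dividing the missed registrations by all registrations that should have been included (automated plus missed). A minimal Python sketch of that arithmetic follows; the function name is hypothetical and the denominator is an assumption inferred from the reported counts, not a formula stated in the abstract.

```python
def missed_registration_rate(automated: int, missed: int) -> float:
    """Share of missed registrations among all eligible registrations, in percent."""
    total = automated + missed  # assumed denominator: automated plus missed
    if total == 0:
        raise ValueError("no registrations to evaluate")
    return 100.0 * missed / total


# Counts taken from the abstract: 2,230 automated and 175 missed registrations.
rate = missed_registration_rate(automated=2230, missed=175)
print(f"{rate:.1f} %")
```

The same function could be applied per hospital to reproduce the kind of between-hospital spread (1 to 14 %) that the abstract describes, given per-hospital counts.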

Rodriguez C, Oppenheimer DM. Creating a Bot-tleneck for malicious AI: Psychological methods for bot detection. Behav Res Methods 2024:10.3758/s13428-024-02357-9. [PMID: 38561551 DOI: 10.3758/s13428-024-02357-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/30/2024] [Indexed: 04/04/2024]

Ajami Yazdi A, Ebrahimian Pirbazari A, Esmaeili Khalil Saraei F, Esmaeili A, Ebrahimian Pirbazari A, Akbari Kohnehsari A, Derakhshesh A. Design of 2D/2D β-Ni(OH)₂/ZnO heterostructures via photocatalytic deposition of nickel for sonophotocatalytic degradation of tetracycline and modeling with three supervised machine learning algorithms. Chemosphere 2024;352:141328. [PMID: 38296215 DOI: 10.1016/j.chemosphere.2024.141328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 11/29/2023] [Accepted: 01/27/2024] [Indexed: 03/10/2024]

Abstract

Due to the expansive use of tetracycline antibiotics (TCs) to treat various infectious diseases in humans and animals, their presence in the environment has created many challenges for human societies. Therefore, providing green and cost-effective solutions for their effective removal has become an urgent need. Here, we introduce 2D/2D p-n heterostructures that exhibit excellent sonophotocatalytic/photocatalytic properties for water-soluble pollutant removal. In this contribution, for the first time, β-Ni(OH)2 nanosheets were synthesized through visible-light-induced photodeposition of different amounts of nickel on ZnO nanosheets (β-Ni(x)/ZNs) to fabricate 2D/2D p-n heterostructures. The PXRD patterns confirmed the formation of wurtzite phase for ZNs and the hexagonal crystal structure of β-Ni(OH)2. The FESEM and TEM micrographs showed that the β-Ni(OH)2 sheets were dispersed on the surface of ZNs and formed 2D/2D p-n heterojunction in β-Ni(x)/ZNs samples. With the photodeposition of β-Ni(OH)2 nanosheets on ZNs, the surface area, pore volume, and pore diameter of β-Ni(x)/ZNs heterostructures have increased compared to ZNs, which can have a positive effect on the sonophotocatalytic/photocatalytic performance of ZNs. The degradation experiments showed that β-Ni(0.1)/ZNs and β-Ni(0.4)/ZNs have the highest degradation percentage in photocatalytic (51 %) and sonophotocatalytic (71 %) degradation of TC, respectively. Finally, the sonophotocatalytic/photocatalytic degradation process of TC was systematically validated through modeling with three powerful and supervised machine learning algorithms, including Support Vector Regression (SVR), Artificial Neural Networks (ANNs), and Stochastic Gradient Boosting (SGB). Five statistical criteria including R², SAE, MSE, SSE, and RMSE were calculated for model validation. It was observed that the developed SGB algorithm was the most reliable model for predicting the degradation percent of TC. The results revealed that using fabricated 2D/2D p-n heterojunctions (β-Ni(x)/ZNs) is more sustainable than the conventional ZnO photocatalytic systems in practical applications.
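
The five validation criteria named in the abstract can be computed from paired observed and predicted values. The sketch below is illustrative only: the abstract does not expand the acronyms, so SAE is assumed to mean the sum of absolute errors and SSE the sum of squared errors, and the sample values are made up rather than taken from the article.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, SAE, SSE, MSE and RMSE for paired observed/predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [o - p for o, p in zip(observed, predicted)]
    sae = sum(abs(r) for r in residuals)   # assumed: sum of absolute errors
    sse = sum(r * r for r in residuals)    # assumed: sum of squared errors
    mse = sse / len(residuals)             # mean squared error
    rmse = math.sqrt(mse)                  # root mean squared error
    mean_obs = sum(observed) / len(observed)
    sst = sum((o - mean_obs) ** 2 for o in observed)
    # Coefficient of determination; undefined when the observations are constant.
    r2 = 1.0 - sse / sst if sst > 0 else float("nan")
    return {"R2": r2, "SAE": sae, "SSE": sse, "MSE": mse, "RMSE": rmse}


# Made-up degradation percentages, for illustration only.
observed = [10.0, 25.0, 40.0, 55.0, 71.0]
predicted = [12.0, 24.0, 41.0, 53.0, 70.0]
metrics = regression_metrics(observed, predicted)
```

Comparing such a dictionary across candidate models (for example SVR, ANN, and SGB fits) is one way to rank them, with lower error values and higher R² indicating a closer fit.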

McBurney SH, Kwong JC, Brown KA, Rudzicz F, Chen B, Candido E, Crowcroft NS. Validating pertussis data measures using electronic medical record data in Ontario, Canada 1986-2016. Vaccine X 2023;15:100408. [PMID: 38161988 PMCID: PMC10755117 DOI: 10.1016/j.jvacx.2023.100408] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/20/2022] [Revised: 09/14/2023] [Accepted: 11/13/2023] [Indexed: 01/03/2024]

Jiang N, Akter R, Ross G, White S, Kirkwood J, Gunashanhar G, Thompson S, Riley M, Azzi M. On thresholds for controlling negative particle (PM2.5) readings in air quality reporting. Environ Monit Assess 2023;195:1187. [PMID: 37698727 PMCID: PMC10497433 DOI: 10.1007/s10661-023-11750-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2022] [Accepted: 08/18/2023] [Indexed: 09/13/2023]

Olona C, Pereira-Rodríguez JA, Comas J, Villalobos R, Alonso V, Amador S, Bombuy E, Mitru C, Gimeno M, López-Cano M. Data quality validation of the Spanish Incisional Hernia Surgery Registry (EVEREG): pilot study. Hernia 2023;27:665-670. [PMID: 36964455 DOI: 10.1007/s10029-023-02782-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 03/14/2023] [Indexed: 03/26/2023]

Nicholson NC, Giusti F, Bettio M, Carvalho RN, Dimitrova N, Dyba T, Flego M, Neamtiu L, Randi G, Martos C. A multipurpose TNM stage ontology for cancer registries. J Biomed Semantics 2022;13:7. [PMID: 35193690 PMCID: PMC8862240 DOI: 10.1186/s13326-022-00260-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2021] [Accepted: 01/19/2022] [Indexed: 11/25/2022]

Abstract

Background

Population-based cancer registries are a critical reference source for the surveillance and control of cancer. Cancer registries work extensively with the internationally recognised TNM classification system used to stage solid tumours, but the system is complex and compounded by the different TNM editions in concurrent use. TNM ontologies exist but the design requirements are different for the needs of the clinical and cancer-registry domains. Two TNM ontologies developed specifically for cancer registries were designed for different purposes and have limitations for serving wider application. A unified ontology is proposed to serve the various cancer registry TNM-related tasks and reduce the multiplication effects of different ontologies serving specific tasks. The ontology is comprehensive of the rules for TNM edition 7 as required by cancer registries and designed on a modular basis to allow extension to other TNM editions.

Results

A unified ontology was developed building on the experience and design of the existing ontologies. It follows a modular approach allowing plug in of components dependent upon any particular TNM edition. A Java front-end was developed to interface with the ontology via the Web Ontology Language application programme interface and enables batch validation or classification of cancer registry records. The programme also allows the means of automated error correction in some instances. Initial tests verified the design concept by correctly inferring TNM stage and successfully handling the TNM-related validation checks on a number of cancer case records, with a performance similar to that of an existing ontology dedicated to the task.

Conclusions

The unified ontology provides a multi-purpose tool for TNM-related tasks in a cancer registry and is scalable for different editions of TNM. It offers a convenient way of quickly checking validity of cancer case stage information and for batch processing of multi-record data via a dedicated front-end programme. The ontology is adaptable to many uses, either as a standalone TNM module or as a component in applications of wider focus. It provides a first step towards a single, unified TNM ontology for cancer registries.

Shedlock CJ, Stumpo KA. Data parsing in mass spectrometry imaging using R Studio and Cardinal: A tutorial. J Mass Spectrom Adv Clin Lab 2022;23:58-70. [PMID: 35072143 PMCID: PMC8762469 DOI: 10.1016/j.jmsacl.2021.12.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/14/2021] [Revised: 12/14/2021] [Accepted: 12/15/2021] [Indexed: 02/07/2023]

Tzioutziou NA, James AB, Guo W, Calixto CPG, Zhang R, Nimmo HG, Brown JWS. Experimental Design for Time-Series RNA-Seq Analysis of Gene Expression and Alternative Splicing. Methods Mol Biol 2022;2398:173-88. [PMID: 34674176 DOI: 10.1007/978-1-0716-1912-4_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2023]

Roukema BF. Anti-clustering in the national SARS-CoV-2 daily infection counts. PeerJ 2021;9:e11856. [PMID: 34532156 PMCID: PMC8404575 DOI: 10.7717/peerj.11856] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/26/2021] [Accepted: 07/04/2021] [Indexed: 01/19/2023]

Abstract

The noise in daily infection counts of an epidemic should be super-Poissonian due to intrinsic epidemiological and administrative clustering. Here, we use this clustering to classify the official national SARS-CoV-2 daily infection counts and check for infection counts that are unusually anti-clustered. We adopt a one-parameter model of $\phi_i'$ infections per cluster, dividing any daily count $n_i$ into $n_i/\phi_i'$ 'clusters', for 'country' $i$. We assume that $n_i/\phi_i'$ on a given day $j$ is drawn from a Poisson distribution whose mean is robustly estimated from the four neighbouring days, and calculate the inferred Poisson probability $P_{ij}'$ of the observation. The $P_{ij}'$ values should be uniformly distributed. We find the value $\phi_i$ that minimises the Kolmogorov-Smirnov distance from a uniform distribution. We investigate the $(\phi_i, N_i)$ distribution, for total infection count $N_i$. We consider consecutive count sequences above a threshold of 50 daily infections. We find that most of the daily infection count sequences are inconsistent with a Poissonian model. Most are found to be consistent with the $\phi_i$ model. The 28-, 14- and 7-day least noisy sequences for several countries are best modelled as sub-Poissonian, suggesting a distinct epidemiological family. The 28-day least noisy sequence of Algeria has a preferred model that is strongly sub-Poissonian, with $\phi_i^{28} < 0.1$. Tajikistan, Turkey, Russia, Belarus, Albania, United Arab Emirates and Nicaragua have preferred models that are also sub-Poissonian, with $\phi_i^{28} < 0.5$. A statistically significant ($P^{\tau} < 0.05$) correlation was found between the lack of media freedom in a country, as represented by a high Reporters sans frontières Press Freedom Index (PFI2020), and the lack of statistical noise in the country's daily counts. The $\phi_i$ model appears to be an effective detector of suspiciously low statistical noise in the national SARS-CoV-2 daily infection counts.
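
The fitting procedure described in this abstract can be outlined in a few lines of Python. This is a rough sketch of the idea, not the paper's implementation: the function names are hypothetical, the "robust" mean is assumed here to be the median of the four neighbouring days, the rescaled count is rounded down to a whole number of clusters, and the Poisson probability is taken to be the cumulative probability of the observation. The daily counts are made up.

```python
import math
from statistics import median


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summed in log space to avoid underflow."""
    if k < 0:
        return 0.0
    if mu <= 0:
        return 1.0
    log_terms = [i * math.log(mu) - mu - math.lgamma(i + 1) for i in range(k + 1)]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))


def ks_distance_from_uniform(values):
    """Kolmogorov-Smirnov distance between the empirical CDF and U(0, 1)."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def cluster_probabilities(counts, phi):
    """Poisson probabilities of the rescaled counts, skipping two edge days per side."""
    if len(counts) < 5:
        raise ValueError("need at least five consecutive daily counts")
    if phi <= 0:
        raise ValueError("phi must be positive")
    probs = []
    for j in range(2, len(counts) - 2):
        neighbours = [counts[j - 2], counts[j - 1], counts[j + 1], counts[j + 2]]
        mu = median(neighbours) / phi          # assumed robust estimate of the mean
        k = math.floor(counts[j] / phi)        # simplification: whole clusters only
        probs.append(poisson_cdf(k, mu))
    return probs


def best_phi(counts, candidates):
    """Candidate phi whose probabilities are closest to uniform in the KS sense."""
    return min(
        candidates,
        key=lambda phi: ks_distance_from_uniform(cluster_probabilities(counts, phi)),
    )


# Made-up daily counts, for illustration only.
daily_counts = [100, 120, 90, 110, 105, 95, 130, 115, 100, 108]
phi_hat = best_phi(daily_counts, candidates=[0.5, 1.0, 2.0, 5.0])
```

Under this parameterisation, a fitted value well below 1 corresponds to counts that are less noisy than a Poisson process would produce, which is the sub-Poissonian case the abstract highlights.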

Lindner L, Weiß A, Reich A, Kindler S, Behrens F, Braun J, Listing J, Schett G, Sieper J, Strangfeld A, Regierer AC. Implementing an automated monitoring process in a digital, longitudinal observational cohort study. Arthritis Res Ther 2021;23:181. [PMID: 34233730 PMCID: PMC8262053 DOI: 10.1186/s13075-021-02563-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/10/2021] [Accepted: 06/24/2021] [Indexed: 11/29/2022]

Abstract

Background

Clinical data collection requires correct and complete data sets in order to perform correct statistical analysis and draw valid conclusions. While in randomized clinical trials much effort concentrates on data monitoring, this is rarely the case in observational studies, due to high numbers of cases and often-restricted resources. We have developed a valid and cost-effective monitoring tool, which can substantially contribute to an increased data quality in observational research.

Methods

An automated digital monitoring system for cohort studies developed by the German Rheumatism Research Centre (DRFZ) was tested within the disease register RABBIT-SpA, a longitudinal observational study including patients with axial spondyloarthritis and psoriatic arthritis. Physicians and patients complete electronic case report forms (eCRF) twice a year for up to 10 years. Automatic plausibility checks were implemented to verify all data after entry into the eCRF. To identify conflicts that cannot be found by this approach, all possible conflicts were compiled into a catalog. This “conflict catalog” was used to create queries, which are displayed as part of the eCRF. The proportion of queried eCRFs and responses were analyzed by descriptive methods. For the analysis of responses, the type of conflict was assigned to either a single conflict only (affecting individual items) or a conflict that required the entire eCRF to be queried.

Results

Data from 1883 patients was analyzed. A total of n = 3145 eCRFs submitted between baseline (T0) and T3 (12 months) had conflicts (40–64%). Fifty-six to 100% of the queries regarding eCRFs that were completely missing were answered. A mean of 1.4 to 2.4 single conflicts occurred per eCRF, of which 59–69% were answered. The most common missing values were CRP, ESR, Schober’s test, data on systemic glucocorticoid therapy, and presence of enthesitis.

Conclusion

Providing high data quality in large observational cohort studies is a major challenge, which requires careful monitoring. An automated monitoring process was successfully implemented and well accepted by the study centers. Two thirds of the queries were answered with new data. While conventional manual monitoring is resource-intensive and may itself create new sources of errors, automated processes are a convenient way to augment data quality.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13075-021-02563-2.
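
The combination of automatic plausibility checks and a "conflict catalog" that generates queries, as described in the methods of this abstract, could be modelled along the following lines. This is a hedged sketch only: the class, field names, and rules are hypothetical and are not taken from the RABBIT-SpA system; CRP and ESR are used because the abstract lists them among the most common missing values.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    """One catalog entry: the affected field, the query text, and the conflict rule."""
    field: str
    message: str
    is_conflict: Callable[[dict], bool]


# Hypothetical catalog entries; field names and rules are illustrative only.
CONFLICT_CATALOG = [
    Check("crp", "CRP value is missing", lambda r: r.get("crp") is None),
    Check("crp", "CRP value is negative",
          lambda r: r.get("crp") is not None and r["crp"] < 0),
    Check("esr", "ESR value is missing", lambda r: r.get("esr") is None),
]


def generate_queries(record: dict) -> List[str]:
    """Return one query string per catalog entry that the record violates."""
    return [
        f"{check.field}: {check.message}"
        for check in CONFLICT_CATALOG
        if check.is_conflict(record)
    ]


# Example: a form with a plausible CRP but no ESR yields a single query.
queries = generate_queries({"crp": 3.2, "esr": None})
```

Keeping the rules in a data structure rather than in scattered conditionals makes it straightforward to count queried forms and answered queries per conflict type, which is the kind of descriptive analysis the abstract reports.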

Yeboah SK, Darkwa J. Experimental data on water vapour adsorption on silica gel in fully packed and Z-annulus packed beds. Data Brief 2021;34:106736. [PMID: 33506083 PMCID: PMC7815469 DOI: 10.1016/j.dib.2021.106736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2020] [Revised: 01/06/2021] [Accepted: 01/07/2021] [Indexed: 11/08/2022]

Nicholson NC, Giusti F, Bettio M, Negrao Carvalho R, Dimitrova N, Dyba T, Flego M, Neamtiu L, Randi G, Martos C. An ontology-based approach for developing a harmonised data-validation tool for European cancer registration. J Biomed Semantics 2021;12:1. [PMID: 33407816 PMCID: PMC7789225 DOI: 10.1186/s13326-020-00233-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/11/2019] [Accepted: 11/15/2020] [Indexed: 11/10/2022]

Abstract

Background

Population-based cancer registries constitute an important information source in cancer epidemiology. Studies collating and comparing data across regional and national boundaries have proved important for deploying and evaluating effective cancer-control strategies. A critical aspect in correctly comparing cancer indicators across regional and national boundaries lies in ensuring a good and harmonised level of data quality, which is a primary motivator for a centralised collection of pseudonymised data. The recent introduction of the European Union’s general data-protection regulation (GDPR) imposes stricter conditions on the collection, processing, and sharing of personal data. It also considers pseudonymised data as personal data. The new regulation motivates the need to find solutions that allow a continuation of the smooth processes leading to harmonised European cancer-registry data. One element in this regard would be the availability of a data-validation software tool based on a formalised depiction of the harmonised data-validation rules, allowing an eventual devolution of the data-validation process to the local level.

Results

A semantic data model was derived from the data-validation rules for harmonising cancer-data variables at European level. The data model was encapsulated in an ontology developed using the Web-Ontology Language (OWL) with the data-model entities forming the main OWL classes. The data-validation rules were added as axioms in the ontology. The reasoning function of the resulting ontology demonstrated its ability to trap registry-coding errors and in some instances to be able to correct errors.

Conclusions

Describing the European cancer-registry core data set in terms of an OWL ontology affords a tool based on a formalised set of axioms for validating a cancer-registry’s data set according to harmonised, supra-national rules. The fact that the data checks are inherently linked to the data model would lead to less maintenance overheads and also allow automatic versioning synchronisation, important for distributed data-quality checking processes.

Contina A, Yanco SW, Pierce AK, DePrenger-Levin M, Wunder MB, Neophytou AM, Lostroh CP, Telford RJ, Benito BM, Chipperfield J, O'Hara RB, Carlson CJ. Comment on "A global-scale ecological niche model to predict SARS-CoV-2 coronavirus infection rate", author Coro. Ecol Modell 2020;436:109288. [PMID: 32982015 PMCID: PMC7505574 DOI: 10.1016/j.ecolmodel.2020.109288] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/18/2020] [Revised: 09/11/2020] [Accepted: 09/12/2020] [Indexed: 01/04/2023]

Frese J, Gode A, Heinrichs G, Will A, Schulz AP. Validating a transnational fracture treatment registry using a standardized method. BMC Med Res Methodol 2019;19:241. [PMID: 31852451 PMCID: PMC6921413 DOI: 10.1186/s12874-019-0862-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/03/2019] [Accepted: 11/04/2019] [Indexed: 11/10/2022]

Abstract

AIM

Subsequent to a three-month pilot phase, recruiting patients for the newly established BFCC (Baltic Fracture Competence Centre) transnational fracture registry, a validation of the data quality needed to be carried out, applying a standardized method.

METHOD

During the literature research, the method of "adaptive monitoring" fulfilled the requirements of the registry and was applied. It consisted of a three-step audit process; firstly, scoring of the overall data quality, followed by source data verification of a sample size, relative to the scoring result, and finally, feedback to the registry on measures to improve data quality. Statistical methods for scoring of data quality and visualisation of discrepancies between registry data and source data were developed and applied.

RESULTS

Initially, the data quality of the registry scored as medium. During source data verification, missing items in the registry, causing medium data quality, turned out to be absent in the source as well. A subsequent adaptation of the score evaluated the registry's data quality as good. It was suggested to add variables to some items in order to improve the accuracy of the registry.

DISCUSSION

The application of the method of adaptive monitoring has only been published by Jacke et al., with a similar improvement of the scoring result following the audit process. Displaying data from the registry in graphs helped to find missing items and discover issues with data formats. Graphically comparing the degree of agreement between the registry and source data made it possible to discover systematic faults.

CONCLUSIONS

The method of adaptive monitoring gives a substantiated guideline for systematically evaluating and monitoring a registry's data quality and is currently second to none. The resulting transparency of the registry's data quality could be helpful in annual reports, as published by most major registries. As the method has been rarely applied, further successive applications in established registries would be desirable.

Udesky JO, Dodson RE, Perovich LJ, Rudel RA. Wrangling environmental exposure data: guidance for getting the best information from your laboratory measurements. Environ Health 2019;18:99. [PMID: 31752881 PMCID: PMC6868687 DOI: 10.1186/s12940-019-0537-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/29/2019] [Accepted: 10/24/2019] [Indexed: 05/26/2023]

Abstract

BACKGROUND

Environmental health and exposure researchers can improve the quality and interpretation of their chemical measurement data, avoid spurious results, and improve analytical protocols for new chemicals by closely examining lab and field quality control (QC) data. Reporting QC data along with chemical measurements in biological and environmental samples allows readers to evaluate data quality and appropriate uses of the data (e.g., for comparison to other exposure studies, association with health outcomes, use in regulatory decision-making). However, many studies do not adequately describe or interpret QC assessments in publications, leaving readers uncertain about the level of confidence in the reported data. One potential barrier to both QC implementation and reporting is that guidance on how to integrate and interpret QC assessments is often fragmented and difficult to find, with no centralized repository or summary. In addition, existing documents are typically written for regulatory scientists rather than environmental health researchers, who may have little or no experience in analytical chemistry.

OBJECTIVES

We discuss approaches for implementing quality assurance/quality control (QA/QC) in environmental exposure measurement projects and describe our process for interpreting QC results and drawing conclusions about data validity.

DISCUSSION

Our methods build upon existing guidance and years of practical experience collecting exposure data and analyzing it in collaboration with contract and university laboratories, as well as the Centers for Disease Control and Prevention. With real examples from our data, we demonstrate problems that would not have come to light had we not engaged with our QC data and incorporated field QC samples in our study design. Our approach focuses on descriptive analyses and data visualizations that have been compatible with diverse exposure studies with sample sizes ranging from tens to hundreds of samples. Future work could incorporate additional statistically grounded methods for larger datasets with more QC samples.

CONCLUSIONS

This guidance, along with example table shells, graphics, and some sample R code, provides a useful set of tools for getting the best information from valuable environmental exposure datasets and enabling valid comparison and synthesis of exposure data across studies.

Byron A. Reproducibility and Crossplatform Validation of Reverse-Phase Protein Array Data. Adv Exp Med Biol 2019;1188:181-201. [PMID: 31820389 DOI: 10.1007/978-981-32-9755-5_10] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 02/07/2023]

Le QH, Verheijen PJT, van Loosdrecht MCM, Volcke EIP. Experimental design for evaluating WWTP data by linear mass balances. Water Res 2018;142:415-425. [PMID: 29908466 DOI: 10.1016/j.watres.2018.05.026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/02/2018] [Revised: 04/23/2018] [Accepted: 05/14/2018] [Indexed: 06/08/2023]

Weis D, Willems H. Aggregation, Validation, and Generalization of Qualitative Data - Methodological and Practical Research Strategies Illustrated by the Research Process of an empirically Based Typology. Integr Psychol Behav Sci 2017;51:223-43. [PMID: 27957658 DOI: 10.1007/s12124-016-9372-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 10/20/2022]

Harkat MF, Mansouri M, Nounou M, Nounou H. Enhanced data validation strategy of air quality monitoring network. Environ Res 2018;160:183-194. [PMID: 28987729 DOI: 10.1016/j.envres.2017.09.023] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/15/2017] [Revised: 09/19/2017] [Accepted: 09/20/2017] [Indexed: 06/07/2023]

Daepp MI, Black J. Assessing the validity of commercial and municipal food environment data sets in Vancouver, Canada. Public Health Nutr 2017;20:2649-59. [PMID: 28816109 DOI: 10.1017/S1368980017001744] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 01/21/2023]

Hoeven LRV, Bruijne MCD, Kemper PF, Koopman MMW, Rondeel JMM, Leyte A, Koffijberg H, Janssen MP, Roes KCB. Validation of multisource electronic health record data: an application to blood transfusion data. BMC Med Inform Decis Mak 2017;17:107. [PMID: 28709453 PMCID: PMC5512751 DOI: 10.1186/s12911-017-0504-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 03/08/2017] [Accepted: 07/10/2017] [Indexed: 11/10/2022]

Abstract

Background

Although data from electronic health records (EHR) are often used for research purposes, systematic validation of these data prior to their use is not standard practice. Existing validation frameworks discuss validity concepts without translating these into practical implementation steps or addressing the potential influence of linking multiple sources. Therefore, we developed a practical approach for validating routinely collected data from multiple sources and applied it to a blood transfusion data warehouse to evaluate the usability in practice.

Methods

The approach consists of identifying existing validation frameworks for EHR data or linked data, selecting validity concepts from these frameworks and establishing quantifiable validity outcomes for each concept. The approach distinguishes external validation concepts (e.g. concordance with external reports, previous literature and expert feedback) and internal consistency concepts which use expected associations within the dataset itself (e.g. completeness, uniformity and plausibility). In an example case, the selected concepts were applied to a transfusion dataset and specified in more detail.

Results

Application of the approach to a transfusion dataset resulted in a structured overview of data validity aspects. This allowed improvement of these aspects through further processing of the data and, in some cases, adjustment of the data extraction. For example, the proportion of transfused products that could not be linked to the corresponding issued products was initially 2.2% but fell to 0.17% after the data extraction criteria were adjusted.

Conclusions

This stepwise approach for validating linked multisource data provides a basis for evaluating data quality and enhancing interpretation. When the process of data validation is adopted more broadly, this contributes to increased transparency and greater reliability of research based on routinely collected electronic health records.
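As a rough illustration of the kind of quantifiable outcomes the abstract above describes (completeness, plausibility, and the proportion of transfused products that cannot be linked to issued products), the sketch below computes three such measures on toy data. The record layout, field names, and function names are assumptions made for this example; they are not taken from the study or its data warehouse.

```python
from datetime import date

# Hypothetical toy data; field names are assumptions, not the study's schema.
issued = [
    {"product_id": "P1", "issued_on": date(2016, 1, 3)},
    {"product_id": "P2", "issued_on": date(2016, 1, 5)},
    {"product_id": "P3", "issued_on": date(2016, 1, 9)},
]
transfused = [
    {"product_id": "P1", "transfused_on": date(2016, 1, 4)},
    {"product_id": "P3", "transfused_on": date(2016, 1, 8)},  # before issue date
    {"product_id": "P4", "transfused_on": None},              # no date, no issued match
    {"product_id": "P2", "transfused_on": date(2016, 1, 6)},
]


def completeness(rows, field):
    """Fraction of rows in which `field` is present (not None)."""
    return sum(row[field] is not None for row in rows) / len(rows)


def unlinked_proportion(transfused_rows, issued_rows):
    """Fraction of transfused products with no matching issued product."""
    issued_ids = {row["product_id"] for row in issued_rows}
    unlinked = [r for r in transfused_rows if r["product_id"] not in issued_ids]
    return len(unlinked) / len(transfused_rows)


def implausible_ids(transfused_rows, issued_rows):
    """IDs of products recorded as transfused before they were issued."""
    issued_on = {row["product_id"]: row["issued_on"] for row in issued_rows}
    return [
        r["product_id"]
        for r in transfused_rows
        if r["transfused_on"] is not None
        and r["product_id"] in issued_on
        and r["transfused_on"] < issued_on[r["product_id"]]
    ]


print(completeness(transfused, "transfused_on"))  # 0.75
print(unlinked_proportion(transfused, issued))    # 0.25
print(implausible_ids(transfused, issued))        # ['P3']
```

Tracking a figure such as the unlinked proportion before and after a change to the extraction criteria is one way to quantify an improvement like the one reported in the Results.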

Electronic supplementary material

The online version of this article (doi:10.1186/s12911-017-0504-7) contains supplementary material, which is available to authorized users.

Coletti M, Hultquist C, Kennedy WG, Cervone G. Validating Safecast data by comparisons to a U. S. Department of Energy Fukushima Prefecture aerial survey. J Environ Radioact 2017;171:9-20. [PMID: 28167372 DOI: 10.1016/j.jenvrad.2017.01.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/01/2016] [Revised: 12/07/2016] [Accepted: 01/07/2017] [Indexed: 06/06/2023]

Wallace WE, Ji W, Tchekhovskoi DV, Phinney KW, Stein SE. Mass Spectral Library Quality Assurance by Inter-Library Comparison. J Am Soc Mass Spectrom 2017;28:733-738. [PMID: 28127680 PMCID: PMC5439505 DOI: 10.1007/s13361-016-1589-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 11/28/2016] [Revised: 12/22/2016] [Accepted: 12/23/2016] [Indexed: 05/20/2023]

Rideout JR, Chase JH, Bolyen E, Ackermann G, González A, Knight R, Caporaso JG. Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets. Gigascience 2016;5:27. [PMID: 27296526 PMCID: PMC4906574 DOI: 10.1186/s13742-016-0133-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 01/22/2016] [Accepted: 06/01/2016] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Bioinformatics software often requires human-generated tabular text files as input and has specific requirements for how those data are formatted. Users frequently manage these data in spreadsheet programs, which is convenient for researchers who are compiling the requisite information because the spreadsheet programs can easily be used on different platforms including laptops and tablets, and because they provide a familiar interface. It is increasingly common for many different researchers to be involved in compiling these data, including study coordinators, clinicians, lab technicians and bioinformaticians. As a result, many research groups are shifting toward using cloud-based spreadsheet programs, such as Google Sheets, which support the concurrent editing of a single spreadsheet by different users working on different platforms. Most of the researchers who enter data are not familiar with the formatting requirements of the bioinformatics programs that will be used, so validating and correcting file formats is often a bottleneck prior to beginning bioinformatics analysis.

MAIN TEXT

We present Keemei, a Google Sheets Add-on, for validating tabular files used in bioinformatics analyses. Keemei is available free of charge from Google's Chrome Web Store. Keemei can be installed and run on any web browser supported by Google Sheets. Keemei currently supports the validation of two widely used tabular bioinformatics formats, the Quantitative Insights into Microbial Ecology (QIIME) sample metadata mapping file format and the Spatially Referenced Genetic Data (SRGD) format, but is designed to easily support the addition of others.

CONCLUSIONS

Keemei will save researchers time and frustration by providing a convenient interface for tabular bioinformatics file format validation. By allowing everyone involved with data entry for a project to easily validate their data, it will reduce the validation and formatting bottlenecks that are commonly encountered when human-generated data files are first used with a bioinformatics system. Simplifying the validation of essential tabular data files, such as sample metadata, will reduce common errors and thereby improve the quality and reliability of research outcomes.
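To make the idea of tabular file format validation concrete, here is a minimal sketch of a validator that reports problems by row and column. It is not Keemei's implementation and does not encode the QIIME or SRGD rules; the checks (non-empty and unique column names, consistent row lengths, no empty cells, unique IDs in the first column) and the function name `validate_table` are assumptions chosen for illustration.

```python
import csv
import io


def validate_table(text, delimiter="\t"):
    """Return a list of (row, column, message) problems in a tabular text file.

    Rows and columns are 1-based; column 0 means the problem concerns the
    whole row. The rules are illustrative, not those of any real format.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [(0, 0, "file is empty")]

    problems = []
    header = rows[0]

    # Header checks: every column needs a non-empty, unique name.
    seen_columns = set()
    for col, name in enumerate(header, start=1):
        if not name.strip():
            problems.append((1, col, "empty column name"))
        elif name in seen_columns:
            problems.append((1, col, f"duplicate column name {name!r}"))
        seen_columns.add(name)

    # Data checks: consistent width, no empty cells, unique first-column IDs.
    seen_ids = set()
    for line, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                (line, 0, f"expected {len(header)} cells, found {len(row)}")
            )
            continue
        for col, cell in enumerate(row, start=1):
            if not cell.strip():
                problems.append((line, col, "empty cell"))
        if row[0] in seen_ids:
            problems.append((line, 1, f"duplicate ID {row[0]!r}"))
        seen_ids.add(row[0])

    return problems


# Hypothetical sample with a duplicate column, an empty cell,
# a duplicate ID, and a short row.
sample = "id\tsite\tsite\nS1\tgut\ta\nS1\t\tb\nS2\tskin\n"
for problem in validate_table(sample):
    print(problem)
```

Reporting each problem with its row and column is what lets a spreadsheet front end point the person entering data at the exact cell to fix, rather than rejecting the whole file.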

Eftimie R. Validation of multi-scale models for fibrosis. Comment on "Towards a unified approach in the modeling of fibrosis: A review with research perspectives" by M. Ben Amar and C. Bianca. Phys Life Rev 2016;17:90-1. [PMID: 27161945 DOI: 10.1016/j.plrev.2016.05.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/01/2016] [Accepted: 05/04/2016] [Indexed: 01/02/2023]

Mayer CL, Haley VB, Giardina R, Hazamy PA, Tsivitis M, Knab R, Lutterloh E. Lessons learned from initial reporting of carbapenem-resistant Enterobacteriaceae in New York State hospitals, 2013-2014. Am J Infect Control 2016;44:131-3. [PMID: 26601706 DOI: 10.1016/j.ajic.2015.09.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/15/2015] [Revised: 09/01/2015] [Accepted: 09/02/2015] [Indexed: 10/22/2022]

Spindler A. Structural redundancy of data from wastewater treatment systems. Determination of individual balance equations. Water Res 2014;57:193-201. [PMID: 24721666 DOI: 10.1016/j.watres.2014.03.042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/23/2013] [Revised: 03/04/2014] [Accepted: 03/05/2014] [Indexed: 06/03/2023]

Backman LA, Nobert G, Melchreit R, Fekieta R, Dembry LM. Validation of the surveillance and reporting of central line-associated bloodstream infection denominator data. Am J Infect Control 2014;42:28-33. [PMID: 24176605 DOI: 10.1016/j.ajic.2013.06.014] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 02/13/2013] [Revised: 06/14/2013] [Accepted: 06/14/2013] [Indexed: 10/26/2022]

Rich KL, Reese SM, Bol KA, Gilmartin HM, Janosz T. Assessment of the quality of publicly reported central line-associated bloodstream infection data in Colorado, 2010. Am J Infect Control 2013;41:874-9. [PMID: 23498552 DOI: 10.1016/j.ajic.2012.12.014] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 10/29/2012] [Revised: 12/11/2012] [Accepted: 12/11/2012] [Indexed: 11/23/2022]