1
|
Briney KA. Measuring data rot: An analysis of the continued availability of shared data from a Single University. PLoS One 2024; 19:e0304781. [PMID: 38838010 DOI: 10.1371/journal.pone.0304781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 05/17/2024] [Indexed: 06/07/2024] Open
Abstract
To determine where data is shared and what data is no longer available, this study analyzed data shared by researchers at a single university. 2166 supplemental data links were harvested from the university's institutional repository and web scraped using R. All links that failed to scrape or could not be tested algorithmically were tested for availability by hand. Trends in data availability by link type, age of publication, and data source were examined for patterns. Results show that researchers shared data in hundreds of places. About two-thirds of links to shared data were in the form of URLs and one-third were DOIs, with several FTP links and links directly to files. A surprising 13.4% of shared URL links pointed to a website homepage rather than a specific record on a website. After testing, 5.4% of the 2166 supplemental data links were found to be no longer available. DOIs were the type of shared link that was least likely to disappear with a 1.7% loss, with URL loss at 5.9% averaged over time. Links from older publications were more likely to be unavailable, with a data disappearance rate estimated at 2.6% per year, as well as links to data hosted on journal websites. The results support best practice guidance to share data in a data repository using a permanent identifier.
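The link categories named in this abstract (DOI, URL, FTP, homepage) can be illustrated with a small sketch. The study itself used R and does not publish its rules here, so the function name `classify_link` and every rule in it are assumptions made for illustration, not the author's method.

```python
from urllib.parse import urlparse


def classify_link(link: str) -> str:
    """Return a coarse link type: 'doi', 'ftp', 'homepage', or 'url'.

    Hypothetical rules chosen for illustration only.
    """
    text = link.strip()
    lowered = text.lower()
    # Bare DOIs start with the "10." prefix, optionally written as "doi:10...".
    if lowered.startswith("doi:") or lowered.startswith("10."):
        return "doi"
    parsed = urlparse(text)
    host = (parsed.hostname or "").lower()
    # Resolver links go through doi.org (or the older dx.doi.org).
    if host in {"doi.org", "dx.doi.org"}:
        return "doi"
    if parsed.scheme.lower() == "ftp":
        return "ftp"
    # No path, query, or fragment: the link points at a site's front page.
    if parsed.path in ("", "/") and not parsed.query and not parsed.fragment:
        return "homepage"
    return "url"


if __name__ == "__main__":
    for example in [
        "https://doi.org/10.1371/journal.pone.0304781",
        "ftp://ftp.example.org/data/file.csv",
        "https://example.org/",
        "https://example.org/records/123",
    ]:
        print(example, "->", classify_link(example))
```

The example hosts under `example.org` are placeholders. Availability testing (whether a link still resolves) is a separate step that this sketch does not attempt.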
Affiliation(s)
- Kristin A Briney
- Caltech Library, California Institute of Technology, Pasadena, CA, United States of America
2
|
Leigh DM, Vandergast AG, Hunter ME, Crandall ED, Funk WC, Garroway CJ, Hoban S, Oyler-McCance SJ, Rellstab C, Segelbacher G, Schmidt C, Vázquez-Domínguez E, Paz-Vinas I. Best practices for genetic and genomic data archiving. Nat Ecol Evol 2024:10.1038/s41559-024-02423-7. [PMID: 38789640 DOI: 10.1038/s41559-024-02423-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 04/25/2024] [Indexed: 05/26/2024]
Abstract
Genetic and genomic data are collected for a vast array of scientific and applied purposes. Despite mandates for public archiving, data are typically used only by the generating authors. The reuse of genetic and genomic datasets remains uncommon because it is difficult, if not impossible, due to non-standard archiving practices and lack of contextual metadata. But as the new field of macrogenetics is demonstrating, if genetic data and their metadata were more accessible and FAIR (findable, accessible, interoperable and reusable) compliant, they could be reused for many additional purposes. We discuss the main challenges with existing genetic and genomic data archives, and suggest best practices for archiving genetic and genomic data. Recognizing that this is a longstanding issue due to little formal data management training within the fields of ecology and evolution, we highlight steps that research institutions and publishers could take to improve data archiving.
Affiliation(s)
- Deborah M Leigh
- Swiss Federal Research Institute WSL, Birmensdorf, Switzerland.
| | - Amy G Vandergast
- US Geological Survey, Western Ecological Research Center, San Diego, CA, USA
| | - Margaret E Hunter
- US Geological Survey, Wetland & Aquatic Research Center, Gainesville, FL, USA
| | - Eric D Crandall
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | - W Chris Funk
- Department of Biology, Graduate Degree Program in Ecology, Colorado State University, Fort Collins, CO, USA
| | - Colin J Garroway
- Department of Biological Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
| | - Sean Hoban
- Center for Tree Science, The Morton Arboretum, Lisle, IL, USA
| | | | | | | | - Chloé Schmidt
- German Centre for Integrative Biodiversity Research Halle-Jena-Leipzig, Leipzig, Germany
| | - Ella Vázquez-Domínguez
- Departamento de Ecología de la Biodiversidad, Instituto de Ecología, Universidad Nacional Autónoma de México, Coyoacán, Ciudad de México, México
| | - Ivan Paz-Vinas
- Department of Biology, Graduate Degree Program in Ecology, Colorado State University, Fort Collins, CO, USA
- Universite Claude Bernard Lyon 1, LEHNA UMR 5023, CNRS, ENTPE, Villeurbanne, France
3
|
Kapoor S, Cantrell EM, Peng K, Pham TH, Bail CA, Gundersen OE, Hofman JM, Hullman J, Lones MA, Malik MM, Nanayakkara P, Poldrack RA, Raji ID, Roberts M, Salganik MJ, Serra-Garcia M, Stewart BM, Vandewiele G, Narayanan A. REFORMS: Consensus-based Recommendations for Machine-learning-based Science. SCIENCE ADVANCES 2024; 10:eadk3452. [PMID: 38691601 PMCID: PMC11092361 DOI: 10.1126/sciadv.adk3452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 03/29/2024] [Indexed: 05/03/2024]
Abstract
Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways across disciplines. Motivated by this observation, our goal is to provide clear recommendations for conducting and reporting ML-based science. Drawing from an extensive review of past literature, we present the REFORMS checklist (recommendations for machine-learning-based science). It consists of 32 questions and a paired set of guidelines. REFORMS was developed on the basis of a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences. REFORMS can serve as a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility.
Affiliation(s)
- Sayash Kapoor
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
| | - Emily M. Cantrell
- Department of Sociology, Princeton University, Princeton, NJ 08544, USA
- School of Public and International Affairs, Princeton University, Princeton, NJ 08544, USA
| | - Kenny Peng
- Department of Computer Science, Cornell University, Ithaca, NY 14850, USA
| | - Thanh Hien Pham
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
| | - Christopher A. Bail
- Department of Sociology, Duke University, Durham, NC 27708, USA
- Department of Political Science, Duke University, Durham, NC 27708, USA
- Sanford School of Public Policy, Duke University, Durham, NC 27708, USA
| | - Odd Erik Gundersen
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
- Aneo AS, Trondheim, Norway
| | | | - Jessica Hullman
- Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
| | - Michael A. Lones
- School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK
| | - Momin M. Malik
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
- School of Social Policy & Practice, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute in Critical Quantitative, Computational, & Mixed Methodologies, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Priyanka Nanayakkara
- Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
- Department of Communication Studies, Northwestern University, Evanston, IL 60208, USA
| | | | - Inioluwa Deborah Raji
- Department of Computer Science, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Michael Roberts
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge, Cambridge, UK
| | - Matthew J. Salganik
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Department of Sociology, Princeton University, Princeton, NJ 08544, USA
- Office of Population Research, Princeton University, Princeton, NJ 08544, USA
| | - Marta Serra-Garcia
- Rady School of Management, University of California, San Diego, La Jolla, CA 92093, USA
| | - Brandon M. Stewart
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Department of Sociology, Princeton University, Princeton, NJ 08544, USA
- Office of Population Research, Princeton University, Princeton, NJ 08544, USA
- Department of Politics, Princeton University, Princeton, NJ 08544, USA
| | - Gilles Vandewiele
- Department of Information Technology, Ghent University, Ghent, Belgium
| | - Arvind Narayanan
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
4
|
Chin J, Zeiler K, Dilevski N, Holcombe A, Gatfield-Jeffries R, Bishop R, Vazire S, Schiavone S. The transparency of quantitative empirical legal research published in highly ranked law journals (2018-2020): an observational study. F1000Res 2024; 12:144. [PMID: 37600907 PMCID: PMC10435919 DOI: 10.12688/f1000research.127563.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/18/2024] [Indexed: 08/22/2023] Open
Abstract
Background Scientists are increasingly concerned with making their work easy to verify and build upon. Associated practices include sharing data, materials, and analytic scripts, and preregistering protocols. This shift towards increased transparency and rigor has been referred to as a "credibility revolution." The credibility of empirical legal research has been questioned in the past due to its distinctive peer review system and because the legal background of its researchers means that many often are not trained in study design or statistics. Still, there has been no systematic study of transparency and credibility-related characteristics of published empirical legal research. Methods To fill this gap and provide an estimate of current practices that can be tracked as the field evolves, we assessed 300 empirical articles from highly ranked law journals including both faculty-edited journals and student-edited journals. Results We found high levels of article accessibility (86%, 95% CI = [82%, 90%]), especially among student-edited journals (100%). Few articles stated that a study's data are available (19%, 95% CI = [15%, 23%]). Statements of preregistration (3%, 95% CI = [1%, 5%]) and availability of analytic scripts (6%, 95% CI = [4%, 9%]) were very uncommon. (i.e., they collected new data using the study's reported methods, but found results inconsistent or not as strong as the original). Conclusion We suggest that empirical legal researchers and the journals that publish their work cultivate norms and practices to encourage research credibility. Our estimates may be revisited to track the field's progress in the coming years.
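As a rough check on the interval reported for article accessibility (86%, 95% CI = [82%, 90%], from 300 articles), the sketch below computes a normal-approximation (Wald) interval. The abstract does not say which interval method the authors used, and 86% is itself a rounded figure, so both the method and the function name `wald_interval` are assumptions.

```python
import math


def wald_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion.

    z = 1.96 corresponds to a two-sided 95% interval.
    """
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# 86% of 300 assessed articles were accessible (figures from the abstract).
low, high = wald_interval(0.86, 300)
print(f"{low:.1%} to {high:.1%}")  # prints "82.1% to 89.9%"
```

Rounded to whole percentages, this gives 82% and 90%, in line with the reported interval. Agreement on one figure does not establish that this is the method the authors applied.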
Affiliation(s)
- Jason Chin
- College of Law, Australian National University, Canberra, ACT, Australia
| | | | - Natali Dilevski
- Centre for Investigative Interviewing, Griffith Criminology Institute, Griffith University, Brisbane, Qld, Australia
| | - Alex Holcombe
- Psychology, University of Sydney, Sydney, NSW, Australia
| | | | - Ruby Bishop
- School of Law, University of Sydney, Sydney, NSW, Australia
| | - Simine Vazire
- Melbourne School of Psychological Sciences, University of Melbourne, Melbourne, Vic, Australia
5
|
Freitas LT, Khan MA, Uddin A, Halder JB, Singh-Phulgenda S, Raja JD, Balakrishnan V, Harriss E, Rahi M, Brack M, Guérin PJ, Basáñez MG, Kumar A, Walker M, Srividya A. The lymphatic filariasis treatment study landscape: A systematic review of study characteristics and the case for an individual participant data platform. PLoS Negl Trop Dis 2024; 18:e0011882. [PMID: 38227595 PMCID: PMC10817204 DOI: 10.1371/journal.pntd.0011882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 01/26/2024] [Accepted: 12/22/2023] [Indexed: 01/18/2024] Open
Abstract
BACKGROUND Lymphatic filariasis (LF) is a neglected tropical disease (NTD) targeted by the World Health Organization for elimination as a public health problem (EPHP). Since 2000, more than 9 billion treatments of antifilarial medicines have been distributed through mass drug administration (MDA) programmes in 72 endemic countries and 17 countries have reached EPHP. Yet in 2021, nearly 900 million people still required MDA with combinations of albendazole, diethylcarbamazine and/or ivermectin. Despite the reliance on these drugs, there remain gaps in understanding of variation in responses to treatment. As demonstrated for other infectious diseases, some urgent questions could be addressed by conducting individual participant data (IPD) meta-analyses. Here, we present the results of a systematic literature review to estimate the abundance of IPD on pre- and post-intervention indicators of infection and/or morbidity and assess the feasibility of building a global data repository. METHODOLOGY We searched literature published between 1st January 2000 and 5th May 2023 in 15 databases to identify prospective studies assessing LF treatment and/or morbidity management and disease prevention (MMDP) approaches. We considered only studies where individual participants were diagnosed with LF infection or disease and were followed up on at least one occasion after receiving an intervention/treatment. PRINCIPAL FINDINGS We identified 138 eligible studies from 23 countries, having followed up an estimated 29,842 participants after intervention. We estimate 14,800 (49.6%) IPD on pre- and post-intervention infection indicators including microfilaraemia, circulating filarial antigen and/or ultrasound indicators measured before and after intervention using 8 drugs administered in various combinations. We identified 33 studies on MMDP, estimating 6,102 (20.4%) IPD on pre- and post-intervention clinical morbidity indicators only. 
A further 8,940 IPD cover a mixture of infection and morbidity outcomes measured with other diagnostics, from participants followed for adverse event outcomes only or recruited after initial intervention. CONCLUSIONS The LF treatment study landscape is heterogeneous, but the abundance of studies and related IPD suggest that establishing a global data repository to facilitate IPD meta-analyses would be feasible and useful to address unresolved questions on variation in treatment outcomes across geographies, demographics and in underrepresented groups. New studies using more standardized approaches should be initiated to address the scarcity and inconsistency of data on morbidity management.
Affiliation(s)
- Luzia T. Freitas
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
- London Centre for Neglected Tropical Disease Research, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
- Infectious Diseases Data Observatory, University of Oxford, Oxford, United Kingdom
| | | | - Azhar Uddin
- ICMR-Vector Control Research Centre, Puducherry, India
| | - Julia B. Halder
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
- London Centre for Neglected Tropical Disease Research, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
- Infectious Diseases Data Observatory, University of Oxford, Oxford, United Kingdom
- Department of Pathobiology and Population Sciences, Royal Veterinary College, Hatfield, United Kingdom
| | - Sauman Singh-Phulgenda
- Infectious Diseases Data Observatory, University of Oxford, Oxford, United Kingdom
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
| | | | | | - Eli Harriss
- The Knowledge Centre, Bodleian Health Care Libraries, University of Oxford, Oxford, United Kingdom
| | - Manju Rahi
- ICMR-Vector Control Research Centre, Puducherry, India
| | - Matthew Brack
- Infectious Diseases Data Observatory, University of Oxford, Oxford, United Kingdom
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
| | - Philippe J. Guérin
- Infectious Diseases Data Observatory, University of Oxford, Oxford, United Kingdom
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
| | - Maria-Gloria Basáñez
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
- London Centre for Neglected Tropical Disease Research, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
- Infectious Diseases Data Observatory, University of Oxford, Oxford, United Kingdom
| | - Ashwani Kumar
- Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, India
| | - Martin Walker
- London Centre for Neglected Tropical Disease Research, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
- Infectious Diseases Data Observatory, University of Oxford, Oxford, United Kingdom
- Department of Pathobiology and Population Sciences, Royal Veterinary College, Hatfield, United Kingdom
6
|
Munir A, Dahal P, Kumar R, Singh-Phulgenda S, Siddiqui NA, Naylor C, Wilson J, Buck G, Rahi M, Alves F, Malaviya P, Sundar S, Ritmeijer K, Stepniewska K, Pandey K, Guérin PJ, Musa A. Haematological dynamics following treatment of visceral leishmaniasis: a protocol for systematic review and individual participant data (IPD) meta-analysis. BMJ Open 2023; 13:e074841. [PMID: 38101841 PMCID: PMC10729213 DOI: 10.1136/bmjopen-2023-074841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 10/31/2023] [Indexed: 12/17/2023] Open
Abstract
INTRODUCTION Visceral leishmaniasis (VL) is a parasitic disease with an estimated 30 000 new cases occurring annually. Despite anaemia being a common haematological manifestation of VL, the evolution of different haematological characteristics following treatment remains poorly understood. An individual participant data meta-analysis (IPD-MA) is planned to characterise the haematological dynamics in patients with VL. METHODS AND ANALYSIS The Infectious Diseases Data Observatory (IDDO) VL data platform is a global repository of IPD from therapeutic studies identified through a systematic search of published literature (PROSPERO registration: CRD42021284622). The platform currently holds datasets from clinical trials standardised to a common data format. Corresponding authors and principal investigators of the studies indexed in the IDDO VL data platform meeting the eligibility criteria for inclusion were invited to be part of the collaborative IPD-MA. Mixed-effects multivariable regression models will be constructed to identify determinants of haematological parameters by taking clustering within study sites into account. ETHICS AND DISSEMINATION This IPD-MA meets the criteria for waiver of ethical review as defined by the Oxford Tropical Research Ethics Committee (OxTREC) granted to IDDO, as the research consists of secondary analysis of existing anonymised data (exempt granted on 29 March 2023, OxTREC REF: IDDO). Ethics approval was granted by the ICMR-Rajendra Memorial Research Institute of Medical Sciences ethics committee (letter no.: RMRI/EC/30/2022) on 4 July 2022. The results of this analysis will be disseminated at conferences, the IDDO website and peer-reviewed publications in open-access journals. The findings of this research will be critically important for control programmes at regional and global levels, policymakers and groups developing new VL treatments. PROSPERO REGISTRATION NUMBER CRD42021284622.
Affiliation(s)
- Abdalla Munir
- Infectious Diseases Data Observatory (IDDO), Oxford, UK
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | - Prabin Dahal
- Infectious Diseases Data Observatory (IDDO), Oxford, UK
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | - Rishikesh Kumar
- Rajendra Memorial Research Institute of Medical Sciences (RMRIMS), Patna, India
| | - Sauman Singh-Phulgenda
- Infectious Diseases Data Observatory (IDDO), Oxford, UK
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | | | - Caitlin Naylor
- Infectious Diseases Data Observatory (IDDO), Oxford, UK
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | - James Wilson
- Infectious Diseases Data Observatory (IDDO), Oxford, UK
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | - Gemma Buck
- Infectious Diseases Data Observatory (IDDO), Oxford, UK
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | - Manju Rahi
- Indian Council of Medical Research (ICMR), New Delhi, India
| | - Fabiana Alves
- Drugs for Neglected Disease Initiative, Geneva, Switzerland
| | - Paritosh Malaviya
- Infectious Disease Research Laboratory, Department of Medicine, Institute of Medical Sciences, Banaras Hindu University, Varanasi, India
| | - Shyam Sundar
- Infectious Disease Research Laboratory, Department of Medicine, Institute of Medical Sciences, Banaras Hindu University, Varanasi, India
| | | | - Kasia Stepniewska
- Infectious Diseases Data Observatory (IDDO), Oxford, UK
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | - Krishna Pandey
- Rajendra Memorial Research Institute of Medical Sciences (RMRIMS), Patna, India
| | - Philippe J Guérin
- Infectious Diseases Data Observatory (IDDO), Oxford, UK
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | - Ahmed Musa
- Institute of Endemic Diseases, University of Khartoum, Khartoum, Sudan
7
|
Ouellet S, Lemaréchal Y, Berumen-Murillo F, Lavallée MC, Vigneault É, Martin AG, Foster W, Thomson RM, Després P, Beaulieu L. A Monte Carlo dose recalculation pipeline for durable datasets: an I-125 LDR prostate brachytherapy use case. Phys Med Biol 2023; 68:235001. [PMID: 37863069 DOI: 10.1088/1361-6560/ad058b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 10/20/2023] [Indexed: 10/22/2023]
Abstract
Monte Carlo (MC) dose datasets are valuable for large-scale dosimetric studies. This work aims to build and validate a DICOM-compliant automated MC dose recalculation pipeline with an application to the production of I-125 low dose-rate prostate brachytherapy MC datasets. Built as a self-contained application, the recalculation pipeline ingested clinical DICOM-RT studies, reproduced the treatment in the Monte Carlo simulation, and output a traceable and durable dose distribution in the DICOM dose format. MC simulations with TG43-equivalent conditions using both TOPAS and egs_brachy MC codes were compared to TG43 calculations to validate the pipeline. The consistency of the pipeline when generating TG186 simulations was measured by comparing simulations made with both MC codes. Finally, egs_brachy simulations were run on a 240-patient cohort to simulate a large-scale application of the pipeline. Compared to line source TG43 calculations, simulations with both MC codes had more than 90% of voxels with a global difference under ±1%. Differences of 2.1% and less were seen in dosimetric indices when comparing TG186 simulations from both MC codes. The large-scale comparison of egs_brachy simulations with treatment planning system dose calculation showed the same dose overestimation of TG43 calculations shown in previous studies. The MC dose recalculation pipeline built and validated against TG43 calculations in this work efficiently produced durable MC dose datasets. Since the dataset could reproduce previous dosimetric studies within 15 h at a rate of 20 cases per 25 min, the pipeline is a promising tool for future large-scale dosimetric studies.
Affiliation(s)
- Samuel Ouellet
- Département de physique, de génie physique et d'optique, et Centre de recherche sur le cancer, Université Laval, Québec, Québec, Canada
- Service de radio-oncologie et Axe Oncologie du CRCHU de Québec, CHU de Québec-Université Laval, Quebec, QC, Canada
| | - Yannick Lemaréchal
- Département de physique, de génie physique et d'optique, et Centre de recherche sur le cancer, Université Laval, Québec, Québec, Canada
- Service de radio-oncologie et Axe Oncologie du CRCHU de Québec, CHU de Québec-Université Laval, Quebec, QC, Canada
| | - Francisco Berumen-Murillo
- Département de physique, de génie physique et d'optique, et Centre de recherche sur le cancer, Université Laval, Québec, Québec, Canada
- Service de radio-oncologie et Axe Oncologie du CRCHU de Québec, CHU de Québec-Université Laval, Quebec, QC, Canada
| | - Marie-Claude Lavallée
- Département de physique, de génie physique et d'optique, et Centre de recherche sur le cancer, Université Laval, Québec, Québec, Canada
- Service de radio-oncologie et Axe Oncologie du CRCHU de Québec, CHU de Québec-Université Laval, Quebec, QC, Canada
| | - Éric Vigneault
- Service de radio-oncologie et Axe Oncologie du CRCHU de Québec, CHU de Québec-Université Laval, Quebec, QC, Canada
| | - André-Guy Martin
- Service de radio-oncologie et Axe Oncologie du CRCHU de Québec, CHU de Québec-Université Laval, Quebec, QC, Canada
| | - William Foster
- Service de radio-oncologie et Axe Oncologie du CRCHU de Québec, CHU de Québec-Université Laval, Quebec, QC, Canada
| | - Rowan M Thomson
- Carleton Laboratory for Radiotherapy Physics, Department of Physics, Carleton University, Ottawa, Ontario, Canada
| | - Philippe Després
- Département de physique, de génie physique et d'optique, et Centre de recherche sur le cancer, Université Laval, Québec, Québec, Canada
- Service de radio-oncologie et Axe Oncologie du CRCHU de Québec, CHU de Québec-Université Laval, Quebec, QC, Canada
| | - Luc Beaulieu
- Département de physique, de génie physique et d'optique, et Centre de recherche sur le cancer, Université Laval, Québec, Québec, Canada
- Service de radio-oncologie et Axe Oncologie du CRCHU de Québec, CHU de Québec-Université Laval, Quebec, QC, Canada
8
|
Krähmer D, Schächtele L, Schneck A. Care to share? Experimental evidence on code sharing behavior in the social sciences. PLoS One 2023; 18:e0289380. [PMID: 37549146 PMCID: PMC10406284 DOI: 10.1371/journal.pone.0289380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Accepted: 07/18/2023] [Indexed: 08/09/2023] Open
Abstract
Transparency and peer control are cornerstones of good scientific practice and entail the replication and reproduction of findings. The feasibility of replications, however, hinges on the premise that original researchers make their data and research code publicly available. This applies in particular to large-N observational studies, where analysis code is complex and may involve several ambiguous analytical decisions. To investigate which specific factors influence researchers' code sharing behavior upon request, we emailed code requests to 1,206 authors who published research articles based on data from the European Social Survey between 2015 and 2020. In this preregistered multifactorial field experiment, we randomly varied three aspects of our code request's wording in a 2x4x2 factorial design: the overall framing of our request (enhancement of social science research, response to replication crisis), the appeal why researchers should share their code (FAIR principles, academic altruism, prospect of citation, no information), and the perceived effort associated with code sharing (no code cleaning required, no information). Overall, 37.5% of successfully contacted authors supplied their analysis code. Of our experimental treatments, only framing affected researchers' code sharing behavior, though in the opposite direction we expected: Scientists who received the negative wording alluding to the replication crisis were more likely to share their research code. Taken together, our results highlight that the availability of research code will hardly be enhanced by small-scale individual interventions but instead requires large-scale institutional norms.
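The 2x4x2 factorial design described in this abstract yields 16 distinct request wordings. The sketch below enumerates them and assigns one at random to each author. The factor levels are taken from the abstract; the variable names, the seed, and the simple independent-draw randomisation are assumptions, since the abstract does not describe the actual assignment procedure.

```python
import itertools
import random

# Factor levels as listed in the abstract.
FRAMINGS = ["enhancement of social science research", "response to replication crisis"]
APPEALS = ["FAIR principles", "academic altruism", "prospect of citation", "no information"]
EFFORTS = ["no code cleaning required", "no information"]

# Full factorial: every combination of one level per factor (2 * 4 * 2 = 16).
CONDITIONS = list(itertools.product(FRAMINGS, APPEALS, EFFORTS))


def assign_conditions(author_ids, seed=0):
    """Draw one condition per author, independently and uniformly at random.

    Hypothetical scheme: the study may have used blocked or balanced assignment.
    """
    rng = random.Random(seed)
    return {author: rng.choice(CONDITIONS) for author in author_ids}


assignment = assign_conditions(range(1206))
print(len(CONDITIONS))  # prints 16
```

Independent draws do not guarantee equal cell sizes; a balanced design would shuffle a repeated list of conditions instead.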
Affiliation(s)
- Daniel Krähmer
- Department of Sociology, University of Munich (LMU), Munich, Germany
| | - Laura Schächtele
- Department of Sociology, University of Munich (LMU), Munich, Germany
| | - Andreas Schneck
- Department of Sociology, University of Munich (LMU), Munich, Germany
9
|
Matthews F, Verstraeten G, Borrelli P, Vanmaercke M, Poesen J, Steegen A, Degré A, Rodríguez BC, Bielders C, Franke C, Alary C, Zumr D, Patault E, Nadal-Romero E, Smolska E, Licciardello F, Swerts G, Thodsen H, Casalí J, Eslava J, Richet JB, Ouvry JF, Farguell J, Święchowicz J, Nunes JP, Pak LT, Liakos L, Campo-Bescós MA, Żelazny M, Delaporte M, Pineux N, Henin N, Bezak N, Lana-Renault N, Tzoraki O, Giménez R, Li T, Zuazo VHD, Bagarello V, Pampalone V, Ferro V, Úbeda X, Panagos P. EUSEDcollab: a network of data from European catchments to monitor net soil erosion by water. Sci Data 2023; 10:515. [PMID: 37542067 PMCID: PMC10403541 DOI: 10.1038/s41597-023-02393-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Accepted: 07/17/2023] [Indexed: 08/06/2023] Open
Abstract
As a network of researchers we release an open-access database (EUSEDcollab) of water discharge and suspended sediment yield time series records collected in small to medium sized catchments in Europe. EUSEDcollab is compiled to overcome the scarcity of open-access data at relevant spatial scales for studies on runoff, soil loss by water erosion and sediment delivery. Multi-source measurement data from numerous researchers and institutions were harmonised into a common time series and metadata structure. Data reuse is facilitated through accompanying metadata descriptors providing background technical information for each monitoring station setup. Across ten European countries, EUSEDcollab covers over 1600 catchment years of data from 245 catchments at event (11 catchments), daily (22 catchments) and monthly (212 catchments) temporal resolution, and is unique in its focus on small to medium catchment drainage areas (median = 43 km², min = 0.04 km², max = 817 km²) with applicability for soil erosion research. We release this database with the aim of uniting people, knowledge and data through the European Union Soil Observatory (EUSO).
Affiliation(s)
- Francis Matthews
- European Commission, Joint Research Centre, Via Enrico Fermi, 2749, Ispra, VA, 21026, Italy
- Earth and Environmental Sciences, KU Leuven, Celestijnenlaan 200e - box 2409, 3001, Leuven, Belgium
| | - Gert Verstraeten
- Earth and Environmental Sciences, KU Leuven, Celestijnenlaan 200e - box 2409, 3001, Leuven, Belgium
| | - Pasquale Borrelli
- Department of Science, Roma Tre University, Viale Guglielmo Marconi 446, 146, Roma, Italy
- Department of Environmental Sciences, University of Basel, Bernoullistrasse 30, 4056, Basel, Switzerland
| | - Matthias Vanmaercke
- Earth and Environmental Sciences, KU Leuven, Celestijnenlaan 200e - box 2409, 3001, Leuven, Belgium
| | - Jean Poesen
- Earth and Environmental Sciences, KU Leuven, Celestijnenlaan 200e - box 2409, 3001, Leuven, Belgium
- Institute of Earth and Environmental Sciences, Maria Curie-Sklodowska University (UMCS), Kraśnicka Av. 2d, Lublin, 20-718, Poland
| | - An Steegen
- Earth and Environmental Sciences, KU Leuven, Celestijnenlaan 200e - box 2409, 3001, Leuven, Belgium
| | - Aurore Degré
- Gembloux Agro-Bio Tech, Uliège, Passage des Déportés 2, Gembloux, 5030, Belgium
| | - Belén Cárceles Rodríguez
- Natural Resources and Forestry, Instituto Andaluz de Investigación y Formación Agraria, Pesquera, Alimentaria y de la Producción Ecológica (IFAPA), Camino de Purchil s/n, Granada, 18005, Spain
| | - Charles Bielders
- Earth and Life Institute - environmental sciences, UCLouvain, Croix du sud 2, Louvain-la-Neuve, 1348, Belgium
| | - Christine Franke
- Centre of Geosciences and Geoengineering, Mines Paris-PSL, 35 Rue Saint Honoré, Fontainebleau, 77305, France
| | - Claire Alary
- LGCgE, IMT Nord-Europe, 942 rue Charles Bourseul, Douai, 59508, France
| | - David Zumr
- Department of Landscape Water Conservation, Czech Technical University in Prague, Thákurova 7, Praha 6, Prague, 16629, Czech Republic
| | - Edouard Patault
- Altereo, Innovation and Digital division, 2 Av. Madeleine Bonnaud, Venelles, 13770, France
| | - Estela Nadal-Romero
- Instituto Pirenaico de Ecología (IPE-CSIC), Avenida Montañana 1005, Zaragoza, 50059, Spain
| | - Ewa Smolska
- Faculty of Geography and Regional Studies, University of Warsaw, Krakowskie Przedmieście 30, 00-927, Warsaw, Poland
| | - Feliciana Licciardello
- Department of Agriculture, Food and Environment, University of Catania, Via Santa Sofia 100, Catania, 95123, Italy
| | - Gilles Swerts
- Gembloux Agro-Bio Tech, Uliège, Passage des Déportés 2, Gembloux, 5030, Belgium
| | - Hans Thodsen
- Ecoscience, Aarhus University, C.F. Møllers Allé 3, Aarhus, 8000, Denmark
| | - Javier Casalí
- Department of Engineering; IS-FOOD Institute (Innovation & Sustainable Development in Food Chain), Public University of Navarre, Campus de Arrosadia, Cataluña avenue, Pamplona, Navarra, 31006, Spain
| | - Javier Eslava
- Division of Soils and Climatology, Department of Rural Development and Environment, Government of Navarre, González Tablas Street, 9, Pamplona, Navarra, 31003, Spain
| | | | | | - Joaquim Farguell
- Geography, University of Barcelona, Montalegre 6, Barcelona, 8001, Spain
| | - Jolanta Święchowicz
- Institute of Geography and Spatial Management, Jagiellonian University in Kraków, 7 Gronostajowa Str., Kraków, 30-387, Poland
| | - João Pedro Nunes
- Soil Physics and Land Management, Wageningen University, P.O. Box 47, Wageningen, 6700 AA, Netherlands
- cE3c - Center for Ecology, Evolution and Environmental Changes & CHANGE - Global Change and Sustainability Institute, Faculdade de Ciências da Universidade de Lisboa, Edifício C2, Piso 5, Sala 2.5.46, Campo Grande, Lisbon, 1749-016, Portugal
| | - Lai Ting Pak
- AREAS, 2 Avenue Foch, 76460, Saint-Valery-en-Caux, France
| | - Leonidas Liakos
- UNISYSTEMS, Rue du Puits Romain 29, Bertrange, L-8070, Luxembourg
| | - Miguel A Campo-Bescós
- Department of Engineering; IS-FOOD Institute (Innovation & Sustainable Development in Food Chain), Public University of Navarre, Campus de Arrosadia, Cataluña avenue, Pamplona, Navarra, 31006, Spain
| | - Mirosław Żelazny
- Institute of Geography and Spatial Management, Jagiellonian University in Kraków, 7 Gronostajowa Str., Kraków, 30-387, Poland
| | - Morgan Delaporte
- LGCgE, IMT Nord-Europe, 942 rue Charles Bourseul, Douai, 59508, France
| | - Nathalie Pineux
- UNISYSTEMS, Rue du Puits Romain 29, Bertrange, L-8070, Luxembourg
| | - Nathan Henin
- Earth and Life Institute - environmental sciences, UCLouvain, Croix du sud 2, Louvain-la-Neuve, 1348, Belgium
| | - Nejc Bezak
- Faculty of Civil and Geodetic Engineering, University of Ljubljana, Jamova 2, 1000, Ljubljana, Slovenia
| | - Noemí Lana-Renault
- Ciencias Humanas, University of La Rioja, Luis de Ulloa 2, 26004, La Rioja, Spain
- Institute for Biodiversity and Ecosystem Dynamics, Universiteit van Amsterdam, Science Park 904, 1098XH, Amsterdam, The Netherlands
| | - Ourania Tzoraki
- Marine Sciences Department, University of the Aegean, University hill, Mytilene, 81100, Greece
| | - Rafael Giménez
- Department of Engineering; IS-FOOD Institute (Innovation & Sustainable Development in Food Chain), Public University of Navarre, Campus de Arrosadia, Cataluña avenue, Pamplona, Navarra, 31006, Spain
| | - Tailin Li
- Department of Landscape Water Conservation, Czech Technical University in Prague, Thákurova 7, Praha 6, Prague, 16629, Czech Republic
| | - Víctor Hugo Durán Zuazo
- Natural Resources and Forestry, Instituto Andaluz de Investigación y Formación Agraria, Pesquera, Alimentaria y de la Producción Ecológica (IFAPA), Camino de Purchil s/n, Granada, 18005, Spain
| | - Vincenzo Bagarello
- Department of Agricultural, Food and Forest Sciences, University of Palermo, Viale delle Scienze, Building 4, Palermo, 90128, Italy
| | - Vincenzo Pampalone
- Department of Agricultural, Food and Forest Sciences, University of Palermo, Viale delle Scienze, Building 4, Palermo, 90128, Italy
| | - Vito Ferro
- Department of Agricultural, Food and Forest Sciences, University of Palermo, Viale delle Scienze, Building 4, Palermo, 90128, Italy
- NBFC, National Biodiversity Future Center, Palermo, 90133, Italy
| | - Xavier Úbeda
- Geography, University of Barcelona, Montalegre 6, Barcelona, 8001, Spain
| | - Panos Panagos
- European Commission, Joint Research Centre, Via Enrico Fermi, 2749, Ispra, VA, 21026, Italy.
| |
|
10
|
Crandall ED, Toczydlowski RH, Liggins L, Holmes AE, Ghoojaei M, Gaither MR, Wham BE, Pritt AL, Noble C, Anderson TJ, Barton RL, Berg JT, Beskid SG, Delgado A, Farrell E, Himmelsbach N, Queeno SR, Trinh T, Weyand C, Bentley A, Deck J, Riginos C, Bradburd GS, Toonen RJ. Importance of timely metadata curation to the global surveillance of genetic diversity. CONSERVATION BIOLOGY : THE JOURNAL OF THE SOCIETY FOR CONSERVATION BIOLOGY 2023; 37:e14061. [PMID: 36704891 PMCID: PMC10751740 DOI: 10.1111/cobi.14061] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/09/2022] [Revised: 12/27/2022] [Accepted: 01/07/2023] [Indexed: 05/18/2023]
Abstract
Genetic diversity within species represents a fundamental yet underappreciated level of biodiversity. Because genetic diversity can indicate species resilience to changing climate, its measurement is relevant to many national and global conservation policy targets. Many studies produce large amounts of genome-scale genetic diversity data for wild populations, but most (87%) do not include the associated spatial and temporal metadata necessary for them to be reused in monitoring programs or for acknowledging the sovereignty of nations or Indigenous peoples. We undertook a distributed datathon to quantify the availability of these missing metadata and to test the hypothesis that their availability decays with time. We also worked to remediate missing metadata by extracting them from associated published papers, online repositories, and direct communication with authors. Starting with 848 candidate genomic data sets (reduced representation and whole genome) from the International Nucleotide Sequence Database Collaboration, we determined that 561 contained mostly samples from wild populations. We successfully restored spatiotemporal metadata for 78% of these 561 data sets (n = 440 data sets with data on 45,105 individuals from 762 species in 17 phyla). Examining papers and online repositories was much more fruitful than contacting 351 authors, who replied to our email requests 45% of the time. Overall, 23% of our email queries to authors unearthed useful metadata. The probability of retrieving spatiotemporal metadata declined significantly as age of the data set increased. There was a 13.5% yearly decrease in metadata associated with published papers or online repositories and up to a 22% yearly decrease in metadata that were only available from authors. 
This rapid decay in metadata availability, mirrored in studies of other types of biological data, should motivate swift updates to data-sharing policies and researcher practices to ensure that the valuable context provided by metadata is not lost to conservation science forever.
Affiliation(s)
- Eric D Crandall
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania, USA
| | - Rachel H Toczydlowski
- Ecology, Evolution, and Behavior Program, Department of Integrative Biology, Michigan State University, East Lansing, Michigan, USA
| | - Libby Liggins
- School of Natural Sciences, Massey University, Auckland, New Zealand
| | - Ann E Holmes
- Department of Animal Science, University of California, Davis, Davis, California, USA
| | - Maryam Ghoojaei
- Department of Biology, University of Central Florida, Orlando, Florida, USA
| | - Michelle R Gaither
- Department of Biology, University of Central Florida, Orlando, Florida, USA
| | - Briana E Wham
- Department of Research Informatics and Publishing, The Pennsylvania State University Libraries, Pennsylvania State University, University Park, Pennsylvania, USA
| | - Andrea L Pritt
- Madlyn L. Hanes Library, The Pennsylvania State University Libraries, Pennsylvania State University, Middletown, Pennsylvania, USA
| | - Cory Noble
- School of Natural Sciences, Massey University, Auckland, New Zealand
| | - Tanner J Anderson
- Department of Anthropology, University of Oregon, Eugene, Oregon, USA
| | - Randi L Barton
- Department of Marine Science, California State University Monterey Bay, Seaside, California, USA
- Moss Landing Marine Laboratories, Moss Landing, California, USA
| | - Justin T Berg
- UOG Marine Laboratory, University of Guam, Mangilao, Guam
| | - Sofia G Beskid
- Department of Integrative Biology, University of Texas at Austin, Austin, Texas, USA
| | - Alonso Delgado
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, Ohio, USA
| | - Emily Farrell
- Department of Biology, University of Central Florida, Orlando, Florida, USA
| | - Nan Himmelsbach
- Department of Natural Science, Hawai'i Pacific University, Honolulu, Hawaii, USA
| | - Samantha R Queeno
- Department of Anthropology, University of Oregon, Eugene, Oregon, USA
| | - Thienthanh Trinh
- Department of Biology, University of Central Florida, Orlando, Florida, USA
| | - Courtney Weyand
- Department of Biological Sciences, Auburn University, Auburn, Alabama, USA
| | - Andrew Bentley
- Biodiversity Institute, University of Kansas, Lawrence, Kansas, USA
| | - John Deck
- Berkeley Natural History Museums, University of California, Berkeley, Berkeley, California, USA
| | - Cynthia Riginos
- School of Biological Sciences, The University of Queensland, Brisbane, Queensland, Australia
| | - Gideon S Bradburd
- Ecology, Evolution, and Behavior Program, Department of Integrative Biology, Michigan State University, East Lansing, Michigan, USA
| | - Robert J Toonen
- Hawai'i Institute of Marine Biology, University of Hawai'i at Mānoa, Kaneohe, Hawaii, USA
| |
|
11
|
Hamilton DG, Hong K, Fraser H, Rowhani-Farid A, Fidler F, Page MJ. Prevalence and predictors of data and code sharing in the medical and health sciences: systematic review with meta-analysis of individual participant data. BMJ 2023; 382:e075767. [PMID: 37433624 DOI: 10.1136/bmj-2023-075767] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 07/13/2023]
Abstract
OBJECTIVES To synthesise research investigating data and code sharing in medicine and health to establish an accurate representation of the prevalence of sharing, how this frequency has changed over time, and what factors influence availability. DESIGN Systematic review with meta-analysis of individual participant data. DATA SOURCES Ovid Medline, Ovid Embase, and the preprint servers medRxiv, bioRxiv, and MetaArXiv were searched from inception to 1 July 2021. Forward citation searches were also performed on 30 August 2022. REVIEW METHODS Meta-research studies that investigated data or code sharing across a sample of scientific articles presenting original medical and health research were identified. Two authors screened records, assessed the risk of bias, and extracted summary data from study reports when individual participant data could not be retrieved. Key outcomes of interest were the prevalence of statements that declared that data or code were publicly or privately available (declared availability) and the success rates of retrieving these products (actual availability). The associations between data and code availability and several factors (eg, journal policy, type of data, trial design, and human participants) were also examined. A two stage approach to meta-analysis of individual participant data was performed, with proportions and risk ratios pooled with the Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis. RESULTS The review included 105 meta-research studies examining 2 121 580 articles across 31 specialties. Eligible studies examined a median of 195 primary articles (interquartile range 113-475), with a median publication year of 2015 (interquartile range 2012-2018). Only eight studies (8%) were classified as having a low risk of bias. Meta-analyses showed a prevalence of declared and actual public data availability of 8% (95% confidence interval 5% to 11%) and 2% (1% to 3%), respectively, between 2016 and 2021. 
For public code sharing, both the prevalence of declared and actual availability were estimated to be <0.5% since 2016. Meta-regressions indicated that only declared public data sharing prevalence estimates have increased over time. Compliance with mandatory data sharing policies ranged from 0% to 100% across journals and varied by type of data. In contrast, success in privately obtaining data and code from authors historically ranged between 0% and 37% and 0% and 23%, respectively. CONCLUSIONS The review found that public code sharing was persistently low across medical research. Declarations of data sharing were also low, increasing over time, but did not always correspond to actual sharing of data. The effectiveness of mandatory data sharing policies varied substantially by journal and type of data, a finding that might be informative for policy makers when designing policies and allocating resources to audit compliance. SYSTEMATIC REVIEW REGISTRATION Open Science Framework doi:10.17605/OSF.IO/7SX8U.
Affiliation(s)
- Daniel G Hamilton
- MetaMelb Research Group, School of BioSciences, University of Melbourne, Melbourne, VIC, Australia
- Melbourne Medical School, Faculty of Medicine, Dentistry, and Health Sciences, University of Melbourne, Melbourne, VIC, Australia
| | - Kyungwan Hong
- Department of Practice, Sciences, and Health Outcomes Research, University of Maryland School of Pharmacy, Baltimore, MD, USA
| | - Hannah Fraser
- MetaMelb Research Group, School of BioSciences, University of Melbourne, Melbourne, VIC, Australia
| | - Anisa Rowhani-Farid
- Department of Practice, Sciences, and Health Outcomes Research, University of Maryland School of Pharmacy, Baltimore, MD, USA
| | - Fiona Fidler
- MetaMelb Research Group, School of BioSciences, University of Melbourne, Melbourne, VIC, Australia
- School of Historical and Philosophical Studies, University of Melbourne, Melbourne, VIC, Australia
| | - Matthew J Page
- Methods in Evidence Synthesis Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
| |
|
12
|
Chow NLY, Tateishi N, Goldhar A, Zaheer R, Redelmeier DA, Cheung AH, Schaffer A, Sinyor M. Does knowledge have a half-life? An observational study analyzing the use of older citations in medical and scientific publications. BMJ Open 2023; 13:e072374. [PMID: 37217270 DOI: 10.1136/bmjopen-2023-072374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023] Open
Abstract
OBJECTIVES In the process of scientific progress, prior evidence is both relied on and supplanted by new discoveries. We use the term 'knowledge half-life' to refer to the phenomenon in which older knowledge is discounted in favour of newer research. By quantifying the knowledge half-life, we sought to determine whether research published in more recent years is preferentially cited over older research in medical and scientific articles. DESIGN An observational study employing a directed, systematic search of current literature. DATA SOURCES BMJ, PNAS, JAMA, NEJM, The Annals of Internal Medicine, The Lancet, Science and Nature were searched. ELIGIBILITY CRITERIA FOR SELECTING STUDIES Eight high-impact medical and scientific journals were sampled examining original research articles from the first issue of every year over a 25-year span (1996-2020). The outcome of interest was the difference between the publication year of the article and references cited, termed 'citation lag'. DATA EXTRACTION AND SYNTHESIS Analysis of variance was used to identify significant differences in citation lag. RESULTS A total of 726 articles and 17 895 references were included with a mean citation lag of 7.5±8.4 years. Across all journals, >70% of references had been published within 10 years of the citing article. Approximately 15%-20% of referenced articles were 10-19 years old, and articles more than 20 years old were cited infrequently. Medical journal articles had references with significantly shorter citation lags compared with general science journals (p≤0.01). Articles published before 2009 had references with significantly shorter citation lags compared with those published in 2010-2020 (p<0.001). CONCLUSIONS This study found evidence of a small increase in the citation of older research in medical and scientific literature over the past decade. This phenomenon deserves further characterisation and scrutiny to ensure that 'old knowledge' is not being lost.
Affiliation(s)
- Natalie L Y Chow
- Department of Anatomy and Cell Biology, Western University, London, Ontario, Canada
| | - Natalie Tateishi
- Department of Microbiology and Immunology, Western University, London, Ontario, Canada
| | - Alexa Goldhar
- Department of Biology, Queen's University, Kingston, Ontario, Canada
| | - Rabia Zaheer
- Department of Education Services, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
| | - Donald A Redelmeier
- Department of Medicine, University of Toronto Faculty of Medicine, Toronto, Ontario, Canada
- Department of Evaluative Clinical Sciences, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
| | - Amy H Cheung
- Department of Psychiatry, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
| | - Ayal Schaffer
- Department of Psychiatry, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
| | - Mark Sinyor
- Department of Psychiatry, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
| |
|
13
|
Bell SM, Raymond SJ, Yin H, Jiao W, Goll DS, Ciais P, Olivetti E, Leshyk VO, Terrer C. Quantifying the recarbonization of post-agricultural landscapes. Nat Commun 2023; 14:2139. [PMID: 37059844 PMCID: PMC10104816 DOI: 10.1038/s41467-023-37907-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Accepted: 04/05/2023] [Indexed: 04/16/2023] Open
Affiliation(s)
- Stephen M Bell
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
- Laboratoire des Sciences du Climat et de l'Environnement, LSCE/IPSL, CEA-CNRS-UVSQ, Université Paris-Saclay, 91191, Gif-sur-Yvette, France.
- Institute of Environmental Science and Technology, Universitat Autònoma de Barcelona, 08193, Bellaterra, Spain.
| | - Samuel J Raymond
- MIT Climate and Sustainability Consortium, Cambridge, MA, 02139, USA
| | - He Yin
- Department of Geography, Kent State University, 325 S. Lincoln Street, Kent, OH, 44242, USA
| | - Wenzhe Jiao
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- MIT Climate and Sustainability Consortium, Cambridge, MA, 02139, USA
| | - Daniel S Goll
- Laboratoire des Sciences du Climat et de l'Environnement, LSCE/IPSL, CEA-CNRS-UVSQ, Université Paris-Saclay, 91191, Gif-sur-Yvette, France
| | - Philippe Ciais
- Laboratoire des Sciences du Climat et de l'Environnement, LSCE/IPSL, CEA-CNRS-UVSQ, Université Paris-Saclay, 91191, Gif-sur-Yvette, France
| | - Elsa Olivetti
- MIT Climate and Sustainability Consortium, Cambridge, MA, 02139, USA
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Victor O Leshyk
- Center for Ecosystem Science and Society, Northern Arizona University, Flagstaff, AZ, 86011, USA
| | - César Terrer
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| |
|
14
|
The current landscape of author guidelines in chemistry through the lens of research data sharing. PURE APPL CHEM 2023. [DOI: 10.1515/pac-2022-1001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]
Abstract
As the primary method of communicating research results, journals have an enormous impact on community behavior. Publishing the underlying research data alongside journal articles is widely considered good scientific practice. Ideally, journals and their publishers place these recommendations or requirements in their author guidelines and data policies. Several efforts are working to improve the infrastructure, processes, and uptake of research data sharing, including the NFDI4Chem consortium, working groups within the RDA, and IUPAC, including the WorldFAIR Chemistry project. In this article, we present the results of a large-scale analysis of author guidelines from several publishers and journals active in chemistry research, showing how well the publishing landscape supports different criteria and where there is room for improvement. While the requirement for deposition of X-ray diffraction data is commonplace, guidelines rarely mention machine-readable chemical structures and metadata/minimum information standards. Further evaluation criteria included recommendations on persistent identifiers, data availability statements, data deposition into repositories, and open analytical data formats. Our survey shows that publishers and journals are starting to include aspects of research data in their guidelines. We as authors should accept and embrace the guidelines with increasing requirements for data availability, data interoperability, and re-usability to improve chemistry research.
|
15
|
Huff M, Bongartz EC. Low Research-Data Availability in Educational-Psychology Journals: No Indication of Effective Research-Data Policies. ADVANCES IN METHODS AND PRACTICES IN PSYCHOLOGICAL SCIENCE 2023. [DOI: 10.1177/25152459231156419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023]
Abstract
Research-data availability contributes to the transparency of the research process and the credibility of educational-psychology research and science in general. Recently, there have been many initiatives to increase the availability and quality of research data. Many research institutions have adopted research-data policies. This increased awareness might have raised the sharing of research data in empirical articles. To test this idea, we coded 1,242 publications from six educational-psychology journals and the psychological journal Cognition (as a baseline) published in 2018 and 2020. Research-data availability was low (3.85% compared with 62.74% in Cognition) but has increased from 0.32% (2018) to 7.16% (2020). However, neither the data-transparency level of the journal nor the existence of an official research-data policy on the level of the corresponding author’s institution was related to research-data availability. We discuss the consequences of these findings for institutional research-data-management processes.
|
16
|
Kosnik MB, Kephalopoulos S, Muñoz A, Aurisano N, Cusinato A, Dimitroulopoulou S, Slobodnik J, De Mello J, Zare Jeddi M, Cascio C, Ahrens A, Bruinen de Bruin Y, Lieck L, Fantke P. Advancing exposure data analytics and repositories as part of the European Exposure Science Strategy 2020-2030. ENVIRONMENT INTERNATIONAL 2022; 170:107610. [PMID: 36356553 DOI: 10.1016/j.envint.2022.107610] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/30/2022] [Revised: 10/28/2022] [Accepted: 10/29/2022] [Indexed: 06/16/2023]
Abstract
High-quality and comprehensive exposure-related data are critical for different decision contexts, including environmental and human health monitoring, and chemicals risk assessment and management. However, exposure-related data are currently scattered, frequently of unclear quality and structure, not readily accessible, and stored in various, partly overlapping data repositories, leading to inefficient and ineffective data usage in Europe and globally. We propose strategic guidance for an integrated European exposure data production and management framework for use in science and policy, building on current and future data analysis and digitalization trends. We map the existing exposure data landscape to requirements for data analytics and repositories across European policies and regulations. We further identify needs and ways forward for improving data generation, sharing, and usage, and translate identified needs into an operational action plan for European and global advancement of exposure data for policies and regulations. Identified key areas of action are to develop consistent exposure data standards and terminology for data production and reporting, increase data transparency and availability, enhance data storage and related infrastructure, boost automation in data management, increase data integration, and advance tools for innovative data analysis. Improving and streamlining exposure data generation and uptake into science and policy is crucial for the European Chemicals Strategy for Sustainability and European Digital Strategy, in line with EU Data policies on data management and interoperability.
Affiliation(s)
- Marissa B Kosnik
- Quantitative Sustainability Assessment, Department of Environmental and Resource Engineering, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
| | | | - Amalia Muñoz
- European Commission, Joint Research Centre (JRC), Geel, Belgium
| | - Nicolò Aurisano
- Quantitative Sustainability Assessment, Department of Environmental and Resource Engineering, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
| | | | - Sani Dimitroulopoulou
- Air Quality and Public Health, EHE Dept, UK Health Security Agency, Chilton OX11 0RQ, United Kingdom
| | | | - Jonathas De Mello
- Economy Division, United Nations Environment Programme, 75015 Paris, France
| | - Maryam Zare Jeddi
- National Institute for Public Health and the Environment (RIVM), 3721 MA Bilthoven, the Netherlands
| | | | | | | | - Lothar Lieck
- European Agency for Safety and Health at Work (EU-OSHA), Bilbao, Spain
| | - Peter Fantke
- Quantitative Sustainability Assessment, Department of Environmental and Resource Engineering, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark.
| |
|
17
|
Ehlers MR, Lonsdorf TB. Data sharing in experimental fear and anxiety research: From challenges to a dynamically growing database in 10 simple steps. Neurosci Biobehav Rev 2022; 143:104958. [DOI: 10.1016/j.neubiorev.2022.104958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Revised: 11/07/2022] [Accepted: 11/08/2022] [Indexed: 11/13/2022]
|
18
|
Vallet N, Michonneau D, Tournier S. Toward practical transparent verifiable and long-term reproducible research using Guix. Sci Data 2022; 9:597. [PMID: 36195618 PMCID: PMC9532446 DOI: 10.1038/s41597-022-01720-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Accepted: 09/26/2022] [Indexed: 11/09/2022] Open
Abstract
The reproducibility crisis urges scientists to promote transparency, which allows peers to draw the same conclusions after performing identical steps from hypothesis to results. A growing number of resources are being developed to open access to methods, data and source code. Still, the computational environment, an interface between the data and the source code running the analyses, is not addressed. Environments are usually described with software and library names associated with version labels, or provided as an opaque container image. This is not enough to describe the complexity of the dependencies on which they rely to operate. We describe this issue and illustrate how open tools like Guix can be used by any scientist to share their environment and allow peers to reproduce it. Some steps of research might not be fully reproducible, but at least transparency for computation is technically addressable. These tools should be considered by scientists willing to promote transparency and open science.
Affiliation(s)
- Nicolas Vallet
- Université de Paris, INSERM U976, F-75010, Paris, France.
| | - David Michonneau
- Université de Paris, INSERM U976, F-75010, Paris, France
- Hematology Transplantation, Saint Louis hospital, 1 avenue Claude Vellefaux, 75010, Paris, France
| | - Simon Tournier
- Université de Paris, INSERM US53, CNRS UAR 2030, Saint Louis Research Institute, 1 avenue Claude Vellefaux, 75010, Paris, France
| |
|
19
|
Polgár S, Schofield PN, Madas BG. Datasets of in vitro clonogenic assays showing low dose hyper-radiosensitivity and induced radioresistance. Sci Data 2022; 9:555. [PMID: 36075916 PMCID: PMC9458642 DOI: 10.1038/s41597-022-01653-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2022] [Accepted: 08/19/2022] [Indexed: 11/19/2022] Open
Abstract
Low dose hyper-radiosensitivity and induced radioresistance are primarily observed in surviving fractions of cell populations exposed to ionizing radiation, plotted as the function of absorbed dose. Several biophysical models have been developed to quantitatively describe these phenomena. However, there is a lack of raw, openly available experimental data to support the development and validation of quantitative models. The aim of this study was to set up a database of experimental data from the public literature. Using Google Scholar search, 46 publications with 101 datasets on the dose-dependence of surviving fractions, with clear evidence of low dose hyper-radiosensitivity, were identified. Surviving fractions, their uncertainties, and the corresponding absorbed doses were digitized from graphs of the publications. The characteristics of the cell line and the irradiation were also recorded, along with the parameters of the linear-quadratic model and/or the induced repair model if they were provided. The database is available in STOREDB, and can be used for meta-analysis, for comparison with new experiments, and for development and validation of biophysical models.
Measurement(s): surviving fraction of cells
Technology Type(s): optical microscopy
Factor Type(s): absorbed dose
Sample Characteristic - Organism: Homo sapiens • Chinese hamster • Rattus sp.
Sample Characteristic - Environment: cell culture
Affiliation(s)
- Szabolcs Polgár
  - Doctoral School of Physics, ELTE Eötvös Loránd University, Budapest, Hungary
  - Environmental Physics Department, Centre for Energy Research, Budapest, Hungary
- Paul N Schofield
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, United Kingdom
- Balázs G Madas
  - Environmental Physics Department, Centre for Energy Research, Budapest, Hungary
  - Department of Physical Chemistry and Materials Science, Budapest University of Technology and Economics, Budapest, Hungary
|
20
|
Federer LM. Long-term availability of data associated with articles in PLOS ONE. PLoS One 2022; 17:e0272845. [PMID: 36001577 PMCID: PMC9401135 DOI: 10.1371/journal.pone.0272845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2021] [Accepted: 07/26/2022] [Indexed: 11/18/2022] Open
Abstract
The adoption of journal policies requiring authors to include a Data Availability Statement has helped to increase the availability of research data associated with research articles. However, having a Data Availability Statement is not a guarantee that readers will be able to locate the data; even if provided with an identifier like a uniform resource locator (URL) or a digital object identifier (DOI), the data may become unavailable due to link rot and content drift. To explore the long-term availability of resources including data, code, and other digital research objects associated with papers, this study extracted 8,503 URLs and DOIs from a corpus of nearly 50,000 Data Availability Statements from papers published in PLOS ONE between 2014 and 2016. These URLs and DOIs were used to attempt to retrieve the data through both automated and manual means. Overall, 80% of the resources could be retrieved automatically, compared to much lower retrieval rates of 10–40% found in previous papers that relied on contacting authors to locate data. Because a URL or DOI might be valid but still not point to the resource, a subset of 350 URLs and 350 DOIs were manually tested, with 78% and 98% of resources, respectively, successfully retrieved. Having a DOI and being shared in a repository were both positively associated with availability. Although resources associated with older papers were slightly less likely to be available, this difference was not statistically significant, suggesting that URLs and DOIs may be an effective means for accessing data over time. These findings point to the value of including URLs and DOIs in Data Availability Statements to ensure access to data on a long-term basis.
Affiliation(s)
- Lisa M. Federer
  - Office of Strategic Initiatives, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
|
21
|
Fu P, Blackson M, Valentino M. Developing research data management services in a regional comprehensive university: The case of Central Washington University. IFLA JOURNAL-INTERNATIONAL FEDERATION OF LIBRARY ASSOCIATIONS 2022. [DOI: 10.1177/03400352221116923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
This study aims to analyze the needs of researchers in a regional comprehensive university for research data management services; discuss the options for developing a research data management program at the university; and then propose a phased three-year implementation plan for the university libraries. The method was to design a survey to collect information from researchers and assess and evaluate their needs for research data management services. The results show that researchers’ needs in a regional comprehensive university could be quite different from those of researchers in a research-intensive university. Also, the results verify the hypothesis that researchers in the regional comprehensive university would welcome the libraries offering managed data services for the research community. Therefore, this study suggests a phased three-year implementation plan. The significance of the study is that it can give some insights and helpful information for regional comprehensive universities that are planning to develop a research data management program.
Affiliation(s)
- Ping Fu
  - University Libraries, Central Washington University, USA
|
22
|
Huang CW, Chuang WH, Lin CY, Chen SH. Elegancy: Digitizing the wisdom from laboratories to the cloud with free no-code platform. iScience 2022; 25:104710. [PMID: 35874097 PMCID: PMC9304594 DOI: 10.1016/j.isci.2022.104710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 06/13/2022] [Accepted: 06/28/2022] [Indexed: 11/25/2022] Open
Abstract
One of the top priorities in any laboratory is archiving experimental data in the most secure, efficient, and errorless way. It is especially important to those in chemical and biological research, for it is more likely to damage experiment records. In addition, the transmission of experiment results from paper to electronic devices is time-consuming and redundant. Therefore, we introduce an open-source no-code electronic laboratory notebook, Elegancy, a cloud-based/standalone web service distributed as a Docker image. Elegancy fits all laboratories but is specially equipped with several features benefitting biochemical laboratories. It can be accessed via various web browsers, allowing researchers to upload photos or audio recordings directly from their mobile devices. Elegancy also contains a meeting arrangement module, audit/revision control, and laboratory supply management system. We believe Elegancy could help the scientific research community gather evidence, share information, reorganize knowledge, and digitize laboratory works with greater ease and security.
Affiliation(s)
- Chih-Wei Huang
  - Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Wei-Hsuan Chuang
  - Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Chung-Yen Lin
  - Institute of Information Science, Academia Sinica, Taipei, Taiwan
  - Institute of Fisheries Science, National Taiwan University, Taipei, Taiwan
  - Genome and Systems Biology Degree Program, National Taiwan University, Taipei, Taiwan
- Shu-Hwa Chen
  - TMU Research Center of Cancer Translational Medicine, Taipei Medical University, Taipei, Taiwan
|
23
|
Munro LJ, Kell DB. Analysis of a Library of Escherichia coli Transporter Knockout Strains to Identify Transport Pathways of Antibiotics. Antibiotics (Basel) 2022; 11:antibiotics11081129. [PMID: 36009997 PMCID: PMC9405208 DOI: 10.3390/antibiotics11081129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 08/15/2022] [Accepted: 08/15/2022] [Indexed: 11/16/2022] Open
Abstract
Antibiotic resistance is a major global healthcare issue. Antibiotic compounds cross the bacterial cell membrane via membrane transporters, and a major mechanism of antibiotic resistance is through modification of the membrane transporters to increase the efflux or reduce the influx of antibiotics. Targeting these transporters is a potential avenue to combat antibiotic resistance. In this study, we used an automated screening pipeline to evaluate the growth of a library of 447 Escherichia coli transporter knockout strains exposed to sub-inhibitory concentrations of 18 diverse antimicrobials. We found numerous knockout strains that showed more resistant or sensitive phenotypes to specific antimicrobials, suggestive of transport pathways. We highlight several specific drug-transporter interactions that we identified and provide the full dataset, which will be a useful resource in further research on antimicrobial transport pathways. Overall, we determined that transporters are involved in modulating the efficacy of almost all the antimicrobial compounds tested and can, thus, play a major role in the development of antimicrobial resistance.
Affiliation(s)
- Lachlan Jake Munro
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Lyngby, Denmark
- Douglas B. Kell
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Lyngby, Denmark
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, UK
|
24
|
Big Data in Laboratory Medicine—FAIR Quality for AI? Diagnostics (Basel) 2022; 12:diagnostics12081923. [PMID: 36010273 PMCID: PMC9406962 DOI: 10.3390/diagnostics12081923] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/13/2022] [Revised: 08/05/2022] [Accepted: 08/06/2022] [Indexed: 12/22/2022] Open
Abstract
Laboratory medicine is a digital science. Every large hospital produces a wealth of data each day—from simple numerical results from, e.g., sodium measurements to highly complex output of “-omics” analyses, as well as quality control results and metadata. Processing, connecting, storing, and ordering extensive parts of these individual data requires Big Data techniques. Whereas novel technologies such as artificial intelligence and machine learning have exciting application for the augmentation of laboratory medicine, the Big Data concept remains fundamental for any sophisticated data analysis in large databases. To make laboratory medicine data optimally usable for clinical and research purposes, they need to be FAIR: findable, accessible, interoperable, and reusable. This can be achieved, for example, by automated recording, connection of devices, efficient ETL (Extract, Transform, Load) processes, careful data governance, and modern data security solutions. Enriched with clinical data, laboratory medicine data allow a gain in pathophysiological insights, can improve patient care, or can be used to develop reference intervals for diagnostic purposes. Nevertheless, Big Data in laboratory medicine do not come without challenges: the growing number of analyses and data derived from them is a demanding task to be taken care of. Laboratory medicine experts are and will be needed to drive this development, take an active role in the ongoing digitalization, and provide guidance for their clinical colleagues engaging with the laboratory data in research.
|
25
|
Schulz R, Langen G, Prill R, Cassel M, Weissgerber TL. Reporting and transparent research practices in sports medicine and orthopaedic clinical trials: a meta-research study. BMJ Open 2022; 12:e059347. [PMID: 35940834 PMCID: PMC9364413 DOI: 10.1136/bmjopen-2021-059347] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/15/2022] Open
Abstract
OBJECTIVES
Transparent reporting of clinical trials is essential to assess the risk of bias and translate research findings into clinical practice. While existing studies have shown that deficiencies are common, detailed empirical and field-specific data are scarce. Therefore, this study aimed to examine current clinical trial reporting and transparent research practices in sports medicine and orthopaedics.
SETTING
Exploratory meta-research study on reporting quality and transparent research practices in orthopaedics and sports medicine clinical trials.
PARTICIPANTS
The sample included clinical trials published in the top 25% of sports medicine and orthopaedics journals over 9 months.
PRIMARY AND SECONDARY OUTCOME MEASURES
Two independent reviewers assessed pre-registration, open data and criteria related to scientific rigour, like randomisation, blinding, and sample size calculations, as well as the study sample, and data analysis.
RESULTS
The sample included 163 clinical trials from 27 journals. While the majority of trials mentioned rigour criteria, essential details were often missing. Sixty per cent (95% confidence interval (CI) 53% to 68%) of trials reported sample size calculations, but only 32% (95% CI 25% to 39%) justified the expected effect size. Few trials indicated the blinding status of all main stakeholders (4%; 95% CI 1% to 7%). Only 18% (95% CI 12% to 24%) included information on randomisation type, method and concealed allocation. Most trials reported participants' sex/gender (95%; 95% CI 92% to 98%) and information on inclusion and exclusion criteria (78%; 95% CI 72% to 84%). Only 20% (95% CI 14% to 26%) of trials were pre-registered. No trials deposited data in open repositories.
CONCLUSIONS
These results will aid the sports medicine and orthopaedics community in developing tailored interventions to improve reporting. While authors typically mention blinding, randomisation and other factors, essential details are often missing. Greater acceptance of open science practices, like pre-registration and open data, is needed. As these practices have been widely encouraged, we discuss systemic interventions that may improve clinical trial reporting.
Affiliation(s)
- Robert Schulz
  - BIH QUEST Center for Responsible Research, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
  - Department of Sport and Health Sciences, University of Potsdam, Potsdam, Brandenburg, Germany
- Georg Langen
  - Department of Strength, Power and Tactical Sports, Institute for Applied Training Science, Leipzig, Germany
- Robert Prill
  - Center of Orthopaedics and Traumatology, Brandenburg Medical School Theodor Fontane, Neuruppin, Brandenburg, Germany
- Michael Cassel
  - Department of Sport and Health Sciences, University of Potsdam, Potsdam, Brandenburg, Germany
- Tracey L Weissgerber
  - BIH QUEST Center for Responsible Research, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
|
26
|
Ten simple rules for maximizing the recommendations of the NIH data management and sharing plan. PLoS Comput Biol 2022; 18:e1010397. [PMID: 35921268 PMCID: PMC9348704 DOI: 10.1371/journal.pcbi.1010397] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022] Open
Abstract
The National Institutes of Health (NIH) Policy for Data Management and Sharing (DMS Policy) recognizes the NIH’s role as a key steward of United States biomedical research and information and seeks to enhance that stewardship through systematic recommendations for the preservation and sharing of research data generated by funded projects. The policy is effective as of January 2023. The recommendations include a requirement for the submission of a Data Management and Sharing Plan (DMSP) with funding applications, and while no strict template was provided, the NIH has released supplemental draft guidance on elements to consider when developing a plan. This article provides 10 key recommendations for creating a DMSP that is both maximally compliant and effective.
|
27
|
A Critical Literature Review of Historic Scientific Analog Data: Uses, Successes, and Challenges. DATA SCIENCE JOURNAL 2022. [DOI: 10.5334/dsj-2022-014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
|
28
|
Bledsoe EK, Burant JB, Higino GT, Roche DG, Binning SA, Finlay K, Pither J, Pollock LS, Sunday JM, Srivastava DS. Data rescue: saving environmental data from extinction. Proc Biol Sci 2022; 289:20220938. [PMID: 35855607 PMCID: PMC9297007 DOI: 10.1098/rspb.2022.0938] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022] Open
Abstract
Historical and long-term environmental datasets are imperative to understanding how natural systems respond to our changing world. Although immensely valuable, these data are at risk of being lost unless actively curated and archived in data repositories. The practice of data rescue, which we define as identifying, preserving, and sharing valuable data and associated metadata at risk of loss, is an important means of ensuring the long-term viability and accessibility of such datasets. Improvements in policies and best practices around data management will hopefully limit future need for data rescue; these changes, however, do not apply retroactively. While rescuing data is not new, the term lacks formal definition, is often conflated with other terms (i.e. data reuse), and lacks general recommendations. Here, we outline seven key guidelines for effective rescue of historically collected and unmanaged datasets. We discuss prioritization of datasets to rescue, forming effective data rescue teams, preparing the data and associated metadata, and archiving and sharing the rescued materials. In an era of rapid environmental change, the best policy solutions will require evidence from both contemporary and historical sources. It is, therefore, imperative that we identify and preserve valuable, at-risk environmental data before they are lost to science.
Affiliation(s)
- Ellen K. Bledsoe
  - The Living Data Project, Canadian Institute of Ecology and Evolution, Vancouver, British Columbia, Canada
  - School of Natural Resources and the Environment, University of Arizona, Tucson, AZ, USA
  - Department of Biology, University of Regina, Regina, Saskatchewan, Canada
- Joseph B. Burant
  - The Living Data Project, Canadian Institute of Ecology and Evolution, Vancouver, British Columbia, Canada
  - Department of Biology, McGill University, Montreal, Quebec, Canada
  - Département de sciences biologiques, Université de Montréal, Montréal, Québec, Canada
- Gracielle T. Higino
  - The Living Data Project, Canadian Institute of Ecology and Evolution, Vancouver, British Columbia, Canada
  - Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, British Columbia, Canada
- Dominique G. Roche
  - The Living Data Project, Canadian Institute of Ecology and Evolution, Vancouver, British Columbia, Canada
  - Department of Biology and Institute for Environment & Interdisciplinary Science, Carleton University, Ottawa, Ontario, Canada
- Sandra A. Binning
  - The Living Data Project, Canadian Institute of Ecology and Evolution, Vancouver, British Columbia, Canada
  - Département de sciences biologiques, Université de Montréal, Montréal, Québec, Canada
- Kerri Finlay
  - The Living Data Project, Canadian Institute of Ecology and Evolution, Vancouver, British Columbia, Canada
  - Department of Biology, University of Regina, Regina, Saskatchewan, Canada
- Jason Pither
  - The Living Data Project, Canadian Institute of Ecology and Evolution, Vancouver, British Columbia, Canada
  - Department of Biology and Okanagan Institute for Biodiversity, Resilience, and Ecosystem Services, University of British Columbia, Kelowna, British Columbia, Canada
- Laura S. Pollock
  - The Living Data Project, Canadian Institute of Ecology and Evolution, Vancouver, British Columbia, Canada
  - Department of Biology, McGill University, Montreal, Quebec, Canada
- Jennifer M. Sunday
  - The Living Data Project, Canadian Institute of Ecology and Evolution, Vancouver, British Columbia, Canada
  - Department of Biology, McGill University, Montreal, Quebec, Canada
- Diane S. Srivastava
  - The Living Data Project, Canadian Institute of Ecology and Evolution, Vancouver, British Columbia, Canada
  - Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, British Columbia, Canada
|
29
|
Rauh S, Johnson BS, Bowers A, Tritz D, Vassar BM. A review of reproducible and transparent research practices in urology publications from 2014 to 2018. BMC Urol 2022; 22:102. [PMID: 35820886 PMCID: PMC9277815 DOI: 10.1186/s12894-022-01059-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2020] [Accepted: 07/07/2022] [Indexed: 11/10/2022] Open
Abstract
Background
Reproducibility is essential for the integrity of scientific research. Reproducibility is measured by the ability of different investigators to replicate the outcomes of an original publication using the same materials and procedures. Unfortunately, reproducibility is not currently a standard being met by most scientific research.
Methods
For this review, we sampled 300 publications in the field of urology to assess for 14 indicators of reproducibility including material availability, raw data availability, analysis script availability, pre-registration information, links to protocols, and if the publication was available free to the public. Publications were also assessed for statements about conflicts of interest and funding sources.
Results
Of the 300 sample publications, 171 contained empirical data available for analysis of reproducibility. Of the 171 articles with empirical data to analyze, 0.58% provided links to protocols, 4.09% provided access to raw data, 3.09% provided access to materials, and 4.68% were pre-registered. None of the studies provided analysis scripts. Our review is cross-sectional in nature, including only PubMed indexed journals, published in English, and within a finite time period. Thus, our results should be interpreted in light of these considerations.
Conclusion
Current urology research does not consistently provide the components needed to reproduce original studies. Collaborative efforts from investigators and journal editors are needed to improve research quality while minimizing waste and patient risk.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12894-022-01059-8.
Affiliation(s)
- Shelby Rauh
  - Oklahoma State University Center for Health Sciences, 1111 W 17th St., Tulsa, OK, 74107, USA
- Bradley S Johnson
  - Oklahoma State University Center for Health Sciences, 1111 W 17th St., Tulsa, OK, 74107, USA
- Aaron Bowers
  - Oklahoma State University Center for Health Sciences, 1111 W 17th St., Tulsa, OK, 74107, USA
- Daniel Tritz
  - Oklahoma State University Center for Health Sciences, 1111 W 17th St., Tulsa, OK, 74107, USA
- Benjamin Matthew Vassar
  - Oklahoma State University Center for Health Sciences, 1111 W 17th St., Tulsa, OK, 74107, USA
|
30
|
Arend D, Psaroudakis D, Memon JA, Rey-Mazón E, Schüler D, Szymanski JJ, Scholz U, Junker A, Lange M. From data to knowledge - big data needs stewardship, a plant phenomics perspective. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2022; 111:335-347. [PMID: 35535481 DOI: 10.1111/tpj.15804] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/24/2022] [Revised: 05/02/2022] [Accepted: 05/06/2022] [Indexed: 06/14/2023]
Abstract
The research data life cycle from project planning to data publishing is an integral part of current research. Until the last decade, researchers were responsible for all associated phases in addition to the actual research and were assisted only at certain points by IT or bioinformaticians. Starting with advances in sequencing, the automation of analytical methods in all life science fields, including in plant phenotyping, has led to ever-increasing amounts of ever more complex data. The tasks associated with these challenges now often exceed the expertise of and infrastructure available to scientists, leading to an increased risk of data loss over time. The IPK Gatersleben has one of the world's largest germplasm collections and two decades of experience in crop plant research data management. In this article we show how challenges in modern, data-driven research can be addressed by data stewards. Based on concrete use cases, data management processes and best practices from plant phenotyping, we describe which expertise and skills are required and how data stewards as an integral actor can enhance the quality of a necessary digital transformation in progressive research.
Affiliation(s)
- Daniel Arend
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
- Dennis Psaroudakis
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
- Junaid Altaf Memon
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
- Elena Rey-Mazón
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
- Danuta Schüler
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
- Jedrzej Jakub Szymanski
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
- Uwe Scholz
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
- Astrid Junker
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
- Matthias Lange
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
|
31
|
Herrero J, Castañeda C. A dataset of aerial photographs of 1972 from an irrigated area in Monegros, Spain. Data Brief 2022; 42:108325. [PMID: 35677461 PMCID: PMC9168529 DOI: 10.1016/j.dib.2022.108325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Revised: 05/17/2022] [Accepted: 05/24/2022] [Indexed: 11/17/2022] Open
Abstract
Our dataset contains the scans of 278 paper prints of contacts from a photogrammetric flight of 1972, plus a diagram for the relative location of each of the photograms. The paper prints served three years later, i.e. in 1975, for studying the soils of an irrigated district. The entire flight covered about 705 km2, in the semiarid Central Ebro Basin, at the province of Huesca, Spain. The flight encompasses the 359 km2-irrigated district fed by the sections 2nd and 3rd of the first part of the Canal of Monegros, plus the westward conterminous non-irrigated lands until the border with the province of Zaragoza (Fig. 1). The Spanish Ministry of Agriculture throughout its now extinct branch Institute for Agrarian Reform and Development, i.e. IRYDA by its Spanish acronym, contracted a consulting company to produce a report [1] about the location of saline and non-saline soils of the district in 1975. The soil surveyors used the paper prints for preparing the report and marked some of the prints with color wax-pencil. Most of these marks locate the sampled sites but also appear geographical names, schematic highlights of some terrain features, and mentions to ongoing land levelling or similar works.
|
32
|
Vazquez P, Hirayama-Shoji K, Novik S, Krauss S, Rayner S. Globally Accessible Distributed Data Sharing (GADDS): a decentralized FAIR platform to facilitate data sharing in the life sciences. Bioinformatics 2022; 38:3812-3817. [PMID: 35639939 PMCID: PMC9344842 DOI: 10.1093/bioinformatics/btac362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Revised: 04/12/2022] [Accepted: 05/24/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION
Technical advances have revolutionized the life sciences and researchers commonly face challenges associated with handling large amounts of heterogeneous digital data. The Findable, Accessible, Interoperable and Reusable (FAIR) principles provide a framework to support effective data management. However, implementing this framework is beyond the means of most researchers in terms of resources and expertise, requiring awareness of metadata, policies, community agreements, and other factors such as vocabularies and ontologies.
RESULTS
We have developed the Globally Accessible Distributed Data Sharing (GADDS) platform to facilitate FAIR-like data-sharing in cross-disciplinary research collaborations. The platform consists of (i) a blockchain based metadata quality control system, (ii) a private cloud-like storage system and (iii) a version control system. GADDS is built with containerized technologies, providing minimal hardware standards and easing scalability, and offers decentralized trust via transparency of metadata, facilitating data exchange and collaboration. As a use case, we provide an example implementation in engineered living material technology within the Hybrid Technology Hub at the University of Oslo.
AVAILABILITY
Demo version available at https://github.com/pavelvazquez/GADDS.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Pavel Vazquez
- Hybrid Technology Hub - Centre of Excellence, Institute of Basic Medical Sciences, University of Oslo, P.O. Box 1110 Blindern 0317, Oslo, Norway
| | - Kayoko Hirayama-Shoji
- Hybrid Technology Hub - Centre of Excellence, Institute of Basic Medical Sciences, University of Oslo, P.O. Box 1110 Blindern 0317, Oslo, Norway
| | - Steffen Novik
- Department of Informatics, Faculty of Mathematics and Natural Sciences, University of Oslo, P.O. Box 1032 Blindern N-0315, Oslo, Norway
| | - Stefan Krauss
- Hybrid Technology Hub - Centre of Excellence, Institute of Basic Medical Sciences, University of Oslo, P.O. Box 1110 Blindern 0317, Oslo, Norway.,Department of Immunology and Transfusion Medicine, Oslo University Hospital, P.O. Box 4950 Nydalen, 0424, Oslo, Norway
| | - Simon Rayner
- Hybrid Technology Hub - Centre of Excellence, Institute of Basic Medical Sciences, University of Oslo, P.O. Box 1110 Blindern 0317, Oslo, Norway.,Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
| |
|
33
|
Zong W, Lin S, Gao Y, Yan Y. Process-driven quality improvement for scientific data based on information product map. ELECTRONIC LIBRARY 2022. [DOI: 10.1108/el-08-2021-0157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Purpose
This paper aims to provide a process-driven scientific data quality (DQ) monitoring framework by information product map (IP-Map) in identifying the root causes of poor DQ issues so as to assure the quality of scientific data.
Design/methodology/approach
First, a general scientific data life cycle model is constructed based on eight classical models and 37 researchers’ experience. Then, the IP-Map is constructed to visualize the scientific data manufacturing process. After that, the potential deficiencies that may arise and DQ issues are examined from the aspects of process and data stakeholders. Finally, the corresponding strategies for improving scientific DQ are put forward.
Findings
The scientific data manufacturing process and data stakeholders’ responsibilities could be clearly visualized by the IP-Map. The proposed process-driven framework is helpful in clarifying the root causes of DQ vulnerabilities in scientific data.
Research limitations/implications
As for the implications for researchers, the process-driven framework proposed in this paper provides a better understanding of scientific DQ issues during implementing a research project as well as providing a useful method to analyse those DQ issues based on IP-Map approach from the aspects of process and data stakeholders.
Practical implications
The process-driven framework is beneficial for the research institutions, scientific data management centres and researchers to better manage the scientific data manufacturing process and solve the scientific DQ issues.
Originality/value
This research proposes a general scientific data life cycle model and further provides a process-driven scientific DQ monitoring framework for identifying the root causes of poor data issues from the aspects of process and stakeholders which have been ignored by existing information technology-driven solutions. This study is likely to lead to an improved approach to assuring the scientific DQ and is applicable in different research fields.
|
34
|
|
35
|
Boyd C. Data as assemblage. JOURNAL OF DOCUMENTATION 2022. [DOI: 10.1108/jd-08-2021-0159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Purpose
A definition of data called data as assemblage is presented. The definition accommodates different forms and meanings of data; emphasizes data subjects and data workers; and reflects the sociotechnical aspects of data throughout its lifecycle of creation and use. A scalable assemblage model describing the anatomy and behavior of data, datasets and data infrastructures is also introduced.
Design/methodology/approach
Data as assemblage is compared to common meanings of data. The assemblage model's elements and relationships also are defined, mapped to the anatomy of a US Census dataset and used to describe the structure of research data repositories.
Findings
Replacing common data definitions with data as assemblage enriches information science and research data management (RDM) frameworks. Also, the assemblage model is shown to describe datasets and data infrastructures despite their differences in scale, composition and outward appearance.
Originality/value
Data as assemblage contributes a definition of data as mutable, portable, sociotechnical arrangements of material and symbolic components that serve as evidence. The definition is useful in information science and research data management contexts. The assemblage model contributes a scale-independent way to describe the structure and behavior of data, datasets and data infrastructures and supports analyses and comparisons involving them.
|
36
|
Abstract
Recent years have introduced major shifts in scientific reporting and publishing. The scientific community, publishers, funding agencies, and the public expect research to adhere to principles of openness, reproducibility, replicability, and repeatability. However, studies have shown that scientists often have neither the right tools nor suitable support at their disposal to meet these modern science challenges. In fact, even the concrete expectations connected to these terms may be unclear and subject to field-specific, organizational, and personal interpretations. Based on a narrative literature review of work that defines characteristics of open science, reproducibility, replicability, and repeatability, as well as a review of recent work on researcher-centered requirements, we find that the bottom-up practices and needs of researchers contrast top-down expectations encoded in terms related to reproducibility and open science. We identify and define reproducibility as a central term that concerns the ease of access to scientific resources, as well as their completeness, to the degree required for efficiently and effectively interacting with scientific work. We hope that this characterization helps to create a mutual understanding across science stakeholders, in turn paving the way for suitable and stimulating environments, fit to address the challenges of modern science reporting and publishing.
|
37
|
|
38
|
Sumner JQ, Vitale CH, McIntosh LD. RipetaScore: Measuring the Quality, Transparency, and Trustworthiness of a Scientific Work. Front Res Metr Anal 2022; 6:751734. [PMID: 35128302 PMCID: PMC8814593 DOI: 10.3389/frma.2021.751734] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/01/2021] [Accepted: 12/21/2021] [Indexed: 11/24/2022] Open
Abstract
A wide array of existing metrics quantifies a scientific paper's prominence or the author's prestige. Many who use these metrics make assumptions that higher citation counts or more public attention must indicate more reliable, better quality science. While current metrics offer valuable insight into scientific publications, they are an inadequate proxy for measuring the quality, transparency, and trustworthiness of published research. Three essential elements to establishing trust in a work include: trust in the paper, trust in the author, and trust in the data. To address these elements in a systematic and automated way, we propose the ripetaScore as a direct measurement of a paper's research practices, professionalism, and reproducibility. Using a sample of our current corpus of academic papers, we demonstrate the ripetaScore's efficacy in determining the quality, transparency, and trustworthiness of an academic work. In this paper, we aim to provide a metric to evaluate scientific reporting quality in terms of transparency and trustworthiness of the research, professionalism, and reproducibility.
Affiliation(s)
- Josh Q. Sumner
- Ripeta, LLC, St. Louis, MO, United States
- Washington University School of Medicine, Donald Danforth Plant Science Center, Washington University in St. Louis, St. Louis, MO, United States
- *Correspondence: Josh Q. Sumner
| | - Cynthia Hudson Vitale
- Ripeta, LLC, St. Louis, MO, United States
- Association of Research Libraries, Washington, DC, United States
| | | |
|
39
|
Zadissa A, Apweiler R. Data Mining, Quality and Management in the Life Sciences. Methods Mol Biol 2022; 2449:3-25. [PMID: 35507257 DOI: 10.1007/978-1-0716-2095-3_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
With the evermore emphasis put on open science and its invaluable benefits to the scientific community, it is no longer the case where a research project simply ends with a scientific publication. The benefits of data sharing and reproducibility of results have taken the centerpiece within the life science research supported by FAIR principles that firmly underline the importance of open data. The current data-intensive multidisciplinary research has also highlighted the significance of how data is mined and managed. Here we describe some of the features adopted by EMBL-EBI data resources to support data mining, data quality, and data management. We also highlight how EMBL-EBI has responded to the current pandemic through its data resources.
Affiliation(s)
- Amonida Zadissa
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK.
| | - Rolf Apweiler
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
| |
|
40
|
Rhee SY, Kassaye SG, Jordan MR, Kouamou V, Katzenstein D, Shafer RW. Public availability of HIV-1 drug resistance sequence and treatment data: a systematic review. THE LANCET MICROBE 2022; 3:e392-e398. [PMID: 35544100 PMCID: PMC9095989 DOI: 10.1016/s2666-5247(21)00250-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/12/2021] [Revised: 08/12/2021] [Accepted: 09/06/2021] [Indexed: 11/28/2022] Open
Abstract
HIV-1 pol sequences from antiretroviral therapy (ART)-naive and ART-experienced people living with HIV-1 are fundamental to understanding the genetic correlates and epidemiology of HIV-1 drug resistance (HIVDR). To assess the public availability of HIV-1 pol sequences and ART histories of the individuals from whom sequenced viruses were obtained, we performed a systematic review of PubMed and GenBank for HIVDR studies published between 2010 and 2019 that reported HIV-1 pol sequences. 934 studies met inclusion criteria, including 461 studies of ART-naive adults, 407 of ART-experienced adults, and 66 of ART-naive and ART-experienced children. Sequences were available for 317 (68·8%) studies of ART-naive individuals, 190 (46·7%) of ART-experienced individuals, and 45 (68·2%) of children. Among ART-experienced individuals, sequences plus linked ART histories were available for 82 (20·1%) studies. Sequences were available for 21 (29·2%) of 72 clinical trials. Among journals publishing more than ten studies, the proportion with available sequences ranged from 8·3% to 86·9%. Strengthened implementation of data sharing policies is required to increase the number of studies with available HIVDR data to support the enterprise of global ART in the face of emerging HIVDR.
Affiliation(s)
- Soo-Yon Rhee
- Department of Medicine, Stanford University, Stanford, CA, USA.
| | - Seble G Kassaye
- Department of Medicine, Georgetown University, Washington, DC, USA
| | - Michael R Jordan
- Levy Center for Integrated Management of Antimicrobial Resistance, Tufts University, Boston, MA, USA; Division of Geographic Medicine and Infectious Diseases, Tufts Medical Center, Boston, MA, USA
| | - Vinie Kouamou
- Unit of Medicine, Faculty of Medicine and Health Sciences, University of Zimbabwe, Harare, Zimbabwe
| | - David Katzenstein
- Department of Molecular Biology, Biomedical Research and Training Institute, Harare, Zimbabwe
| | - Robert W Shafer
- Department of Medicine, Stanford University, Stanford, CA, USA
| |
|
41
|
Rauh S, Bowers A, Rorah D, Tritz D, Pate H, Frye L, Vassar M. Evaluating the reproducibility of research in obstetrics and gynecology. Eur J Obstet Gynecol Reprod Biol 2021; 269:24-29. [PMID: 34954422 DOI: 10.1016/j.ejogrb.2021.12.021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/08/2021] [Revised: 11/19/2021] [Accepted: 12/11/2021] [Indexed: 01/21/2023]
Abstract
OBJECTIVE Reproducibility is a core tenet of scientific research. A reproducible study is one where the results can be recreated by using the same methodology and materials as the original researchers. Unfortunately, reproducibility is not a standard to which the majority of research is currently adherent. METHODS Our cross-sectional survey evaluated 300 trials in the field of Obstetrics and Gynecology. Our primary objective was to identify nine indicators of reproducibility and transparency. These indicators include availability of data, analysis scripts, pre-registration information, study protocols, funding source, conflict of interest statements and whether or not the study was available via Open Access. RESULTS Of the 300 trials in our sample, 208 contained empirical data that could be assessed for reproducibility. None of the trials in our sample provided a link to their protocols or provided a statement on availability of materials. None were replication studies. Just 10.58% provided a statement regarding their data availability, while only 5.82% provided a statement on preregistration. 25.85% failed to report the presence or absence of conflicts of interest and 54.08% did not state the origin of their funding. CONCLUSION In the studies we examined, research in the field of Obstetrics and Gynecology is not consistently reproducible and frequently lacks conflict of interest disclosure. Consequences of this could be far-reaching and include increased research waste, widespread acceptance of misleading results and erroneous conclusions guiding clinical decision-making.
Affiliation(s)
- Shelby Rauh
- Oklahoma State University Center for Health Sciences, Tulsa, OK, United States.
| | - Aaron Bowers
- Oklahoma State University Center for Health Sciences, Tulsa, OK, United States
| | - Drayton Rorah
- Kansas City University of Medicine and Biosciences, Joplin, MO, United States
| | - Daniel Tritz
- Oklahoma State University Center for Health Sciences, Tulsa, OK, United States
| | - Heather Pate
- Department of Obstetrics and Gynecology, Oklahoma State University Medical Center, Tulsa, OK, United States
| | - Lance Frye
- Department of Obstetrics and Gynecology, Oklahoma State University Medical Center, Tulsa, OK, United States
| | - Matt Vassar
- Oklahoma State University Center for Health Sciences, Tulsa, OK, United States
| |
|
42
|
Errington TM, Mathur M, Soderberg CK, Denis A, Perfito N, Iorns E, Nosek BA. Investigating the replicability of preclinical cancer biology. eLife 2021; 10:e71601. [PMID: 34874005 PMCID: PMC8651293 DOI: 10.7554/elife.71601] [Citation(s) in RCA: 77] [Impact Index Per Article: 25.7] [Received: 06/24/2021] [Accepted: 10/16/2021] [Indexed: 12/18/2022] Open
Abstract
Replicability is an important feature of scientific research, but aspects of contemporary research culture, such as an emphasis on novelty, can make replicability seem less important than it should be. The Reproducibility Project: Cancer Biology was set up to provide evidence about the replicability of preclinical research in cancer biology by repeating selected experiments from high-impact papers. A total of 50 experiments from 23 papers were repeated, generating data about the replicability of a total of 158 effects. Most of the original effects were positive effects (136), with the rest being null effects (22). A majority of the original effect sizes were reported as numerical values (117), with the rest being reported as representative images (41). We employed seven methods to assess replicability, and some of these methods were not suitable for all the effects in our sample. One method compared effect sizes: for positive effects, the median effect size in the replications was 85% smaller than the median effect size in the original experiments, and 92% of replication effect sizes were smaller than the original. The other methods were binary - the replication was either a success or a failure - and five of these methods could be used to assess both positive and null effects when effect sizes were reported as numerical values. For positive effects, 40% of replications (39/97) succeeded according to three or more of these five methods, and for null effects 80% of replications (12/15) were successful on this basis; combining positive and null effects, the success rate was 46% (51/112). A successful replication does not definitively confirm an original finding or its theoretical interpretation. Equally, a failure to replicate does not disconfirm a finding, but it does suggest that additional investigation is needed to establish its reliability.
Affiliation(s)
| | - Maya Mathur
- Quantitative Sciences Unit, Stanford University, Stanford, United States
| | | | | | | | | | - Brian A Nosek
- Center for Open Science, Charlottesville, United States
- University of Virginia, Charlottesville, United States
| |
|
43
|
Leipold MD, Olsen LR. A literature study and public survey on mass cytometry dataset release and reuse. Cytometry A 2021; 101:109-113. [PMID: 34757690 DOI: 10.1002/cyto.a.24512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2021] [Revised: 10/07/2021] [Accepted: 10/19/2021] [Indexed: 11/11/2022]
Affiliation(s)
- Michael D Leipold
- Human Immune Monitoring Center, Stanford University, Stanford, California, USA
| | - Lars Rønn Olsen
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
| |
|
44
|
Mandeville CP, Koch W, Nilsen EB, Finstad AG. Open Data Practices among Users of Primary Biodiversity Data. Bioscience 2021; 71:1128-1147. [PMID: 34733117 PMCID: PMC8560312 DOI: 10.1093/biosci/biab072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022] Open
Abstract
Presence-only biodiversity data are increasingly relied on in biodiversity, ecology, and conservation research, driven by growing digital infrastructures that support open data sharing and reuse. Recent reviews of open biodiversity data have clearly documented the value of data sharing, but the extent to which the biodiversity research community has adopted open data practices remains unclear. We address this question by reviewing applications of presence-only primary biodiversity data, drawn from a variety of sources beyond open databases, in the indexed literature. We characterize how frequently researchers access open data relative to data from other sources, how often they share newly generated or collated data, and trends in metadata documentation and data citation. Our results indicate that biodiversity research commonly relies on presence-only data that are not openly available and neglects to make such data available. Improved data sharing and documentation will increase the value, reusability, and reproducibility of biodiversity research.
Affiliation(s)
- Caitlin P Mandeville
- Department of Natural History, Norwegian University of Science and Technology, Trondheim, Norway
| | - Wouter Koch
- Department of Natural History, Norwegian University of Science and Technology, Trondheim, Norway
| | - Erlend B Nilsen
- Faculty of Biosciences and Aquaculture, Nord University, Steinkjer, Norway
| | - Anders G Finstad
- Department of Natural History, Norwegian University of Science and Technology, Trondheim, Norway
| |
|
45
|
Hood ASC, Sutherland WJ. The data-index: An author-level metric that values impactful data and incentivizes data sharing. Ecol Evol 2021; 11:14344-14350. [PMID: 34765110 PMCID: PMC8571609 DOI: 10.1002/ece3.8126] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/10/2020] [Revised: 08/04/2021] [Accepted: 08/24/2021] [Indexed: 11/08/2022] Open
Abstract
Author-level metrics are a widely used measure of scientific success. The h-index and its variants measure publication output (number of publications) and research impact (number of citations). They are often used to influence decisions, such as allocating funding or jobs. Here, we argue that the emphasis on publication output and impact hinders scientific progress in the fields of ecology and evolution because it disincentivizes two fundamental practices: generating impactful (and therefore often long-term) datasets and sharing data. We describe a new author-level metric, the data-index, which values both dataset output (number of datasets) and impact (number of data-index citations), so promotes generating and sharing data as a result. We discuss how it could be implemented and provide user guidelines. The data-index is designed to complement other metrics of scientific success, as scientific contributions are diverse and our value system should reflect that both for the benefit of scientific progress and to create a value system that is more equitable, diverse, and inclusive. Future work should focus on promoting other scientific contributions, such as communicating science, informing policy, mentoring other scientists, and providing open-access code and tools.
Affiliation(s)
- Amelia S. C. Hood
- Conservation Science Group, Department of Zoology, University of Cambridge, Cambridge, UK
| | - William J. Sutherland
- Conservation Science Group, Department of Zoology, University of Cambridge, Cambridge, UK
- Biosecurity Research Initiative at St Catharine's (BioRISC), St Catharine's College, University of Cambridge, Cambridge, UK
| |
|
46
|
Saraswati K, Maguire BJ, McLean ARD, Singh-Phulgenda S, Ngu RC, Newton PN, Day NPJ, Guérin PJ. Systematic review of the scrub typhus treatment landscape: Assessing the feasibility of an individual participant-level data (IPD) platform. PLoS Negl Trop Dis 2021; 15:e0009858. [PMID: 34648517 PMCID: PMC8547739 DOI: 10.1371/journal.pntd.0009858] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/26/2021] [Revised: 10/26/2021] [Accepted: 09/28/2021] [Indexed: 01/18/2023] Open
Abstract
Background Scrub typhus is an acute febrile illness caused by intracellular bacteria from the genus Orientia. It is estimated that one billion people are at risk, with one million cases annually mainly affecting rural areas in Asia-Oceania. Relative to its burden, scrub typhus is understudied, and treatment recommendations vary with poor evidence base. These knowledge gaps could be addressed by establishing an individual participant-level data (IPD) platform, which would enable pooled, more detailed and statistically powered analyses to be conducted. This study aims to assess the characteristics of scrub typhus treatment studies and explore the feasibility and potential value of developing a scrub typhus IPD platform to address unanswered research questions. Methodology/principal findings We conducted a systematic literature review looking for prospective scrub typhus clinical treatment studies published from 1998 to 2020. Six electronic databases (Ovid Embase, Ovid Medline, Ovid Global Health, Cochrane Library, Scopus, Global Index Medicus), ClinicalTrials.gov, and WHO ICTRP were searched. We extracted data on study design, treatment tested, patient characteristics, diagnostic methods, geographical location, outcome measures, and statistical methodology. Among 3,100 articles screened, 127 were included in the analysis. 12,079 participants from 12 countries were enrolled in the identified studies. ELISA, PCR, and eschar presence were the most commonly used diagnostic methods. Doxycycline, azithromycin, and chloramphenicol were the most commonly administered antibiotics. Mortality, complications, adverse events, and clinical response were assessed in most studies. There was substantial heterogeneity in the diagnostic methods used, treatment administered (including dosing and duration), and outcome assessed across studies. There were few interventional studies and limited data collected on specific groups such as children and pregnant women. 
Conclusions/significance There were a limited number of interventional trials, highlighting that scrub typhus remains a neglected disease. The heterogeneous nature of the available data reflects the absence of consensus in treatment and research methodologies and poses a significant barrier to aggregating information across available published data without access to the underlying IPD. There is likely to be a substantial amount of data available to address knowledge gaps. Therefore, there is value for an IPD platform that will facilitate pooling and harmonisation of currently scattered data and enable in-depth investigation of priority research questions that can, ultimately, inform clinical practice and improve health outcomes for scrub typhus patients.
Scrub typhus is a febrile illness most commonly found in rural tropical areas. It is caused by a Gram-negative bacteria belonging to the family Rickettsiaceae and transmitted by mites when they feed on vertebrates. There is an estimate of one million cases annually, with an estimated one billion people at risk, mostly in Asia-Oceania. But relative to the scale of the problem, scrub typhus is largely understudied. Evidence-based treatment recommendations by policymakers vary or are non-existent. We searched databases and registries for prospective scrub typhus clinical treatment studies published from 1998 to 2020 and reviewed them. Data from clinical trials and particularly for specific groups, such as pregnant women and children, were minimal. The methods used to measure treatment efficacy were heterogeneous, making it difficult to directly compare or conduct a meta-analysis based on aggregated data. One way to improve the current level of evidence would be by pooling and analysing individual participant-level data (IPD), i.e. the raw data from individual participants in completed studies.
This review demonstrated that there is scope for developing a database for individual participant data to enable more detailed analyses. IPD meta-analyses could be a way to address knowledge gaps such as optimum dosing for children and pregnant women.
Affiliation(s)
- Kartika Saraswati
- Eijkman-Oxford Clinical Research Unit, Eijkman Institute for Molecular Biology, Jakarta, Indonesia
- Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Infectious Diseases Data Observatory (IDDO), Oxford, United Kingdom
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- * E-mail: (KS); (PJG)
| | - Brittany J. Maguire
- Infectious Diseases Data Observatory (IDDO), Oxford, United Kingdom
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
| | - Alistair R. D. McLean
- Infectious Diseases Data Observatory (IDDO), Oxford, United Kingdom
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
| | - Sauman Singh-Phulgenda
- Infectious Diseases Data Observatory (IDDO), Oxford, United Kingdom
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
| | - Roland C. Ngu
- Infectious Diseases Data Observatory (IDDO), Oxford, United Kingdom
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
| | - Paul N. Newton
- Infectious Diseases Data Observatory (IDDO), Oxford, United Kingdom
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- Lao-Oxford-Mahosot-Wellcome Trust Research Unit, Microbiology Laboratory, Mahosot Hospital, Vientiane, Lao People’s Democratic Republic
| | - Nicholas P. J. Day
- Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
| | - Philippe J. Guérin
- Infectious Diseases Data Observatory (IDDO), Oxford, United Kingdom
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- * E-mail: (KS); (PJG)
| |
|
47
|
Way GP, Greene CS, Carninci P, Carvalho BS, de Hoon M, Finley SD, Gosline SJC, Lȇ Cao KA, Lee JSH, Marchionni L, Robine N, Sindi SS, Theis FJ, Yang JYH, Carpenter AE, Fertig EJ. A field guide to cultivating computational biology. PLoS Biol 2021; 19:e3001419. [PMID: 34618807 PMCID: PMC8525744 DOI: 10.1371/journal.pbio.3001419] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Revised: 10/19/2021] [Indexed: 11/18/2022] Open
Abstract
Evolving in sync with the computation revolution over the past 30 years, computational biology has emerged as a mature scientific field. While the field has made major contributions toward improving scientific knowledge and human health, individual computational biology practitioners at various institutions often languish in career development. As optimistic biologists passionate about the future of our field, we propose solutions for both eager and reluctant individual scientists, institutions, publishers, funding agencies, and educators to fully embrace computational biology. We believe that in order to pave the way for the next generation of discoveries, we need to improve recognition for computational biologists and better align pathways of career success with pathways of scientific progress. With 10 outlined steps, we call on all adjacent fields to move away from the traditional individual, single-discipline investigator research model and embrace multidisciplinary, data-driven, team science.
Affiliation(s)
- Gregory P. Way
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
| | - Casey S. Greene
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
| | - Piero Carninci
- RIKEN Center for Integrative Medical Sciences Yokohama, Kanagawa, Japan
- Human Technopole, Milan, Italy
| | - Benilton S. Carvalho
- Department of Statistics, Institute of Mathematics, Statistics and Scientific Computing, University of Campinas, Campinas, Brazil
| | - Michiel de Hoon
- RIKEN Center for Integrative Medical Sciences Yokohama, Kanagawa, Japan
| | - Stacey D. Finley
- Department of Biomedical Engineering, Quantitative and Computational Biology, and Chemical Engineering & Materials Science, University of Southern California, Los Angeles, California, United States of America
| | - Sara J. C. Gosline
- Pacific Northwest National Laboratory, Seattle, Washington, United States of America
| | - Kim-Anh Lê Cao
- Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, Melbourne, Australia
| | - Jerry S. H. Lee
- Ellison Institute and Departments of Medicine/Oncology, Chemical Engineering, and Material Sciences, University of Southern California, Los Angeles, California, United States of America
| | - Luigi Marchionni
- Department of Pathology and Laboratory Medicine, Weill-Cornell Medicine, New York, New York, United States of America
| | - Nicolas Robine
- Computational Biology Lab, New York Genome Center, New York, New York, United States of America
| | - Suzanne S. Sindi
- Department of Applied Mathematics, University of California Merced, Merced, California, United States of America
| | - Fabian J. Theis
- Institute of Computational Biology, Helmholtz Center Munich and Department of Mathematics, Technical University of Munich, Munich, Germany
| | - Jean Y. H. Yang
- Charles Perkins Centre and School of Mathematics and Statistics, The University of Sydney, Australia
| | - Anne E. Carpenter
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Elana J. Fertig
- Convergence Institute, Departments of Oncology, Biomedical Engineering, and Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland, United States of America
| |
|
48
|
Soska KC, Xu M, Gonzalez SL, Herzberg O, Tamis-LeMonda CS, Gilmore RO, Adolph KE. (Hyper)active Data Curation: A Video Case Study from Behavioral Science. JOURNAL OF ESCIENCE LIBRARIANSHIP 2021; 10. [PMID: 34532153 DOI: 10.7191/jeslib.2021.1208] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/27/2022] Open
Abstract
Video data are uniquely suited for research reuse and for documenting research methods and findings. However, curation of video data is a serious hurdle for researchers in the social and behavioral sciences, where behavioral video data are obtained session by session and data sharing is not the norm. To eliminate the onerous burden of post hoc curation at the time of publication (or later), we describe best practices in active data curation, where data are curated and uploaded immediately after each data collection to allow instantaneous sharing with one button press at any time. Indeed, we recommend that researchers adopt "hyperactive" data curation where they openly share every step of their research process. The necessary infrastructure and tools are provided by Databrary, a secure, web-based data library designed for active curation and sharing of personally identifiable video data and associated metadata. We provide a case study of hyperactive curation of video data from the Play and Learning Across a Year (PLAY) project, where dozens of researchers developed a common protocol to collect, annotate, and actively curate video data of infants and mothers during natural activity in their homes at research sites across North America. PLAY relies on scalable standardized workflows to facilitate collaborative research, assure data quality, and prepare the corpus for sharing and reuse throughout the entire research process.
|
49
|
Minocher R, Atmaca S, Bavero C, McElreath R, Beheim B. Estimating the reproducibility of social learning research published between 1955 and 2018. ROYAL SOCIETY OPEN SCIENCE 2021; 8:210450. [PMID: 34540248 PMCID: PMC8441137 DOI: 10.1098/rsos.210450] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/16/2021] [Accepted: 08/24/2021] [Indexed: 06/13/2023]
Abstract
Reproducibility is integral to science, but difficult to achieve. Previous research has quantified low rates of data availability and results reproducibility across the biological and behavioural sciences. Here, we surveyed 560 empirical publications, published between 1955 and 2018 in the social learning literature, a research topic that spans animal behaviour, behavioural ecology, cultural evolution and evolutionary psychology. Data were recoverable online or through direct data requests for 30% of this sample. Data recovery declines exponentially with time since publication, halving every 6 years, and up to every 9 years for human experimental data. When data for a publication can be recovered, we estimate a high probability of subsequent data usability (87%), analytical clarity (97%) and agreement of published results with reproduced findings (96%). This corresponds to an overall rate of recovering data and reproducing results of 23%, largely driven by the unavailability or incompleteness of data. We thus outline clear measures to improve the reproducibility of research on the ecology and evolution of social behaviour.
Affiliation(s)
- Riana Minocher
- Department of Human Behaviour, Ecology and Culture, Max-Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Silke Atmaca
- Department of Human Behaviour, Ecology and Culture, Max-Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Claudia Bavero
- Department of Human Behaviour, Ecology and Culture, Max-Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Richard McElreath
- Department of Human Behaviour, Ecology and Culture, Max-Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Bret Beheim
- Department of Human Behaviour, Ecology and Culture, Max-Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| |
|
50
|
Campos MLM, Silva E, Cerceau R, da Cruz SMS, Silva FAB, Gouveia FC, Jardim R, Kotowski N, Lopes GR, Dávila AMR. Towards Machine-Readable (Meta) Data and the FAIR Value for Artificial Intelligence Exploration of COVID-19 and Cancer Research Data. Front Big Data 2021; 4:656553. [PMID: 34527943 PMCID: PMC8437372 DOI: 10.3389/fdata.2021.656553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2021] [Accepted: 07/14/2021] [Indexed: 11/24/2022] Open
Affiliation(s)
- Maria Luiza M. Campos
- Instituto de Computação, Universidade Federal do Rio de Janeiro, UFRJ, Rio de Janeiro, Brazil
| | - Eugênio Silva
- Unidade de Computação (Ucomp), Centro Universitario Estadual da Zona Oeste (UEZO), Rio de Janeiro, Brazil
| | - Renato Cerceau
- Instituto Nacional de Cardiologia, INC, Rio de Janeiro, Brazil
- Universidade do Estado do Rio de Janeiro, UERJ, Rio de Janeiro, Brazil
| | - Sérgio Manuel Serra da Cruz
- Instituto de Computação, Universidade Federal do Rio de Janeiro, UFRJ, Rio de Janeiro, Brazil
- Departamento de Ciências da Computação, Universidade Federal Rural do Rio de Janeiro, UFRRJ, Seropédica, Brazil
| | - Rodrigo Jardim
- Laboratório de Biologia Computacional e Sistemas, Instituto Oswaldo Cruz, FIOCRUZ, Rio de Janeiro, Brazil
| | - Nelson Kotowski
- Laboratório de Biologia Computacional e Sistemas, Instituto Oswaldo Cruz, FIOCRUZ, Rio de Janeiro, Brazil
| | - Giseli Rabello Lopes
- Instituto de Computação, Universidade Federal do Rio de Janeiro, UFRJ, Rio de Janeiro, Brazil
| | - Alberto M. R. Dávila
- Laboratório de Biologia Computacional e Sistemas, Instituto Oswaldo Cruz, FIOCRUZ, Rio de Janeiro, Brazil
| |
|