1. González-Cebrián A, Bradford M, Chis AE, González-Vélez H. Standardised Versioning of Datasets: a FAIR-compliant Proposal. Sci Data 2024; 11:358. [PMID: 38594314 PMCID: PMC11003959 DOI: 10.1038/s41597-024-03153-y] [Received: 07/26/2023] [Accepted: 03/15/2024] [Indexed: 04/11/2024]
Abstract
This paper presents a standardised dataset versioning framework for improved reusability, recognition and data version tracking, facilitating comparisons and informed decision-making for data usability and workflow integration. The framework adopts a software engineering-like data versioning nomenclature ("major.minor.patch") and incorporates data schema principles to promote reproducibility and collaboration. To quantify changes in statistical properties over time, the concept of data drift metrics ($d$) is introduced. Three metrics ($d_P$, $d_{E,PCA}$, and $d_{E,AE}$) based on unsupervised Machine Learning techniques (Principal Component Analysis and Autoencoders) are evaluated for dataset creation, update, and deletion. The optimal choice is the $d_{E,PCA}$ metric, which combines PCA models with splines. It is efficient to compute, and it yields values below 50 for new dataset batches and values consistent with seasonal or trend variations. Major updates (i.e., values of 100) occur when scaling transformations are applied to over 30% of the variables, while information loss is handled efficiently, yielding values close to 0. This metric achieved a favourable trade-off between interpretability, robustness against information loss, and computation time.
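The abstract borrows the "major.minor.patch" nomenclature from software versioning but does not spell out its bump rules here. As a hedged illustration of the generic convention only (not the paper's decision rule, and without any mapping from drift values $d$ to bump levels), the sketch below parses a version tag and increments one component, resetting every component to its right. The class `DatasetVersion` and its methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical major.minor.patch tag for a dataset release."""

    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "DatasetVersion":
        # Expects exactly three dot-separated integers, e.g. "1.4.2".
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def bump(self, level: str) -> "DatasetVersion":
        # Incrementing one component resets all components to its right,
        # as in the usual semantic-versioning convention.
        if level == "major":
            return DatasetVersion(self.major + 1, 0, 0)
        if level == "minor":
            return DatasetVersion(self.major, self.minor + 1, 0)
        if level == "patch":
            return DatasetVersion(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown level: {level!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


if __name__ == "__main__":
    v = DatasetVersion.parse("1.4.2")
    print(v.bump("minor"))  # 1.5.0
```

Deciding *which* level a given data change warrants is exactly what the drift metrics are meant to inform; the sketch deliberately leaves that choice to the caller.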
Affiliation(s)
- Michael Bradford
- Cloud Competency Centre, National College of Ireland, Dublin, Ireland
- Adriana E Chis
- Cloud Competency Centre, National College of Ireland, Dublin, Ireland

2. Le L, Zuccon G, Demartini G, Zhao G, Zhang X. Leveraging Semantic Type Dependencies for Clinical Named Entity Recognition. AMIA Annu Symp Proc 2023; 2022:662-671. [PMID: 37128396 PMCID: PMC10148283] [Indexed: 05/03/2023]
Abstract
Previous work on clinical relation extraction from free-text sentences leveraged information about semantic types from clinical knowledge bases as a part of entity representations. In this paper, we exploit additional evidence by also making use of domain-specific semantic type dependencies. We encode the relation between a span of tokens matching a Unified Medical Language System (UMLS) concept and other tokens in the sentence. We implement our method and compare against different named entity recognition (NER) architectures (i.e., BiLSTM-CRF and BiLSTM-GCN-CRF) using different pre-trained clinical embeddings (i.e., BERT, BioBERT, UMLSBert). Our experimental results on clinical datasets show that in some cases NER effectiveness can be significantly improved by making use of domain-specific semantic type dependencies. Our work is also the first study generating a matrix encoding to make use of more than three dependencies in one pass for the NER task.
Affiliation(s)
- Linh Le
- University of Queensland, Australia
- Guido Zuccon
- University of Queensland, Australia
- Genghong Zhao
- Neusoft Research of Intelligent Healthcare Technology, Co. Ltd., Shenyang, China
- Xia Zhang
- Neusoft Corporation, Shenyang, China

3. Li J, Dong Z, Lu S, Wang SJ, Yan WJ, Ma Y, Liu Y, Huang C, Fu X. CAS(ME)³: A Third Generation Facial Spontaneous Micro-Expression Database With Depth Information and High Ecological Validity. IEEE Trans Pattern Anal Mach Intell 2023; 45:2782-2800. [PMID: 35560102 DOI: 10.1109/tpami.2022.3174895] [Indexed: 05/06/2023]
Abstract
Micro-expression (ME) is a significant non-verbal communication clue that reveals a person's genuine emotional state. Micro-expression analysis (MEA) has gained attention only in the last decade. However, the small sample size problem constrains the use of deep learning for MEA. In addition, ME samples are distributed across six different databases, leading to database bias. Moreover, developing ME databases is complicated. In this article, we introduce a large-scale spontaneous ME database: CAS(ME)³. The contributions of this article are summarized as follows: (1) CAS(ME)³ offers around 80 hours of videos with over 8,000,000 frames, including 1,109 manually labeled MEs and 3,490 macro-expressions. Such a large sample size allows effective validation of MEA methods while avoiding database bias. (2) Inspired by psychological experiments, CAS(ME)³ is the first to provide depth information as an additional modality, contributing to multi-modal MEA. (3) For the first time, CAS(ME)³ elicits MEs with high ecological validity using the mock crime paradigm, along with physiological and voice signals, contributing to practical MEA. (4) In addition, CAS(ME)³ provides 1,508 unlabeled videos with more than 4,000,000 frames, i.e., a data platform for unsupervised MEA methods. (5) Finally, we demonstrate the effectiveness of depth information with the proposed depth flow algorithm and RGB-D information.

4. Tsueng G, Cano MAA, Bento J, Czech C, Kang M, Pache L, Rasmussen LV, Savidge TC, Starren J, Wu Q, Xin J, Yeaman MR, Zhou X, Su AI, Wu C, Brown L, Shabman RS, Hughes LD. Developing a standardized but extendable framework to increase the findability of infectious disease datasets. Sci Data 2023; 10:99. [PMID: 36823157 PMCID: PMC9950378 DOI: 10.1038/s41597-023-01968-9] [Received: 10/12/2022] [Accepted: 01/13/2023] [Indexed: 02/25/2023]
Abstract
Biomedical datasets are increasing in size, are stored in many repositories, and face challenges in FAIRness (findability, accessibility, interoperability, reusability). As a Consortium of infectious disease researchers from 15 Centers, we aim to adopt open science practices to promote transparency, encourage reproducibility, and accelerate research advances through data reuse. To improve the FAIRness of our datasets and computational tools, we evaluated metadata standards across established biomedical data repositories. The vast majority do not adhere to a single standard, such as Schema.org, which is widely adopted by generalist repositories. Consequently, datasets in these repositories are not findable in aggregation projects such as Google Dataset Search. We addressed this gap by creating a reusable metadata schema based on Schema.org and cataloguing nearly 400 datasets and computational tools that we collected. The approach is easily reused to create schemas that are interoperable with community standards but customized to a particular context. Our approach enabled data discovery, increased the reusability of datasets from a large research consortium, and accelerated research. Lastly, we discuss ongoing challenges with FAIRness beyond discoverability.
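To make "a metadata schema based on Schema.org" concrete, the following is a minimal sketch of a plain Schema.org `Dataset` record expressed as JSON-LD, the kind of markup that aggregators such as Google Dataset Search read. It does not reproduce the consortium's extended schema; the helper `dataset_jsonld` and every field value are hypothetical placeholders.

```python
import json


def dataset_jsonld(name: str, description: str, keywords: list[str], creator: str) -> dict:
    """Build a minimal Schema.org Dataset record as a JSON-LD dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "keywords": keywords,
        "creator": {"@type": "Organization", "name": creator},
    }


# Placeholder values only; not a record from the consortium's catalogue.
record = dataset_jsonld(
    name="Example infectious disease dataset",
    description="Placeholder description of the samples, assays and study design.",
    keywords=["infectious disease", "example"],
    creator="Example Research Center",
)

# Serialized JSON-LD of this kind is typically embedded in a dataset's landing
# page inside a <script type="application/ld+json"> element.
print(json.dumps(record, indent=2))
```

An extended, context-specific schema of the kind the abstract describes would add further properties on top of a core like this while keeping the shared `Dataset` type, which is what keeps it interoperable with generalist tooling.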
Affiliation(s)
- Ginger Tsueng
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Marco A Alvarado Cano
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- José Bento
- Department of Computer Science, Boston College, 245 Beacon St, Chestnut Hill, MA, 02467, USA
- Candice Czech
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Mengjia Kang
- Division of Pulmonary and Critical Care, Feinberg School of Medicine, Northwestern University, Chicago, IL, 60611, USA
- Lars Pache
- Infectious and Inflammatory Disease Center, Immunity and Pathogenesis Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, 92037, USA
- Luke V Rasmussen
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Tor C Savidge
- Texas Children's Microbiome Center & Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
- Justin Starren
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Qinglong Wu
- Texas Children's Microbiome Center & Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
- Jiwen Xin
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Michael R Yeaman
- Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Divisions of Molecular Medicine and Infectious Diseases, Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
- Lundquist Institute for Infection & Immunity at Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
- Xinghua Zhou
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Andrew I Su
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Scripps Research Translational Institute, La Jolla, CA, 92037, USA
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Chunlei Wu
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Scripps Research Translational Institute, La Jolla, CA, 92037, USA
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Liliana Brown
- Office of Genomics and Advanced Technologies, National Institute of Allergy and Infectious Diseases, Rockville, MD, 20852, USA
- Reed S Shabman
- Office of Genomics and Advanced Technologies, National Institute of Allergy and Infectious Diseases, Rockville, MD, 20852, USA
- Laura D Hughes
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA

5. Loughrey MB, Webster F, Arends MJ, Brown I, Burgart LJ, Cunningham C, Flejou JF, Kakar S, Kirsch R, Kojima M, Lugli A, Rosty C, Sheahan K, West NP, Wilson RH, Nagtegaal ID. Dataset for Pathology Reporting of Colorectal Cancer: Recommendations From the International Collaboration on Cancer Reporting (ICCR). Ann Surg 2022; 275:e549-e561. [PMID: 34238814 PMCID: PMC8820778 DOI: 10.1097/sla.0000000000005051] [Indexed: 11/25/2022]
Abstract
OBJECTIVE: The aim of this study is to describe a new international dataset for pathology reporting of colorectal cancer surgical specimens, produced under the auspices of the International Collaboration on Cancer Reporting (ICCR).
BACKGROUND: The quality of pathology reporting and mutual understanding among the colorectal surgeon, pathologist and oncologist are vital to patient management. Some pathology parameters are prone to variable interpretation, resulting in differing positions adopted by existing national datasets.
METHODS: The ICCR, a global alliance of major pathology institutions with links to international cancer organizations, has developed and ratified a rigorous and efficient process for the development of evidence-based, structured datasets for pathology reporting of common cancers. Here we describe the production of a dataset for colorectal cancer resection specimens by a multidisciplinary panel of internationally recognized experts.
RESULTS: The agreed dataset comprises eighteen core (essential) and seven non-core (recommended) elements identified from a review of current evidence. Areas of contention are addressed, some highly relevant to surgical practice, with the aim of standardizing multidisciplinary discussion. The summation of all core elements is considered to be the minimum reporting standard for individual cases. Commentary is provided explaining each element's clinical relevance, the definitions to be applied, where appropriate, to the agreed list of value options, and the rationale for considering the element core or non-core.
CONCLUSIONS: This first internationally agreed dataset for colorectal cancer pathology reporting promotes standardization of pathology reporting and enhanced clinicopathological communication. Widespread adoption will facilitate international comparisons and multinational clinical trials, and help to improve the management of colorectal cancer globally.
Affiliation(s)
- Maurice B Loughrey
- Centre for Public Health, Centre for Cancer Research and Cell Biology, Queen's University Belfast, Belfast, Northern Ireland, UK
- Department of Cellular Pathology, Belfast Health and Social Care Trust, Belfast, Northern Ireland, UK
- Fleur Webster
- International Collaboration on Cancer Reporting, Sydney, NSW, Australia
- Mark J Arends
- Division of Pathology, Institute of Genetics & Molecular Medicine, University of Edinburgh, Edinburgh, UK
- Ian Brown
- Envoi Pathology, Kelvin Grove, QLD, Australia
- Lawrence J Burgart
- Department of Pathology, Virginia Piper Cancer Institute, Abbott Northwestern Hospital, Minneapolis, MN
- Chris Cunningham
- Department of Colorectal Surgery, Churchill Hospital, Oxford University Hospitals NHSFT, Oxford, UK
- Jean-Francois Flejou
- Department of Pathology, Saint-Antoine Hospital, Sorbonne University, Paris, France
- Sanjay Kakar
- Department of Pathology, University of California San Francisco, San Francisco, CA
- Richard Kirsch
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Toronto, Ontario, Canada
- Motohiro Kojima
- Division of Pathology, Research Center for Innovative Oncology, National Cancer Center, Kashiwa, Chiba, Japan
- Christophe Rosty
- Faculty of Medicine, The University of Queensland, Brisbane, QLD, Australia
- Envoi Specialist Pathologists, Brisbane, QLD, Australia
- Department of Pathology, University of Melbourne, Melbourne, VIC, Australia
- Kieran Sheahan
- Department of Pathology, St Vincent's University Hospital & University College, Dublin, Ireland
- Nicholas P West
- Pathology and Data Analytics, Leeds Institute of Medical Research at St. James's, University of Leeds, Leeds, UK
- Richard H Wilson
- Institute of Cancer Sciences, University of Glasgow, Glasgow, UK
- Iris D Nagtegaal
- Department of Pathology, Radboud University Medical Centre, Nijmegen, The Netherlands

6. Boccardi M, Monsch AU, Ferrari C, Altomare D, Berres M, Bos I, Buchmann A, Cerami C, Didic M, Festari C, Nicolosi V, Sacco L, Aerts L, Albanese E, Annoni JM, Ballhausen N, Chicherio C, Démonet JF, Descloux V, Diener S, Ferreira D, Georges J, Gietl A, Girtler N, Kilimann I, Klöppel S, Kustyniuk N, Mecocci P, Mella N, Pigliautile M, Seeher K, Shirk SD, Toraldo A, Brioschi-Guevara A, Chan KCG, Crane PK, Dodich A, Grazia A, Kochan NA, de Oliveira FF, Nobili F, Kukull W, Peters O, Ramakers I, Sachdev PS, Teipel S, Visser PJ, Wagner M, Weintraub S, Westman E, Froelich L, Brodaty H, Dubois B, Cappa SF, Salmon D, Winblad B, Frisoni GB, Kliegel M. Harmonizing neuropsychological assessment for mild neurocognitive disorders in Europe. Alzheimers Dement 2022; 18:29-42. [PMID: 33984176 PMCID: PMC9642857 DOI: 10.1002/alz.12365] [Received: 07/27/2020] [Revised: 03/11/2021] [Accepted: 04/05/2021] [Indexed: 01/03/2023]
Abstract
INTRODUCTION: Harmonized neuropsychological assessment for neurocognitive disorders, an international priority for valid and reliable diagnostic procedures, has been achieved only in specific countries or research contexts.
METHODS: To harmonize the assessment of mild cognitive impairment in Europe, a workshop (Geneva, May 2018) convened stakeholders, methodologists, academic, and non-academic clinicians and experts from European, US, and Australian harmonization initiatives.
RESULTS: With formal presentations and thematic working-groups we defined a standard battery consistent with the U.S. Uniform DataSet, version 3, and homogeneous methodology to obtain consistent normative data across tests and languages. Adaptations consist of including two tests specific to typical Alzheimer's disease and behavioral variant frontotemporal dementia. The methodology for harmonized normative data includes consensus definition of cognitively normal controls, classification of confounding factors (age, sex, and education), and calculation of minimum sample sizes.
DISCUSSION: This expert consensus allows harmonizing the diagnosis of neurocognitive disorders across European countries and possibly beyond.
Affiliation(s)
- Marina Boccardi
- DZNE - Deutsches Zentrum für Neurodegenerative Erkrankungen, Rostock-Greifswald site, Rostock, Germany
- LANVIE - Laboratory of Neuroimaging of Aging, University of Geneva, Geneva, Switzerland
- Andreas U Monsch
- Memory Clinic, University Department of Geriatric Medicine FELIX PLATTER, Faculty of Psychology, University of Basel, Basel, Switzerland
- Clarissa Ferrari
- Unit of Statistics, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
- Daniele Altomare
- LANVIE - Laboratory of Neuroimaging of Aging, University of Geneva, Geneva, Switzerland
- Memory Center, Geneva University Hospitals, Geneva, Switzerland
- Manfred Berres
- Department of Mathematics and Technology, University of Applied Sciences Koblenz, Koblenz, Germany
- Isabelle Bos
- Department of Psychiatry and Neuropsychology, School of Mental Health and Neuroscience, Alzheimer Center Limburg, Maastricht University, Maastricht, The Netherlands
- Andreas Buchmann
- Institute for Regenerative Medicine, University of Zurich, Schlieren, Switzerland
- Chiara Cerami
- Institute for Advanced Studies (IUSS-Pavia), Pavia, Italy
- IRCCS Mondino Foundation, Pavia, Italy
- Mira Didic
- APHM, Timone, Service de Neurologie et Neuropsychologie, Hôpital Timone Adultes, Marseille, France
- Aix-Marseille Université, Inserm, INS, UMR_S 1106, 13005, Marseille, France
- Cristina Festari
- Laboratory of Alzheimer's Neuroimaging and Epidemiology, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
- Valentina Nicolosi
- Laboratory of Alzheimer's Neuroimaging and Epidemiology, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
- Leonardo Sacco
- Clinic of Neurology, Neurocenter of Southern Switzerland, EOC, Lugano, Switzerland
- Liesbeth Aerts
- Centre for Healthy Brain Ageing, School of Psychiatry, University of New South Wales, Sydney, Australia
- Jean-Marie Annoni
- Department of Neuroscience and Movement Sciences, University of Geneva and Fribourg Hospital, Geneva, Switzerland
- Nicola Ballhausen
- Department of Developmental Psychology, Tilburg University, Tilburg, The Netherlands
- Jean-François Démonet
- Leenaards Memory Centre-CHUV, Clinical Neuroscience Department, Cité Hospitalière CHUV, Lausanne, Switzerland
- Virginie Descloux
- Department of Neuroscience and Movement Sciences, University of Geneva and Fribourg Hospital, Geneva, Switzerland
- Suzie Diener
- Department of Neurology, Kantonsspital St. Gallen, St. Gallen, Switzerland
- Daniel Ferreira
- Division of Clinical Geriatrics, Center for Alzheimer Research, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Anton Gietl
- Institute for Regenerative Medicine, University of Zurich, Schlieren, Switzerland
- Nicola Girtler
- Clinical Psychology and Psychotherapy, IRCCS Ospedale Policlinico San Martino, Genova, Italy
- Dept of Neuroscience (DINOGMI), University of Genoa, Genoa, Italy
- Ingo Kilimann
- DZNE - Deutsches Zentrum für Neurodegenerative Erkrankungen, Rostock-Greifswald site, Rostock, Germany
- Stefan Klöppel
- Hospital of Old Age Psychiatry and Psychotherapy, University of Bern, Bern, Switzerland
- Nicole Kustyniuk
- Hospital of Old Age Psychiatry and Psychotherapy, University of Bern, Bern, Switzerland
- Patrizia Mecocci
- Department of Medicine and Surgery, Institute of Gerontology and Geriatrics, University of Perugia, Perugia, Italy
- Nathalie Mella
- Cognitive Aging Lab, University of Geneva, Geneva, Switzerland
- Martina Pigliautile
- Department of Medicine and Surgery, Institute of Gerontology and Geriatrics, University of Perugia, Perugia, Italy
- Katrin Seeher
- Centre for Healthy Brain Ageing, School of Psychiatry, University of New South Wales, Sydney, Australia
- Steven D Shirk
- VISN 1 New England MIRECC and VISN 1 New England GRECC, Bedford VA Healthcare System, Bedford, Massachusetts, USA
- Department of Psychiatry and Population and Quantitative Health Sciences, University of Massachusetts Medical School, Massachusetts, USA
- Alessio Toraldo
- Department of Brain and Behavioural Sciences, University of Pavia, Pavia, Italy
- Milan Center for Neuroscience, Milan, Italy
- Andrea Brioschi-Guevara
- Leenaards Memory Centre-CHUV, Clinical Neuroscience Department, Cité Hospitalière CHUV, Lausanne, Switzerland
- Kwun C G Chan
- National Alzheimer's Coordination Center (NACC), Department of Epidemiology, University of Washington, Seattle, Washington, USA
- Paul K Crane
- Department of Medicine, University of Washington, Seattle, Washington, USA
- Alessandra Dodich
- Neuroimaging and Innovative Molecular Tracers Laboratory, and Division of Nuclear Medicine, Diagnostic Department, University of Geneva, University Hospitals of Geneva, Geneva, Switzerland
- Centre for Mind/Brain Sciences, University of Trento, Rovereto, Italy
- Alice Grazia
- DZNE - Deutsches Zentrum für Neurodegenerative Erkrankungen, Rostock-Greifswald site, Rostock, Germany
- Nicole A Kochan
- Centre for Healthy Brain Ageing, School of Psychiatry, University of New South Wales, Sydney, Australia
- Flavio Nobili
- Neurology Clinic, IRCCS Ospedale Policlinico San Martino, Genova, Italy
- Dept of Neuroscience (DINOGMI), University of Genoa, Genoa, Italy
- Walter Kukull
- National Alzheimer's Coordination Center (NACC), Department of Epidemiology, University of Washington, Seattle, Washington, USA
- Oliver Peters
- Department of Psychiatry and Psychotherapy, Campus Benjamin Franklin, Charité, Universitätsmedizin Berlin, Berlin, Germany
- DZNE, German Center for Neurodegenerative Diseases, Berlin, Germany
- Inez Ramakers
- Department of Psychiatry and Neuropsychology, School of Mental Health and Neuroscience, Alzheimer Center Limburg, Maastricht University, Maastricht, The Netherlands
- Perminder S Sachdev
- Centre for Healthy Brain Ageing, School of Psychiatry, University of New South Wales, Sydney, Australia
- Stefan Teipel
- DZNE - Deutsches Zentrum für Neurodegenerative Erkrankungen, Rostock-Greifswald site, Rostock, Germany
- Pieter Jelle Visser
- Department of Psychiatry and Neuropsychology, School of Mental Health and Neuroscience, Alzheimer Center Limburg, Maastricht University, Maastricht, The Netherlands
- Michael Wagner
- DZNE, German Center for Neurodegenerative Diseases, Bonn, Germany
- Department of Neurodegenerative Diseases and Geriatric Psychiatry, University Hospital Bonn, Bonn, Germany
- Sandra Weintraub
- Mesulam Center for Cognitive Neurology and Alzheimer's Disease, Northwestern Feinberg School of Medicine, Chicago, Illinois
- Eric Westman
- Division of Clinical Geriatrics, Center for Alzheimer Research, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Lutz Froelich
- University of Heidelberg, Heidelberg, Central Institute of Mental Health, Medical Faculty Mannheim, Mannheim, Germany
- Henry Brodaty
- Centre for Healthy Brain Ageing, School of Psychiatry, University of New South Wales, Sydney, Australia
- Bruno Dubois
- Hôpital Pitié-Salpêtrière, AP-HP, Alzheimer Research Institute (IM2A), and Institut du cerveau et la moelle (ICM), Sorbonne Université, Paris, France
- Stefano F Cappa
- Institute for Advanced Studies (IUSS-Pavia), Pavia, Italy
- IRCCS Mondino Foundation, Pavia, Italy
- David Salmon
- Department of Neurosciences, University of California San Diego School of Medicine, San Diego, USA
- Bengt Winblad
- Dept NVS, Center for Alzheimer Research, Division of Neurogeriatrics, Karolinska Institutet, Stockholm, Sweden
- Giovanni B Frisoni
- LANVIE - Laboratory of Neuroimaging of Aging, University of Geneva, Geneva, Switzerland
- Memory Center, Geneva University Hospitals, Geneva, Switzerland
- Matthias Kliegel
- Cognitive Aging Lab, Department of Psychology, University of Geneva, Geneva, Switzerland

7. Lotfollahi M, Naghipourfar M, Luecken MD, Khajavi M, Büttner M, Wagenstetter M, Avsec Ž, Gayoso A, Yosef N, Interlandi M, Rybakov S, Misharin AV, Theis FJ. Mapping single-cell data to reference atlases by transfer learning. Nat Biotechnol 2022; 40:121-130. [PMID: 34462589 PMCID: PMC8763644 DOI: 10.1038/s41587-021-01001-7] [Received: 07/30/2020] [Accepted: 06/28/2021] [Indexed: 02/07/2023]
Abstract
Large single-cell atlases are now routinely generated to serve as references for analysis of smaller-scale studies. Yet learning from reference data is complicated by batch effects between datasets, limited availability of computational resources and sharing restrictions on raw data. Here we introduce a deep learning strategy for mapping query datasets on top of a reference called single-cell architectural surgery (scArches). scArches uses transfer learning and parameter optimization to enable efficient, decentralized, iterative reference building and contextualization of new datasets with existing references without sharing raw data. Using examples from mouse brain, pancreas, immune and whole-organism atlases, we show that scArches preserves biological state information while removing batch effects, despite using four orders of magnitude fewer parameters than de novo integration. scArches generalizes to multimodal reference mapping, allowing imputation of missing modalities. Finally, scArches retains coronavirus disease 2019 (COVID-19) disease variation when mapping to a healthy reference, enabling the discovery of disease-specific cell states. scArches will facilitate collaborative projects by enabling iterative construction, updating, sharing and efficient use of reference atlases.
Affiliation(s)
- Mohammad Lotfollahi
- Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Mohsen Naghipourfar
- Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- Malte D Luecken
- Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- Matin Khajavi
- Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- Maren Büttner
- Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- Marco Wagenstetter
- Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- Žiga Avsec
- Department of Computer Science, Technical University of Munich, Munich, Germany
- Adam Gayoso
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Nir Yosef
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, USA
- Marta Interlandi
- Institute of Medical Informatics, University of Münster, Münster, Germany
- Sergei Rybakov
- Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany
- Alexander V Misharin
- Division of Pulmonary and Critical Care Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Fabian J Theis
- Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany

8. Bradley VC, Kuriwaki S, Isakov M, Sejdinovic D, Meng XL, Flaxman S. Unrepresentative big surveys significantly overestimated US vaccine uptake. Nature 2021; 600:695-700. [PMID: 34880504 PMCID: PMC8653636 DOI: 10.1038/s41586-021-04198-4] [Received: 06/18/2021] [Accepted: 10/29/2021] [Indexed: 12/20/2022]
Abstract
Surveys are a crucial tool for understanding public opinion and behaviour, and their accuracy depends on maintaining statistical representativeness of their target populations by minimizing biases from all sources. Increasing data size shrinks confidence intervals but magnifies the effect of survey bias: an instance of the Big Data Paradox [1]. Here we demonstrate this paradox in estimates of first-dose COVID-19 vaccine uptake in US adults from 9 January to 19 May 2021 from two large surveys: Delphi-Facebook [2,3] (about 250,000 responses per week) and Census Household Pulse [4] (about 75,000 every two weeks). In May 2021, Delphi-Facebook overestimated uptake by 17 percentage points (14-20 percentage points with 5% benchmark imprecision) and Census Household Pulse by 14 (11-17 percentage points with 5% benchmark imprecision), compared with a retroactively updated benchmark that the Centers for Disease Control and Prevention published on 26 May 2021. Moreover, their large sample sizes led to minuscule margins of error on the incorrect estimates. By contrast, an Axios-Ipsos online panel [5] with about 1,000 responses per week, following survey research best practices [6], provided reliable estimates and uncertainty quantification. We decompose the observed error using a recent analytic framework [1] to explain the inaccuracy in the three surveys. We then analyse the implications for vaccine hesitancy and willingness. We show how a survey of 250,000 respondents can produce an estimate of the population mean that is no more accurate than an estimate from a simple random sample of size 10. Our central message is that data quality matters more than data quantity, and that compensating for the former with the latter is a mathematically provable losing proposition.
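The claim that 250,000 respondents can be worth a simple random sample of about 10 rests on an error decomposition that the abstract does not spell out. As a hedged sketch, assume the "recent analytic framework" is Meng's data-defect identity, in which the error of a sample mean is $\rho\sqrt{(1-f)/f}\,\sigma$, where $f = n/N$ is the sampling fraction, $\rho$ the data defect correlation between responding and the outcome, and $\sigma$ the population standard deviation. Equating the resulting squared error with the variance $\sigma^2/n_{\text{eff}}$ of a simple random sample gives approximately $n_{\text{eff}} \approx \frac{f}{1-f}\cdot\frac{1}{\rho^2}$. The function name and the inputs below (a population of 255 million adults, $\rho = 0.01$) are illustrative assumptions, not estimates reported in the paper.

```python
def effective_sample_size(n: int, N: int, rho: float) -> float:
    """Approximate effective sample size n_eff = (f / (1 - f)) / rho**2.

    n   -- number of respondents in the (non-random) survey
    N   -- size of the target population
    rho -- assumed data defect correlation between responding and the outcome
    """
    if not 0 < n < N:
        raise ValueError("need 0 < n < N")
    if rho == 0:
        raise ValueError("rho must be non-zero; rho = 0 means no selection bias")
    f = n / N  # sampling fraction
    return (f / (1 - f)) / rho**2


if __name__ == "__main__":
    # Illustrative inputs only, not figures from the paper.
    print(effective_sample_size(250_000, 255_000_000, 0.01))
```

With these assumed inputs the function returns roughly 9.8. Because $n_{\text{eff}}$ scales with $1/\rho^2$, even a correlation as small as 0.01 erases almost all of the sample's size, and adding more respondents barely helps while $f$ stays tiny.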
Affiliation(s)
- Shiro Kuriwaki
- Department of Political Science, Stanford University, Stanford, CA, USA
- Xiao-Li Meng
- Department of Statistics, Harvard University, Cambridge, MA, USA
- Seth Flaxman
- Department of Computer Science, University of Oxford, Oxford, UK

9
Matelsky JK, Rodriguez LM, Xenes D, Gion T, Hider R, Wester BA, Gray-Roncal W. An Integrated Toolkit for Extensible and Reproducible Neuroscience. 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) 2021; 2021:2413-2418. [PMID: 34891768 PMCID: PMC9044020 DOI: 10.1109/embc46164.2021.9630199] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022]
Abstract
As neuroimagery datasets continue to grow in size, the complexity of data analyses can require a detailed understanding and implementation of systems computer science for storage, access, processing, and sharing. Currently, several general data standards (e.g., Zarr, HDF5, precomputed) and purpose-built ecosystems (e.g., BossDB, CloudVolume, DVID, and Knossos) exist. Each of these systems has advantages and limitations and is most appropriate for different use cases. Using datasets that do not fit into RAM in this heterogeneous environment is challenging, and significant barriers exist to leveraging underlying research investments. In this manuscript, we outline our perspective on how to approach this challenge through the use of community-provided, standardized interfaces that unify various computational backends and abstract computer science challenges away from the scientist. We introduce desirable design patterns and share our reference implementation, called intern.

10
Maleki F, Ovens K, McQuillan I, Kusalik AJ. Silver: Forging almost Gold Standard Datasets. Genes (Basel) 2021; 12:genes12101523. [PMID: 34680918 PMCID: PMC8535810 DOI: 10.3390/genes12101523] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2021] [Revised: 09/19/2021] [Accepted: 09/22/2021] [Indexed: 11/16/2022]
Abstract
Gene set analysis has been widely used to gain insight from high-throughput expression studies. Although various tools and methods have been developed for gene set analysis, there is no consensus among researchers regarding best practice(s). Most often, evaluation studies have reported contradictory recommendations regarding which methods are superior. Therefore, an unbiased quantitative framework for evaluations of gene set analysis methods will be valuable. Such a framework requires gene expression datasets in which the enrichment status of gene sets is known a priori. In the absence of such gold standard datasets, artificial datasets are commonly used for evaluations of gene set analysis methods; however, they often rely on oversimplifying assumptions that make them biased in favor of or against a given method. In this paper, we propose a quantitative framework for evaluation of gene set analysis methods by synthesizing expression datasets using real data, without relying on oversimplifying or unrealistic assumptions, while preserving complex gene-gene correlations and retaining the distribution of expression values. The utility of the quantitative approach is shown by evaluating ten widely used gene set analysis methods. An implementation of the proposed method is publicly available. We suggest using Silver to evaluate existing and new gene set analysis methods. Evaluation using Silver provides a better understanding of current methods and can aid in the development of gene set analysis methods to achieve higher specificity without sacrificing sensitivity.
Affiliation(s)
- Farhad Maleki
- Augmented Intelligence & Precision Health Laboratory, Institute of the McGill University Health Centre, McGill University, Montreal, QC H4A 3S5, Canada
- Corresponding author
- Katie Ovens
- Augmented Intelligence & Precision Health Laboratory, Institute of the McGill University Health Centre, McGill University, Montreal, QC H4A 3S5, Canada
- Ian McQuillan
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5C9, Canada
- Anthony J. Kusalik
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5C9, Canada

11
Abstract
Given multiple source datasets with labels, how can we train a target model with no labeled data? Multi-source domain adaptation (MSDA) aims to train a model using multiple source datasets different from a target dataset in the absence of target data labels. MSDA is a crucial problem applicable to many practical cases where labels for the target data are unavailable due to privacy issues. Existing MSDA frameworks are limited since they align data without considering the labels of the features in each domain. They also do not fully utilize the unlabeled target data and rely on limited feature extraction with a single extractor. In this paper, we propose Multi-EPL, a novel method for MSDA. Multi-EPL exploits label-wise moment matching to align the conditional distributions of the features for the labels, uses pseudolabels for the unavailable target labels, and introduces an ensemble of multiple feature extractors for accurate domain adaptation. Extensive experiments show that Multi-EPL achieves state-of-the-art performance for MSDA tasks in both image domains and text domains, improving the accuracy by up to 13.20%.
Affiliation(s)
- Seongmin Lee
- Seoul National University, Seoul, Republic of Korea
- Hyunsik Jeon
- Seoul National University, Seoul, Republic of Korea
- U. Kang
- Seoul National University, Seoul, Republic of Korea
- Corresponding author

12
Tunyasuvunakool K, Adler J, Wu Z, Green T, Zielinski M, Žídek A, Bridgland A, Cowie A, Meyer C, Laydon A, Velankar S, Kleywegt GJ, Bateman A, Evans R, Pritzel A, Figurnov M, Ronneberger O, Bates R, Kohl SAA, Potapenko A, Ballard AJ, Romera-Paredes B, Nikolov S, Jain R, Clancy E, Reiman D, Petersen S, Senior AW, Kavukcuoglu K, Birney E, Kohli P, Jumper J, Hassabis D. Highly accurate protein structure prediction for the human proteome. Nature 2021; 596:590-596. [PMID: 34293799 PMCID: PMC8387240 DOI: 10.1038/s41586-021-03828-1] [Citation(s) in RCA: 1287] [Impact Index Per Article: 429.0] [Received: 05/11/2021] [Accepted: 07/16/2021] [Indexed: 02/07/2023]
Abstract
Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure [1]. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold [2], at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) has very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
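The two coverage figures are fractions of residues above per-residue confidence cut-offs. A minimal sketch follows, assuming the AlphaFold pLDDT convention (scores from 0 to 100, with "confident" meaning pLDDT > 70 and "very high" meaning pLDDT > 90); the function name and the sample scores are hypothetical.

```python
def confidence_fractions(plddt: list[float]) -> dict[str, float]:
    """Fraction of residues in the 'confident' and 'very high' pLDDT bands.

    Assumed thresholds: confident = pLDDT > 70, very high = pLDDT > 90 (a subset of confident).
    """
    if not plddt:
        raise ValueError("need at least one residue score")
    total = len(plddt)
    confident = sum(1 for score in plddt if score > 70) / total
    very_high = sum(1 for score in plddt if score > 90) / total
    return {"confident": confident, "very_high": very_high}


# Hypothetical per-residue scores for a ten-residue protein.
scores = [95, 92, 85, 72, 60, 40, 30, 91, 88, 50]
print(confidence_fractions(scores))  # {'confident': 0.6, 'very_high': 0.3}
```

As with the 58% and 36% figures in the abstract, the very-high fraction is a subset of the confident fraction, so it can never exceed it.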
Affiliation(s)
- Sameer Velankar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Gerard J Kleywegt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK

13
Saini KS, Twelves C. Determining lines of therapy in patients with solid cancers: a proposed new systematic and comprehensive framework. Br J Cancer 2021; 125:155-163. [PMID: 33850304 PMCID: PMC8292475 DOI: 10.1038/s41416-021-01319-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/18/2020] [Revised: 01/25/2021] [Accepted: 02/10/2021] [Indexed: 12/18/2022]
Abstract
The complexity of neoplasia and its treatment poses a challenge to the formulation of general criteria applicable across solid cancers. Determining the number of prior lines of therapy (LoT) is critically important for optimising future treatment, conducting medication audits, and assessing eligibility for clinical trial enrolment. Currently, however, no accepted set of criteria or definitions exists to enumerate LoT. In this article, we seek to open a dialogue to address this challenge by proposing a systematic and comprehensive framework to determine LoT uniformly across solid malignancies. First, key terms, including LoT and 'clinical progression of disease', are defined. Next, we clarify which therapies should be assigned a LoT, and why. Finally, we propose reporting LoT in a novel and standardised format as LoT N (CLoT + PLoT), where CLoT is the number of systemic anti-cancer therapies (SACT) administered with curative intent and/or in the early setting, PLoT is the number of SACT given with palliative intent and/or in the advanced setting, and N is the sum of CLoT and PLoT. As a next step, the cancer research community should develop and adopt standardised guidelines for enumerating LoT in a uniform manner.
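The proposed reporting format reduces to simple arithmetic over two counts. A minimal sketch, assuming the CLoT and PLoT counts have already been determined by applying the framework's criteria (which this code does not encode); the class name and the exact string rendering are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinesOfTherapy:
    clot: int  # SACT given with curative intent and/or in the early setting
    plot: int  # SACT given with palliative intent and/or in the advanced setting

    def __post_init__(self) -> None:
        if self.clot < 0 or self.plot < 0:
            raise ValueError("line-of-therapy counts cannot be negative")

    @property
    def n(self) -> int:
        """Total lines of therapy: N = CLoT + PLoT."""
        return self.clot + self.plot

    def __str__(self) -> str:
        return f"LoT {self.n} ({self.clot} + {self.plot})"


# Hypothetical patient: one curative-intent regimen, then two palliative-intent regimens.
print(LinesOfTherapy(clot=1, plot=2))  # LoT 3 (1 + 2)
```

Keeping CLoT and PLoT as separate fields, rather than storing only N, preserves the split that the format is designed to report.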
Affiliation(s)
- Kamal S Saini
- Covance Inc., Princeton, NJ, USA
- East Suffolk and North Essex NHS Foundation Trust, Ipswich, UK
- Chris Twelves
- University of Leeds and Leeds Teaching Hospitals Trust, Leeds, UK

14
Guan Z, Chen XG, Hay J, van Gerven J, Burggraaf J, de Kam M. Stability analysis of clustering of Norris' visual analogue scale: Applying the consensus clustering approach. Medicine (Baltimore) 2021; 100:e25363. [PMID: 33907093 PMCID: PMC8084085 DOI: 10.1097/md.0000000000025363] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/14/2020] [Revised: 01/25/2021] [Accepted: 03/11/2021] [Indexed: 11/19/2022]
Abstract
Visual analogue scales are widely used to measure subjective responses. Norris' 16 visual analogue scales (N_VAS) measure subjective feelings of alertness and mood. Up to now, different scientists have clustered the items of N_VAS in different ways, and Bond and Lader's clustering has been the most frequently used in clinical research. However, there are concerns about the stability of this clustering across different subject samples and different drug classes. The aim of this study was to test whether Bond and Lader's clustering was stable in terms of subject samples and drug effects. Alternative clusterings of N_VAS were also tested. Data from studies with 3 types of drugs, namely a cannabinoid receptor agonist (delta-9-tetrahydrocannabinol [THC]), a muscarinic antagonist (scopolamine), and benzodiazepines (midazolam and lorazepam), collected between 2005 and 2012, were used for this analysis. Exploratory factor analysis (EFA) was used to test the clustering algorithm of Bond and Lader. Consensus clustering was performed to test the stability of clustering results across samples and across different drug types. Stability analysis was performed first under a three-cluster assumption and then under alternative assumptions. Heat maps of the consensus matrix (CM) and density plots showed instability of the three-cluster hypothesis and suggested instability across the 3 drug classes. Two- and four-cluster hypotheses were also tested. Heat maps of the CM and density plots suggested that the two-cluster assumption was superior. In summary, the two-cluster assumption leads to a provably stable outcome over samples and the 3 drug types based on the data used.
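The consensus matrix behind these heat maps can be sketched as follows. This assumes the common consensus-clustering definition: for each pair of items, the number of resampling runs in which they fall in the same cluster divided by the number of runs in which both were sampled. The clustering algorithm and the resampling scheme are left out and assumed to be supplied elsewhere; the function name is hypothetical and this is not the authors' code.

```python
from typing import Optional, Sequence


def consensus_matrix(labelings: Sequence[Sequence[Optional[int]]]) -> list[list[float]]:
    """Consensus matrix from repeated clusterings of resampled data.

    labelings[h][i] is the cluster label of item i in run h, or None if item i was not
    drawn in that run. Entry (i, j) = runs with i and j co-clustered / runs with both sampled.
    Pairs never sampled together get 0.0 (a design choice of this sketch).
    """
    n_items = len(labelings[0])
    together = [[0] * n_items for _ in range(n_items)]
    both_sampled = [[0] * n_items for _ in range(n_items)]
    for labels in labelings:
        for i in range(n_items):
            if labels[i] is None:
                continue
            for j in range(n_items):
                if labels[j] is None:
                    continue
                both_sampled[i][j] += 1
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [
        [together[i][j] / both_sampled[i][j] if both_sampled[i][j] else 0.0
         for j in range(n_items)]
        for i in range(n_items)
    ]


# Hypothetical labels for four items over four runs; None marks an item left out of a run.
runs = [
    [0, 0, 1, 1],
    [0, 0, 1, None],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
cm = consensus_matrix(runs)
print(cm[0][1], cm[2][3], cm[0][2])  # 0.75 1.0 0.0
```

In the usual reading, entries concentrated near 0 and 1 indicate a stable clustering, while intermediate values such as the 0.75 above flag pairs whose co-assignment depends on the sample; density plots of the entries summarise exactly this.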
Affiliation(s)
- Zheng Guan
- Centre for Human Drug Research
- Leiden University Medical Center, The Netherlands
- Joop van Gerven
- Centre for Human Drug Research
- Leiden University Medical Center, The Netherlands
- Jacobus Burggraaf
- Centre for Human Drug Research
- Leiden University Medical Center, The Netherlands

15
Sáez C, Romero N, Conejero JA, García-Gómez JM. Potential limitations in COVID-19 machine learning due to data source variability: A case study in the nCov2019 dataset. J Am Med Inform Assoc 2021; 28:360-364. [PMID: 33027509 PMCID: PMC7797735 DOI: 10.1093/jamia/ocaa258] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 06/19/2020] [Revised: 09/07/2020] [Accepted: 09/28/2020] [Indexed: 02/02/2023]
Abstract
OBJECTIVE The lack of representative coronavirus disease 2019 (COVID-19) data is a bottleneck for reliable and generalizable machine learning. Data sharing is insufficient without data quality, in which source variability plays an important role. We showcase and discuss potential biases from data source variability for COVID-19 machine learning. MATERIALS AND METHODS We used the publicly available nCov2019 dataset, including patient-level data from several countries. We aimed at the discovery and classification of severity subgroups using symptoms and comorbidities. RESULTS Cases from the 2 countries with the highest prevalence were divided into separate subgroups with distinct severity manifestations. This variability can reduce the representativeness of training data with respect to the model's target populations and increase model complexity at the risk of overfitting. CONCLUSIONS Data source variability is a potential contributor to bias in distributed research networks. We call for systematic assessment and reporting of data source variability and data quality in COVID-19 data sharing, as key information for reliable and generalizable machine learning.
Affiliation(s)
- Carlos Sáez
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, Spain
- Nekane Romero
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, Spain
- J Alberto Conejero
- Instituto Universitario de Matemática Pura y Aplicada, Universitat Politècnica de València, Valencia, Spain
- Juan M García-Gómez
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, Spain

16
Duffy JMN, Bhattacharya S, Bhattacharya S, Bofill M, Collura B, Curtis C, Evers JLH, Giudice LC, Farquharson RG, Franik S, Hickey M, Hull ML, Jordan V, Khalaf Y, Legro RS, Lensen S, Mavrelos D, Mol BW, Niederberger C, Ng EHY, Puscasiu L, Repping S, Sarris I, Showell M, Strandell A, Vail A, van Wely M, Vercoe M, Vuong NL, Wang AY, Wang R, Wilkinson J, Youssef MA, Farquhar CM. Standardizing definitions and reporting guidelines for the infertility core outcome set: an international consensus development study. Fertil Steril 2020; 115:201-212. [PMID: 33272619 DOI: 10.1016/j.fertnstert.2020.11.013] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 05/13/2020] [Revised: 07/08/2020] [Accepted: 07/22/2020] [Indexed: 01/21/2023]
Abstract
STUDY QUESTION Can consensus definitions for the core outcome set for infertility be identified in order to recommend a standardized approach to reporting? SUMMARY ANSWER Consensus definitions for individual core outcomes, contextual statements, and a standardized reporting table have been developed. WHAT IS KNOWN ALREADY Different definitions exist for individual core outcomes for infertility. This variation increases the opportunities for researchers to engage with selective outcome reporting, which undermines secondary research and compromises clinical practice guideline development. STUDY DESIGN, SIZE, DURATION Potential definitions were identified by a systematic review of definition development initiatives and clinical practice guidelines and by reviewing Cochrane Gynaecology and Fertility Group guidelines. These definitions were discussed in a face-to-face consensus development meeting, which agreed consensus definitions. A standardized approach to reporting was also developed as part of the process. PARTICIPANTS/MATERIALS, SETTING, METHODS Healthcare professionals, researchers, and people with fertility problems were brought together in an open and transparent process using formal consensus development methods. MAIN RESULTS AND THE ROLE OF CHANCE Forty-four potential definitions were inventoried across four definition development initiatives, including the Harbin Consensus Conference Workshop Group and International Committee for Monitoring Assisted Reproductive Technologies, 12 clinical practice guidelines, and Cochrane Gynaecology and Fertility Group guidelines. Twenty-seven participants, from 11 countries, contributed to the consensus development meeting. Consensus definitions were successfully developed for all core outcomes. Specific recommendations were made to improve reporting. LIMITATIONS, REASONS FOR CAUTION We used consensus development methods, which have inherent limitations. 
There was limited representation from low- and middle-income countries. WIDER IMPLICATIONS OF THE FINDINGS A minimum data set should assist researchers in populating protocols, case report forms, and other data collection tools. The generic reporting table should provide clear guidance to researchers and improve the reporting of their results within journal publications and conference presentations. Research funding bodies, the Standard Protocol Items: Recommendations for Interventional Trials statement, and over 80 specialty journals have committed to implementing this core outcome set. STUDY FUNDING/COMPETING INTEREST(S) This research was funded by the Catalyst Fund, Royal Society of New Zealand, Auckland Medical Research Fund, and Maurice and Phyllis Paykel Trust. Siladitya Bhattacharya reports being the Editor-in-Chief of Human Reproduction Open and an editor of the Cochrane Gynaecology and Fertility group. Hans Evers reports being the Editor Emeritus of Human Reproduction. Richard Legro reports consultancy fees from Abbvie, Bayer, Ferring, Fractyl, Insud Pharma and Kindex and research sponsorship from Guerbet and Hass Avocado Board. Ben Mol reports consultancy fees from Guerbet, iGenomix, Merck, Merck KGaA and ObsEva. Craig Niederberger reports being the Editor-in-Chief of Fertility and Sterility and Section Editor of the Journal of Urology, research sponsorship from Ferring, and a financial interest in NexHand. Ernest Ng reports research sponsorship from Merck. Annika Strandell reports consultancy fees from Guerbet. Jack Wilkinson reports being a statistical editor for the Cochrane Gynaecology and Fertility group. Andy Vail reports that he is a Statistical Editor of the Cochrane Gynaecology & Fertility Review Group and of the journal Reproduction. His employing institution has received payment from HFEA for his advice on review of research evidence to inform their 'traffic light' system for infertility treatment 'add-ons'. 
Lan Vuong reports consultancy and conference fees from Ferring, Merck and Merck Sharp and Dohme. The remaining authors declare no competing interests in relation to the work presented. All authors have completed the disclosure form. TRIAL REGISTRATION NUMBER Core Outcome Measures in Effectiveness Trials Initiative: 1023.
Affiliation(s)
- J M N Duffy
- King's Fertility, Fetal Medicine Research Institute, London, UK; Institute for Women's Health, University College London, London, UK.
- S Bhattacharya
- School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, UK
- S Bhattacharya
- School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, UK
- M Bofill
- Department of Obstetrics and Gynaecology, University of Auckland, Auckland, New Zealand
- B Collura
- RESOLVE: The National Infertility Association, Virginia, United States
- C Curtis
- Fertility New Zealand, Auckland, New Zealand; School of Psychology, University of Waikato, Hamilton, New Zealand
- J L H Evers
- Maastricht University Medical Centre, Maastricht, The Netherlands
- L C Giudice
- Center for Research, Innovation and Training in Reproduction and Infertility, Center for Reproductive Sciences, University of California, San Francisco, California, United States; International Federation of Fertility Societies, Philadelphia, Pennsylvania, United States
- R G Farquharson
- Department of Obstetrics and Gynaecology, Liverpool Women's NHS Foundation Trust, Liverpool, UK
- S Franik
- Department of Obstetrics and Gynaecology, Münster University Hospital, Münster, Germany
- M Hickey
- Department of Obstetrics and Gynaecology, University of Melbourne, Victoria, Australia
- M L Hull
- Robinson Research Institute, University of Adelaide, Adelaide, South Australia, Australia
- V Jordan
- Department of Obstetrics and Gynaecology, University of Auckland, Auckland, New Zealand
- Y Khalaf
- Department of Women and Children's Health, King's College London, Guy's Hospital, London
- R S Legro
- Department of Obstetrics and Gynaecology, Penn State College of Medicine, Pennsylvania
- S Lensen
- Department of Obstetrics and Gynaecology, University of Melbourne, Victoria, Australia
- D Mavrelos
- Reproductive Medicine Unit, University College Hospital, London, UK
- B W Mol
- Department of Obstetrics and Gynaecology, Monash University, Melbourne, Australia
- C Niederberger
- Department of Urology, University of Illinois at Chicago College of Medicine, Chicago, Illinois
- E H Y Ng
- Department of Obstetrics and Gynaecology, The University of Hong Kong, Hong Kong; Shenzhen Key Laboratory of Fertility Regulation, The University of Hong Kong-Shenzhen Hospital, China
- L Puscasiu
- University of Medicine, Pharmacy, Sciences and Technology, Targu Mures, Romania
- S Repping
- Amsterdam University Medical Centers, Amsterdam, The Netherlands; National Health Care Institute, Diemen, The Netherlands
- I Sarris
- King's Fertility, Fetal Medicine Research Institute, London, UK
- M Showell
- Cochrane Gynaecology and Fertility Group, University of Auckland, Auckland, New Zealand
- A Strandell
- Department of Obstetrics and Gynecology, Sahlgrenska Academy, University of Gothenburg, Göteborg, Sweden
- A Vail
- Centre for Biostatistics, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
- M van Wely
- Amsterdam University Medical Centers, Amsterdam, The Netherlands
- M Vercoe
- Cochrane Gynaecology and Fertility Group, University of Auckland, Auckland, New Zealand
- N L Vuong
- Department of Obstetrics and Gynaecology, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Vietnam
- A Y Wang
- Faculty of Health, University of Technology, Sydney, Broadway, Australia
- R Wang
- Department of Obstetrics and Gynaecology, Monash University, Melbourne, Australia
- J Wilkinson
- Centre for Biostatistics, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
- M A Youssef
- Department of Obstetrics & Gynaecology, Faculty of Medicine, Cairo University, Cairo, Egypt
- C M Farquhar
- Department of Obstetrics and Gynaecology, University of Auckland, Auckland, New Zealand; Cochrane Gynaecology and Fertility Group, University of Auckland, Auckland, New Zealand

17
Li L, Prato CG, Wang Y. Ranking contributors to traffic crashes on mountainous freeways from an incomplete dataset: A sequential approach of multivariate imputation by chained equations and random forest classifier. Accid Anal Prev 2020; 146:105744. [PMID: 32861970 DOI: 10.1016/j.aap.2020.105744] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/03/2020] [Revised: 07/24/2020] [Accepted: 08/14/2020] [Indexed: 06/11/2023]
Abstract
The estimation of the effect of contributors to crash injury severity and the prediction of crash injury severity outcomes often suffer from biases related to missing data in crash datasets that contain incomplete records. As both estimation and prediction would greatly improve if the missing values were recovered, this study proposes a sequential approach to handle incomplete crash datasets and rank contributors to the injury severity of crashes on mountainous freeways in China. The sequential approach consists of two parts: (i) multivariate imputation by chained equations imputes the missing values of the independent variables; (ii) a random forest classifier analyses the correlation between the dependent and the independent variables. The first part applies different imputation methods depending on whether each independent variable is binary, categorical or continuous, whereas the second part classifies the correlations according to the random forest classifier. The proposed method was applied to the case study of mountainous freeways in China and compared with an analysis of the raw dataset to evaluate its effectiveness; the results illustrate that the method significantly improves classification accuracy compared with existing methods. Moreover, the classifier ranked the contributors to the injury severity of traffic crashes on mountainous freeways in the following order of importance: vehicle type, crash type, road longitudinal gradient, crash cause, curve radius, and deflection angles. Interestingly, environmental factors were found to be of lower importance.
Affiliation(s)
- Linchao Li
- College of Civil and Transportation Engineering, Shenzhen University, Shenzhen, Guangdong, 518060 People's Republic of China
- Carlo G Prato
- School of Civil Engineering, The University of Queensland, St. Lucia 4072, Brisbane, Australia
- Yonggang Wang
- School of Highway, Chang'an University, Xi'an, Shaanxi, 710064 People's Republic of China

18
Abstract
Open data allows researchers to explore pre-existing datasets in new ways. However, if many researchers reuse the same dataset, multiple statistical testing may increase false positives. Here we demonstrate that sequential hypothesis testing on the same dataset by multiple researchers can inflate error rates. We go on to discuss a number of correction procedures that can reduce the number of false positives, and the challenges associated with these correction procedures.
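The inflation can be made concrete with the textbook family-wise error rate. A minimal sketch, assuming m independent tests of true null hypotheses on the same dataset, each run at level alpha; Bonferroni is shown only as the simplest correction and is not necessarily one the authors recommend. Function names are hypothetical.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) over m independent tests of true nulls at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_level(alpha: float, m: int) -> float:
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / m


# How the chance of a spurious finding grows as more researchers reuse one dataset.
for m in (1, 5, 10, 20):
    print(m, round(familywise_error_rate(0.05, m), 3))
# 1 0.05
# 5 0.226
# 10 0.401
# 20 0.642
```

Under these assumptions, ten reuses at the conventional 0.05 level give roughly a 40% chance that at least one reported finding is a false positive. One practical difficulty in the sequential setting is that m is not known in advance when the earlier analyses are published.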
Affiliation(s)
- William Hedley Thompson
- Department of Psychology, Stanford University, Stanford, United States
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- Jessey Wright
- Department of Psychology, Stanford University, Stanford, United States
- Department of Philosophy, Stanford University, Stanford, United States

19
Abstract
Accurate construction of polygenic scores (PGS) can enable early diagnosis of diseases and facilitate the development of personalized medicine. Accurate PGS construction requires prediction models that are both adaptive to different genetic architectures and scalable to biobank-scale datasets with millions of individuals and tens of millions of genetic variants. Here, we develop such a method called Deterministic Bayesian Sparse Linear Mixed Model (DBSLMM). DBSLMM relies on a flexible modeling assumption on the effect size distribution to achieve robust and accurate prediction performance across a range of genetic architectures. DBSLMM also relies on a simple deterministic search algorithm to yield an approximate analytic estimation solution using summary statistics only. The deterministic search algorithm, when paired with further algebraic innovations, results in substantial computational savings. With simulations, we show that DBSLMM achieves scalable and accurate prediction performance across a range of realistic genetic architectures. We then apply DBSLMM to analyze 25 traits in UK Biobank. For these traits, compared to existing approaches, DBSLMM achieves an average of 2.03%-101.09% accuracy gain in internal cross-validations. In external validations on two separate datasets, including one from BioBank Japan, DBSLMM achieves an average of 14.74%-522.74% accuracy gain. In these real data applications, DBSLMM is 1.03-28.11 times faster and uses only 7.4%-24.8% of the physical memory of other multiple regression-based PGS methods. Overall, DBSLMM represents an accurate and scalable method for constructing PGS in biobank-scale datasets.
Affiliation(s)
- Sheng Yang
- Department of Biostatistics, Nanjing Medical University, Nanjing, Jiangsu 211166, China; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA

20
Meyerowitz EA, Vannier AGL, Friesen MGN, Schoenfeld S, Gelfand JA, Callahan MV, Kim AY, Reeves PM, Poznansky MC. Rethinking the role of hydroxychloroquine in the treatment of COVID-19. FASEB J 2020; 34:6027-6037. [PMID: 32350928 PMCID: PMC7267640 DOI: 10.1096/fj.202000919] [Citation(s) in RCA: 84] [Impact Index Per Article: 21.0] [Received: 04/16/2020] [Accepted: 04/20/2020] [Indexed: 02/07/2023]
Abstract
There are currently no proven or approved treatments for coronavirus disease 2019 (COVID-19). Early anecdotal reports and limited in vitro data led to the significant uptake of hydroxychloroquine (HCQ), and to a lesser extent chloroquine (CQ), for many patients with this disease. As an increasing number of patients with COVID-19 are treated with these agents and more evidence accumulates, there continues to be no high-quality clinical data showing a clear benefit of these agents for this disease. Moreover, these agents have the potential to cause harm, including a broad range of adverse events such as serious cardiac side effects when combined with other agents. In addition, the known and potent immunomodulatory effects of these agents, which support their use in the treatment of autoimmune conditions and formed part of the original rationale for their use in patients with COVID-19, may in fact undermine their utility in the treatment of this respiratory viral infection. Specifically, the impact of HCQ on cytokine production and suppression of antigen presentation may have immunologic consequences that hamper innate and adaptive antiviral immune responses in patients with COVID-19. Similarly, the reported in vitro inhibition of viral proliferation derives largely from the blockade of the viral fusion that initiates infection, rather than from direct inhibition of viral replication as seen with nucleoside/nucleotide analogs in other viral infections. Given these facts and the growing uncertainty about these agents for the treatment of COVID-19, it is clear that, at the very least, thoughtful planning and data collection from randomized clinical trials are needed to understand what role, if any, these agents may have in this disease.
In this article, we review the datasets that support or detract from the use of these agents for the treatment of COVID-19 and render a data-informed opinion that they should only be used with caution and in the context of carefully designed clinical trials, or on a case-by-case basis after rigorous consideration of the risks and benefits of this therapeutic approach.
Affiliation(s)
- Eric A. Meyerowitz
- Division of Infectious Diseases, Massachusetts General Hospital (MGH) and Harvard Medical School (HMS), Boston, MA, USA
- Augustin G. L. Vannier
- Division of Infectious Diseases, Massachusetts General Hospital (MGH) and Harvard Medical School (HMS), Boston, MA, USA
- Vaccine and Immunotherapy Center (VIC), MGH and HMS, Boston, MA, USA
- Morgan G. N. Friesen
- Division of Infectious Diseases, Massachusetts General Hospital (MGH) and Harvard Medical School (HMS), Boston, MA, USA
- Vaccine and Immunotherapy Center (VIC), MGH and HMS, Boston, MA, USA
- Sara Schoenfeld
- Division of Allergy, Immunology and Rheumatology, MGH and HMS, Boston, MA, USA
- Jeffrey A. Gelfand
- Division of Infectious Diseases, Massachusetts General Hospital (MGH) and Harvard Medical School (HMS), Boston, MA, USA
- Vaccine and Immunotherapy Center (VIC), MGH and HMS, Boston, MA, USA
- Michael V. Callahan
- Division of Infectious Diseases, Massachusetts General Hospital (MGH) and Harvard Medical School (HMS), Boston, MA, USA
- Vaccine and Immunotherapy Center (VIC), MGH and HMS, Boston, MA, USA
- Special Advisor to the Assistant Secretary of Public Health Preparedness and Response, U.S. Dept of Health and Human Services, Washington, DC, USA
- Arthur Y. Kim
- Division of Infectious Diseases, Massachusetts General Hospital (MGH) and Harvard Medical School (HMS), Boston, MA, USA
- Patrick M. Reeves
- Division of Infectious Diseases, Massachusetts General Hospital (MGH) and Harvard Medical School (HMS), Boston, MA, USA
- Vaccine and Immunotherapy Center (VIC), MGH and HMS, Boston, MA, USA
- Mark C. Poznansky
- Division of Infectious Diseases, Massachusetts General Hospital (MGH) and Harvard Medical School (HMS), Boston, MA, USA
- Vaccine and Immunotherapy Center (VIC), MGH and HMS, Boston, MA, USA
21
Kim D, Makineni R, Panagiotou OA, Trivedi AN. Assessment of Completeness of Hospital Readmission Rates Reported in Medicare Advantage Contracts' Healthcare Effectiveness Data and Information Set. JAMA Netw Open 2020; 3:e203555. [PMID: 32343350 PMCID: PMC7189222 DOI: 10.1001/jamanetworkopen.2020.3555] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
This cross-sectional study evaluates the agreement between readmission rates reported by Medicare Advantage contracts and readmission rates calculated from their encounter data in the Healthcare Effectiveness Data and Information Set (HEDIS).
Affiliation(s)
- Daeho Kim
- Department of Health Services, Policy, and Practice, Brown University, Providence, Rhode Island
- Rajesh Makineni
- Department of Health Services, Policy, and Practice, Brown University, Providence, Rhode Island
- Orestis A. Panagiotou
- Department of Health Services, Policy, and Practice, Brown University, Providence, Rhode Island
- Amal N. Trivedi
- Department of Health Services, Policy, and Practice, Brown University, Providence, Rhode Island
- Providence VA Medical Center, Providence, Rhode Island
22
Thayer D, Rees A, Kennedy J, Collins H, Harris D, Halcox J, Ruschetti L, Noyce R, Brooks C. Measuring follow-up time in routinely-collected health datasets: Challenges and solutions. PLoS One 2020; 15:e0228545. [PMID: 32045428 PMCID: PMC7012444 DOI: 10.1371/journal.pone.0228545] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/02/2019] [Accepted: 01/18/2020] [Indexed: 11/30/2022]
Abstract
A key requirement for longitudinal studies using routinely collected health data is the ability to measure which individuals are present in the datasets used, and over what time period. Individuals can enter and leave the covered population of administrative datasets for a variety of reasons, including both life events and characteristics of the datasets themselves. An automated, customizable method of determining individuals' presence was developed for the primary care dataset in Swansea University's SAIL Databank. The primary care dataset covers only a portion of Wales, with 76% of practices participating. The start and end dates of the data vary by practice. Additionally, individuals can change practices or leave Wales. To address these issues, a two-step process was developed. First, the period for which each practice had data available was calculated by measuring changes in the rate of events recorded over time. Second, the registration records for each individual were simplified. Anomalies such as short gaps and overlaps were resolved by applying a set of rules. The result of these two analyses was a cleaned set of records indicating the start and end dates of available primary care data for each individual. Analysis of GP records showed that 91.0% of events occurred within periods that the algorithm calculated as having available data, and 98.4% of those events were observed at the same practice of registration as that computed by the algorithm. A standardized method for solving this common problem has enabled faster development of studies using this dataset. Using a rigorous, tested, standardized method of verifying presence in the study population will also positively influence the quality of research.
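The second step, simplifying each individual's registration records by resolving short gaps and overlaps, is essentially an interval-merging problem. The Python sketch below illustrates that idea under stated assumptions; it is not the SAIL Databank implementation. The function name `simplify_registrations`, the (start, end) record layout, and the 30-day `max_gap_days` threshold are hypothetical, and the sketch ignores practice identity and practice data-availability windows.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Hypothetical record layout: (start, end) registration period, both dates inclusive.
Period = Tuple[date, date]


def simplify_registrations(periods: List[Period], max_gap_days: int = 30) -> List[Period]:
    """Merge overlapping periods and bridge gaps of at most max_gap_days.

    max_gap_days is an illustrative assumption, not a value taken from the study.
    """
    if not periods:
        return []
    ordered = sorted(periods)  # sort by start date, then by end date
    merged = [ordered[0]]
    for start, end in ordered[1:]:
        last_start, last_end = merged[-1]
        # Overlap or short gap: extend the current period instead of opening a new one.
        if start <= last_end + timedelta(days=max_gap_days):
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged


records = [
    (date(2010, 1, 1), date(2012, 6, 30)),
    (date(2012, 7, 10), date(2015, 3, 31)),  # 10-day gap: bridged
    (date(2014, 12, 1), date(2016, 1, 31)),  # overlap: merged
    (date(2018, 5, 1), date(2020, 12, 31)),  # gap of more than two years: kept separate
]
print(simplify_registrations(records))
```

With these sample records, the 10-day gap and the overlap are absorbed, yielding two periods: 2010-01-01 to 2016-01-31 and 2018-05-01 to 2020-12-31. A full rule set would presumably also need to handle practice changes, which this sketch leaves out.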
Affiliation(s)
- Daniel Thayer
- SAIL Databank, Swansea University Medical School, Swansea, United Kingdom
- Arfon Rees
- SAIL Databank, Swansea University Medical School, Swansea, United Kingdom
- Jon Kennedy
- Swansea University Medical School, Swansea, United Kingdom
- Huw Collins
- SAIL Databank, Swansea University Medical School, Swansea, United Kingdom
- Dan Harris
- Abertawe Bro Morgannwg University Health Board, Swansea, United Kingdom
- Julian Halcox
- Abertawe Bro Morgannwg University Health Board, Swansea, United Kingdom
- Luca Ruschetti
- SAIL Databank, Swansea University Medical School, Swansea, United Kingdom
- Richard Noyce
- SAIL Databank, Swansea University Medical School, Swansea, United Kingdom
- Caroline Brooks
- SAIL Databank, Swansea University Medical School, Swansea, United Kingdom
23
Xie Z, Deng X, Shu K. Prediction of Protein-Protein Interaction Sites Using Convolutional Neural Network and Improved Data Sets. Int J Mol Sci 2020; 21:E467. [PMID: 31940793 PMCID: PMC7013409 DOI: 10.3390/ijms21020467] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 11/24/2019] [Revised: 12/23/2019] [Accepted: 01/08/2020] [Indexed: 12/20/2022]
Abstract
Protein-protein interaction (PPI) sites play a key role in the formation of protein complexes, which underlie a variety of biological processes. Experimental methods for identifying PPI sites are expensive and time-consuming, which has led to the development of various prediction algorithms. We propose a convolutional neural network for PPI site prediction and use residue binding propensity to refine the positive samples. Our method achieves an area under the curve (AUC) of 0.912 on the improved data set. In addition, it yields much better results on samples with high binding propensity than on randomly selected samples. This suggests that positive samples defined by the distance between residue atoms contain a considerable number of false-positive PPI sites.
Affiliation(s)
- Zengyan Xie
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Kunxian Shu
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
24
Robinson J, Rosenzweig C, Moss AJ, Litman L. Tapped out or barely tapped? Recommendations for how to harness the vast and largely unused potential of the Mechanical Turk participant pool. PLoS One 2019; 14:e0226394. [PMID: 31841534 PMCID: PMC6913990 DOI: 10.1371/journal.pone.0226394] [Citation(s) in RCA: 80] [Impact Index Per Article: 16.0] [Received: 06/12/2019] [Accepted: 11/25/2019] [Indexed: 11/18/2022]
Abstract
Mechanical Turk (MTurk) is a common source of research participants within the academic community. Despite MTurk's utility and benefits over traditional subject pools, some researchers have questioned whether it is sustainable. Specifically, some have asked whether MTurk workers are too familiar with manipulations and measures common in the social sciences, as a result of many researchers relying on the same small participant pool. Here, we show that concerns about non-naivete on MTurk are due less to the MTurk platform itself and more to the way researchers use the platform. Specifically, we find that there are at least 250,000 MTurk workers worldwide and that a large majority of US workers are new to the platform each year and are therefore relatively inexperienced as research participants. We describe how inexperienced workers are excluded from studies, in part, because of the worker reputation qualifications researchers commonly use. We then propose and evaluate an alternative approach to sampling on MTurk that allows researchers to access inexperienced participants without sacrificing data quality. We recommend that, in some cases, researchers limit the number of highly experienced workers allowed in their study, either by excluding these workers or by stratifying sample recruitment based on worker experience levels. We discuss the trade-offs of different sampling practices on MTurk and describe how these sampling strategies can help researchers harness the vast and largely untapped potential of the Mechanical Turk participant pool.
Affiliation(s)
- Jonathan Robinson
- Department of Computer Science, Lander College, Flushing, New York, United States of America
- Prime Research Solutions, Queens, New York, United States of America
- Cheskie Rosenzweig
- Prime Research Solutions, Queens, New York, United States of America
- Department of Clinical Psychology, Columbia University, New York, New York, United States of America
- Aaron J. Moss
- Prime Research Solutions, Queens, New York, United States of America
- Leib Litman
- Prime Research Solutions, Queens, New York, United States of America
- Department of Psychology, Lander College, Flushing, New York, United States of America
25
Johnson K, Walsh H, Sasse A, Davis M, Buckley B, Greaves S, To A. New Zealand minimum dataset for a standard transthoracic echocardiogram. N Z Med J 2019; 132:81-89. [PMID: 31778376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Affiliation(s)
- Helen Walsh
- Cardiac Sonographer, CSANZ Echocardiography Working Group, WDHB, Whanganui
- Alex Sasse
- Cardiologist, CSANZ Echocardiography Working Group, CCDHB, Wellington
- Mark Davis
- Cardiologist, CSANZ Echocardiography Working Group, WDHB, Whanganui
- Belinda Buckley
- Cardiac Sonographer, CSANZ Echocardiography Working Group, CMDHB, Auckland
- Sally Greaves
- Cardiologist, CSANZ Echocardiography Working Group, ADHB, Auckland
- Andrew To
- Cardiologist, CSANZ Echocardiography Working Group, WDHB, Whanganui
26
Bartlett CW, Klamer BG, Buyske S, Petrill SA, Ray WC. Forming Big Datasets through Latent Class Concatenation of Imperfectly Matched Databases Features. Genes (Basel) 2019; 10:genes10090727. [PMID: 31546899 PMCID: PMC6771148 DOI: 10.3390/genes10090727] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/09/2019] [Revised: 09/03/2019] [Accepted: 09/16/2019] [Indexed: 12/02/2022]
Abstract
Informatics researchers often need to combine data from many different sources to increase statistical power and study subtle or complicated effects. Perfect overlap of measurements across academic studies is rare, since virtually every dataset is collected for a unique purpose and without coordination with parties not at hand (i.e., future informatics researchers). Thus, incomplete concordance of measurements across datasets poses a major challenge for researchers seeking to combine public databases. In any given field, some measurements are fairly standard, but every organization collecting data makes unique decisions on instruments, protocols, and methods of processing the data. This typically prevents direct concatenation of the raw data, since constituent cohorts do not have the same measurements (i.e., columns of data). When measurements across datasets are similar prima facie, there is a desire to combine the data to increase power, but mixing non-identical measurements could greatly reduce the sensitivity of the downstream analysis. Here, we discuss a statistical method that is applicable when certain patterns of missing data are found; namely, it is possible to combine datasets that measure the same underlying constructs (or latent traits) when there is only partial overlap of measurements across the constituent datasets. Our method, ROSETTA, empirically derives a set of common latent trait metrics for each related measurement domain, using a novel variation of factor analysis to ensure equivalence across the constituent datasets. The advantages of combining datasets this way are the simplicity, statistical power, and modeling flexibility of a single joint analysis of all the data. Three simulation studies show the performance of ROSETTA on datasets with only partially overlapping measurements (i.e., systematically missing information), benchmarked against a condition of perfectly overlapping data (i.e., full information).
The first study examined a range of correlations, while the second study was modeled after the observed correlations in a well-characterized clinical, behavioral cohort. Both studies consistently show significant correlations >0.94, often >0.96, indicating the robustness of the method and validating the general approach. The third study varied within and between domain correlations and compared ROSETTA to multiple imputation and meta-analysis as two commonly used methods that ostensibly solve the same data integration problem. We provide one alternative to meta-analysis and multiple imputation by developing a method that statistically equates similar but distinct manifest metrics into a set of empirically derived metrics that can be used for analysis across all datasets.
Affiliation(s)
- Christopher W Bartlett
- Battelle Center for Mathematical Medicine, Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH 43215, USA.
- Department of Pediatrics, College of Medicine, The Ohio State University, Columbus, OH 43215, USA.
- Brett G Klamer
- Battelle Center for Mathematical Medicine, Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH 43215, USA
- Steven Buyske
- Departments of Statistics and Genetics, Rutgers University, Piscataway, NJ 08854, USA
- Stephen A Petrill
- Department of Psychology, College of Arts and Sciences, The Ohio State University, Columbus, OH 43210, USA
- William C Ray
- Battelle Center for Mathematical Medicine, Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH 43215, USA
- Department of Pediatrics, College of Medicine, The Ohio State University, Columbus, OH 43215, USA
27
Seethala RR, Altemani A, Ferris RL, Fonseca I, Gnepp DR, Ha P, Nagao T, Skalova A, Stenman G, Thompson LDR. Data Set for the Reporting of Carcinomas of the Major Salivary Glands: Explanations and Recommendations of the Guidelines From the International Collaboration on Cancer Reporting. Arch Pathol Lab Med 2019; 143:578-586. [PMID: 30500293 DOI: 10.5858/arpa.2018-0422-sa] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 04/26/2024]
Abstract
The International Collaboration on Cancer Reporting is a nonprofit organization whose goal is to develop evidence-based, internationally agreed-upon standardized data sets for each anatomic site, to be used throughout the world. Providing global standardization of pathology tumor classification, staging, and other reporting elements will lead to achieving the objective of improved patient management and enhanced epidemiologic research. Salivary gland carcinomas are relatively uncommon, and as such, meaningful data about the many histologic types are not easily compared. Morphologic overlap between tumor types makes accurate classification challenging, but there are often significant differences in patient outcomes. Therefore, issues related to tumor type, tumor grading, high-grade transformation, extent of invasion, number and size of nerves affected, and types of ancillary studies are discussed in the context of daily application to specimens from these organs. This review focuses on the data set developed for salivary gland carcinomas with discussion of the key core and noncore elements developed for inclusion by an international expert panel of head and neck and oral-maxillofacial pathologists and surgeons.
Affiliation(s)
- Raja R Seethala
- From the Department of Pathology and Laboratory Medicine (Dr Seethala) and Division of Head and Neck Surgery, Department of Otolaryngology (Dr Ferris), University of Pittsburgh, Pittsburgh, Pennsylvania; the Department of Anatomic Pathology, Faculty of Medical Sciences, University of Campinas, São Paulo, Brazil (Dr Altemani); the Pathological Anatomy Institute, Faculdade de Medicina, Universidade de Lisboa & Serviço de Anatomia Patológica, Instituto Português de Oncologia Francisco Gentil, Lisbon, Portugal (Dr Fonseca); Head and Neck Pathology, Rye Brook, New York (Dr Gnepp); the Department of Otolaryngology-Head and Neck Surgery, University of California, San Francisco (Dr Ha); the Department of Anatomic Pathology, Tokyo Medical University, Tokyo, Japan (Dr Nagao); the Department of Pathology, Faculty of Medicine in Plzen, Charles University, Plzen, Czech Republic (Dr Skalova); the Department of Pathology and Genetics, Sahlgrenska Cancer Center, University of Gothenburg, Gothenburg, Sweden (Dr Stenman); and the Southern California Permanente Medical Group, Woodland Hills Medical Center, Woodland Hills (Dr Thompson)
28
Willison DJ, Trowbridge J, Greiver M, Keshavjee K, Mumford D, Sullivan F. Participatory governance over research in an academic research network: the case of Diabetes Action Canada. BMJ Open 2019; 9:e026828. [PMID: 31005936 PMCID: PMC6500288 DOI: 10.1136/bmjopen-2018-026828] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/21/2018] [Revised: 02/15/2019] [Accepted: 02/18/2019] [Indexed: 11/04/2022]
Abstract
Digital data generated in the course of clinical care are increasingly being leveraged for a wide range of secondary purposes. Researchers need to develop governance policies that can assure the public that their information is being used responsibly. Our aim was to develop a generalisable model for governance of research emanating from health data repositories that will inspire the trust of the patients and the healthcare professionals whose data are being accessed for health research. We developed our governance principles and processes through literature review and iterative consultation with key actors in the research network, including a data governance working group, the lead investigators and patient advisors. We then recruited persons to participate in the governing and advisory bodies. Our governance process is informed by eight principles: (1) transparency; (2) accountability; (3) following the rule of law; (4) integrity; (5) participation and inclusiveness; (6) impartiality and independence; (7) effectiveness, efficiency and responsiveness; and (8) reflexivity and continuous quality improvement. We describe the rationale for these principles, as well as their connections to the subsequent policies and procedures we developed. We then describe the function of the Research Governing Committee, the majority of whom are either persons living with diabetes or physicians whose data are being used, and the patient and data provider advisory groups with whom they consult and communicate. In conclusion, we have developed a values-based information governance framework and process for Diabetes Action Canada that adds value over and above existing scientific and ethics review processes by adding a strong patient perspective and contextual integrity. This model is adaptable to other secure data repositories.
Affiliation(s)
- Donald J Willison
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Joslyn Trowbridge
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Michelle Greiver
- Family and Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Family and Community Medicine, North York General Hospital, Toronto, Ontario, Canada
- Frank Sullivan
- Family and Community Medicine, North York General Hospital, Toronto, Ontario, Canada
- School of Medicine, University of St. Andrews, St Andrews, UK
29
Koczerginski J, Ho K, Golby R, Borycki EM, Kushniruk AW, Born J, Juhra C. Canadian Validation of German Medical Emergency Datasets. Stud Health Technol Inform 2019; 257:212-217. [PMID: 30741198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Medical Emergency Datasets (MEDs) are brief summaries of an individual's medical history, providing vital patient information to emergency medical providers. A recent German study [1] evaluated whether MEDs are useful to local emergency physicians and paramedics, and which health data were relevant to their medical management. To validate the German study internationally, Canadian physicians and paramedics were recruited to provide feedback on the utility of the German MEDs as well as on their specific content. Original documents and surveys were translated directly into English, with the goal of collecting quantitative and qualitative feedback. Overall, physicians and paramedics found the MEDs to be useful in their evaluation of hypothetical medical scenarios. Most of the MED content was very useful, although some items appeared extraneous. The findings of this study will be used to inform future development of MEDs as well as to drive future research.
Affiliation(s)
- Kendall Ho
- Faculty of Medicine, University of British Columbia
- Riley Golby
- Faculty of Medicine, University of British Columbia
30
Draeger CL, Akutsu RDCCDA, Araújo WMC, da Silva ICR, Botelho RBA, Zandonadi RP. Epidemiological Surveillance System on Foodborne Diseases in Brazil after 10-Years of Its Implementation: Completeness Evaluation. Int J Environ Res Public Health 2018; 15:E2284. [PMID: 30336631 PMCID: PMC6210259 DOI: 10.3390/ijerph15102284] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 08/14/2018] [Revised: 10/10/2018] [Accepted: 10/15/2018] [Indexed: 11/16/2022]
Abstract
This study aimed to evaluate the data quality of the Brazilian Epidemiological Surveillance System on Foodborne Diseases (VE-DTA) by assessing the completeness of its records 10 years after its implementation. Completeness was measured by quantifying ignored, incomplete or blank responses in the data items filled. The evaluation used the percentage of completion of these items relative to the total number of notifications registered in the system. We organized the results according to the general category of completeness of the database, by year of notification and by region of occurrence. We also evaluated the overall completeness percentages of the database and the completeness levels according to the degree of recommendation of completion of each variable (mandatory, essential, and complementary) in the VE-DTA manual. The system recorded 7037 outbreaks of foodborne diseases. According to the completeness classification, the database was classified overall as Category 1, since 82.1% (n = 5777) of variables had a level of completion up to 75.1%. We observed that 8.6% of the database was classified as Category 2, 9.2% as Category 3 and 0.1% as Category 4. Improving the completeness of the database can positively impact public health and public policies, reducing the number of deaths from foodborne diseases (FBDs).
Affiliation(s)
- Cainara Lins Draeger
- Department of Nutrition, Faculty of Health Sciences, University of Brasilia, Brasilia 70910-900, Brazil.
- Wilma Maria Coelho Araújo
- Department of Nutrition, Faculty of Health Sciences, University of Brasilia, Brasilia 70910-900, Brazil.
- Renata Puppin Zandonadi
- Department of Nutrition, Faculty of Health Sciences, University of Brasilia, Brasilia 70910-900, Brazil.
31
Delisle Nyström C, Barnes JD, Tremblay MS. An exploratory analysis of missing data from the Royal Bank of Canada (RBC) Learn to Play - Canadian Assessment of Physical Literacy (CAPL) project. BMC Public Health 2018; 18:1046. [PMID: 30285797 PMCID: PMC6167773 DOI: 10.1186/s12889-018-5901-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/25/2022]
Abstract
BACKGROUND: Physical literacy comprises a range of tests over four domains (Physical Competence, Daily Behaviour, Motivation and Confidence, and Knowledge and Understanding). The patterns of missing data in large field test batteries such as those for physical literacy are largely unknown. Therefore, the aim of this paper was to explore the patterns and possible reasons for missing data in the Royal Bank of Canada Learn to Play-Canadian Assessment of Physical Literacy (RBC Learn to Play-CAPL) project. METHODS: A total of 10,034 Canadian children aged 8 to 12 years participated in the RBC Learn to Play-CAPL project. A 32-variable subset from the larger CAPL dataset was used for these analyses. Several R packages ("Hmisc", "mice", "VIM") were used to generate matrices and plots of missing data, and to perform multiple imputations. RESULTS: Overall, the proportion of missing data for individual measures and domains ranged from 0.0 to 33.8%, with the average proportion of missing data being 4.0%. The largest proportion of missing data in CAPL was the pedometer step counts, followed by the components of the Physical Competence domain and the Children's Self-Perception of Adequacy in and Predilection for Physical Activity subscales. When domain scores were regressed on five imputed subsets with the original subset as the reference, there were small and statistically detectable differences in the Daily Behaviour score (β = -1.6 to -1.7, p < 0.001). However, for the other domain scores the differences were negligible and statistically undetectable (β = -0.01 to -0.06, p > 0.05). CONCLUSIONS: This study has implications for other researchers or educators who are creating or using large field-based assessment measures in the areas of physical literacy, physical activity, or physical fitness, as this study demonstrates where problems in data collection can arise and how missing data can be avoided.
When large proportions of missing data are present, imputation techniques, correction factors, or other treatment options may be required.
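The per-variable and average proportions of missing data reported above come down to simple arithmetic. The study used R packages ("Hmisc", "mice", "VIM"); the Python sketch below only illustrates the calculation, assuming each record is a dictionary in which `None` (or an absent key) marks a missing measurement. The function name, variable names and values are hypothetical, not CAPL data.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: one dict per child; None (or an absent key) marks a missing measurement.
Row = Dict[str, Optional[float]]


def missing_proportions(rows: List[Row], variables: List[str]) -> Dict[str, float]:
    """Return the percentage of rows in which each variable is missing."""
    if not rows:
        raise ValueError("need at least one row")
    n = len(rows)
    return {
        var: 100.0 * sum(1 for row in rows if row.get(var) is None) / n
        for var in variables
    }


# Illustrative values only (not CAPL data); variable names are hypothetical.
rows: List[Row] = [
    {"steps": 8000.0, "sit_reach": 25.0, "knowledge": 7.0},
    {"steps": None, "sit_reach": 30.0, "knowledge": 6.0},
    {"steps": None, "sit_reach": None, "knowledge": 8.0},
    {"steps": 9500.0, "sit_reach": 28.0, "knowledge": 5.0},
]
variables = ["steps", "sit_reach", "knowledge"]

props = missing_proportions(rows, variables)
average = sum(props.values()) / len(props)  # mean proportion across variables
print(props)
print(f"average missing: {average:.1f}%")
```

Here `steps` is missing in 2 of 4 records (50%), `sit_reach` in 1 of 4 (25%) and `knowledge` in none, giving an average of 25%. Multiple imputation, which the study performed with "mice", is outside the scope of this sketch.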
Affiliation(s)
- Christine Delisle Nyström
- Healthy Active Living and Obesity (HALO) Research Group, Children’s Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, ON K1H 8L1 Canada
- Joel D. Barnes
- Healthy Active Living and Obesity (HALO) Research Group, Children’s Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, ON K1H 8L1 Canada
- Mark S. Tremblay
- Healthy Active Living and Obesity (HALO) Research Group, Children’s Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, ON K1H 8L1 Canada
32
Kar A, Corcoran P. Performance Evaluation Strategies for Eye Gaze Estimation Systems with Quantitative Metrics and Visualizations. Sensors (Basel) 2018; 18:s18093151. [PMID: 30231547 PMCID: PMC6165570 DOI: 10.3390/s18093151] [Received: 08/10/2018] [Revised: 09/07/2018] [Accepted: 09/15/2018] [Indexed: 11/16/2022]
Abstract
An eye tracker’s accuracy and system behavior play critical roles in determining the reliability and usability of the eye gaze data it produces. However, in contemporary eye gaze research, there is considerable ambiguity in the definitions of gaze estimation accuracy parameters and a lack of well-defined methods for evaluating the performance of eye tracking systems. In this paper, a set of fully defined evaluation metrics is therefore developed and presented for complete performance characterization of generic commercial eye trackers operating under varying conditions on desktop or mobile platforms. In addition, several visualization methods are implemented to help study the performance and data quality of eye trackers irrespective of their design principles and application areas. The concept of a graphical user interface software named GazeVisual v1.1 is also proposed, which would integrate all these methods and enable general users to easily access the described metrics, generate visualizations and extract valuable information from their own gaze datasets. We intend to release these tools to the eye gaze research community as open resources in the future, for use and further advancement, as a contribution towards the standardization of gaze research outputs and analysis.
Affiliation(s)
- Anuradha Kar
- Department of Electrical & Electronic Engineering, National University of Ireland, Galway H91 TK33, Ireland.
- Peter Corcoran
- Department of Electrical & Electronic Engineering, National University of Ireland, Galway H91 TK33, Ireland.
33
Lawler M, Morris AD, Sullivan R, Birney E, Middleton A, Makaroff L, Knoppers BM, Horgan D, Eggermont A. A roadmap for restoring trust in Big Data. Lancet Oncol 2018; 19:1014-1015. [PMID: 30102210 DOI: 10.1016/s1470-2045(18)30425-x] [Received: 05/08/2018] [Revised: 05/15/2018] [Accepted: 05/29/2018] [Indexed: 01/12/2023]
Affiliation(s)
- Mark Lawler
- Centre for Cancer Research and Cell Biology, Queen's University Belfast, Belfast BT9 7BL, UK; European Alliance for Personalised Medicine, Brussels, Belgium; Global Alliance for Genomics and Health, Boston, MA, USA; Health Data Research UK, London, UK.
- Ewan Birney
- Global Alliance for Genomics and Health, Boston, MA, USA; European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Anna Middleton
- Global Alliance for Genomics and Health, Boston, MA, USA; Wellcome Genome Campus, Society and Ethics Research, Cambridge, UK
- Lydia Makaroff
- European Cancer Patient Coalition, Brussels, Belgium; University of Leuven, Leuven, Belgium
- Bartha M Knoppers
- Global Alliance for Genomics and Health, Boston, MA, USA; Centre for Genomics and Policy, McGill University, Montreal, QC, Canada
- Denis Horgan
- European Alliance for Personalised Medicine, Brussels, Belgium
- Alexander Eggermont
- European Alliance for Personalised Medicine, Brussels, Belgium; Gustave Roussy Cancer Campus Grand Paris, Villejuif, France
34
Abraha I, Montedori A, Serraino D, Orso M, Giovannini G, Scotti V, Granata A, Cozzolino F, Fusco M, Bidoli E. Accuracy of administrative databases in detecting primary breast cancer diagnoses: a systematic review. BMJ Open 2018; 8:e019264. [PMID: 30037859 PMCID: PMC6059263 DOI: 10.1136/bmjopen-2017-019264] [Indexed: 01/10/2023]
Abstract
OBJECTIVE To define the accuracy of administrative datasets in identifying primary diagnoses of breast cancer based on the International Classification of Diseases (ICD) 9th or 10th revision codes. DESIGN Systematic review. DATA SOURCES MEDLINE, EMBASE, Web of Science and the Cochrane Library (April 2017). ELIGIBILITY CRITERIA The inclusion criteria were: (a) the presence of a reference standard; (b) the presence of at least one accuracy test measure (eg, sensitivity); and (c) the use of an administrative database. DATA EXTRACTION Eligible studies were selected and data extracted independently by two reviewers; quality was assessed using the Standards for Reporting of Diagnostic Accuracy (STARD) criteria. DATA ANALYSIS Extracted data were synthesised using a narrative approach. RESULTS From 2929 records screened, 21 studies were included (data collection period between 1977 and 2011). Eighteen studies evaluated ICD-9 codes (11 of which assessed both invasive breast cancer (code 174.x) and carcinoma in situ (ICD-9 233.0)); three studies evaluated invasive breast cancer-related ICD-10 codes. All studies except one considered incident cases. The initial algorithm results were: sensitivity ≥80% in 11 of 17 studies (range 57%-99%); positive predictive value ≥83% in 14 of 19 studies (range 15%-98%); and specificity ≥98% in 8 studies. Combining the breast cancer diagnosis with surgical procedures, chemoradiation or radiation therapy, outpatient data or physician claims may enhance the accuracy of the algorithms in some but not all circumstances. Accuracy was low for algorithms based only on outpatient or physician data, or on a breast cancer diagnosis in a secondary diagnostic position. CONCLUSION Based on the retrieved evidence, administrative databases can be employed to identify primary breast cancer. The best suggested algorithm uses ICD-9 or ICD-10 codes in the primary diagnostic position. TRIAL REGISTRATION NUMBER CRD42015026881.
Affiliation(s)
- Iosief Abraha
- Health Planning Service, Regional Health Authority of Umbria, Perugia, Italy
- Innovation and Development, Agenzia Nazionale per i Servizi Sanitari Regionali (Age.Na.S.), Rome, Italy
- Diego Serraino
- Cancer Epidemiology Unit, IRCCS Centro di Riferimento Oncologico Aviano, Aviano, Italy
- Massimiliano Orso
- Health Planning Service, Regional Health Authority of Umbria, Perugia, Italy
- Innovation and Development, Agenzia Nazionale per i Servizi Sanitari Regionali (Age.Na.S.), Rome, Italy
- Gianni Giovannini
- Health Planning Service, Regional Health Authority of Umbria, Perugia, Italy
- Valeria Scotti
- Center for Scientific Documentation, IRCCS Policlinico S. Matteo Foundation, Pavia, Italy
- Annalisa Granata
- Registro Tumori Regione Campania, ASL Napoli 3 Sud, Brusciano, Italy
- Francesco Cozzolino
- Health Planning Service, Regional Health Authority of Umbria, Perugia, Italy
- Mario Fusco
- Registro Tumori Regione Campania, ASL Napoli 3 Sud, Brusciano, Italy
- Ettore Bidoli
- Cancer Epidemiology Unit, IRCCS Centro di Riferimento Oncologico Aviano, Aviano, Italy
35
Hurley PD, Oliver S, Mehta A. Creating longitudinal datasets and cleaning existing data identifiers in a cystic fibrosis registry using a novel Bayesian probabilistic approach from astronomy. PLoS One 2018; 13:e0199815. [PMID: 29985939 PMCID: PMC6037350 DOI: 10.1371/journal.pone.0199815] [Received: 05/03/2017] [Accepted: 06/14/2018] [Indexed: 11/18/2022]
Abstract
Patient registry data are commonly collected as annual snapshots that need to be amalgamated to understand the longitudinal progress of each patient. However, patient identifiers can either change or may not be available for legal reasons when longitudinal data are collated from patients living in different countries. Here, we apply astronomical statistical matching techniques to link individual patient records; these techniques can be used where identifiers are absent or to validate uncertain identifiers. We adopt a Bayesian model framework used for probabilistically linking records in astronomy. We adapt this framework and validate it on blinded, annually collected data: a high-quality (Danish) subset of data held in the European Cystic Fibrosis Society Patient Registry (ECFSPR). Our initial experiments achieved a precision of 0.990 at a recall value of 0.987. However, detailed investigation of the discrepancies uncovered typing errors in 27 of the identifiers in the original Danish subset. After fixing these errors to create a new gold standard, our algorithm correctly linked individual records across years, achieving a precision of 0.997 at a recall value of 0.987 without recourse to identifiers. Our Bayesian framework provides the probability that a pair of records belongs to the same patient. Unlike other record linkage approaches, our algorithm can also use physical models, such as body mass index curves, as prior information for record linkage. We have shown that our framework can create longitudinal samples where none existed and validate pre-existing patient identifiers. We have demonstrated that, in this specific case, this automated approach is better than the existing identifiers.
Affiliation(s)
- Peter Donald Hurley
- Department of Physics and Astronomy, University of Sussex, Brighton, United Kingdom
- Seb Oliver
- Department of Physics and Astronomy, University of Sussex, Brighton, United Kingdom
- Anil Mehta
- Division of Medical Sciences, University of Dundee, Dundee, United Kingdom
36
VanderWeele J, Pollack T, Oakes DJ, Smyrniotis C, Illuri V, Vellanki P, O'Leary K, Holl J, Aleppo G, Molitch ME, Wallia A. Validation of data from electronic data warehouse in diabetic ketoacidosis: Caution is needed. J Diabetes Complications 2018; 32:650-654. [PMID: 29903409 DOI: 10.1016/j.jdiacomp.2018.05.004] [Received: 01/19/2018] [Revised: 04/05/2018] [Accepted: 05/02/2018] [Indexed: 11/17/2022]
Abstract
AIMS This study validated enterprise data warehouse (EDW) data for a cohort of hospitalized patients with a primary diagnosis of diabetic ketoacidosis (DKA). METHODS A total of 247 patients were admitted to Northwestern Memorial Hospital for DKA (319 admissions; ICD-9 codes 250.12, 250.13, or 250.xx with biochemical criteria for DKA) from 1/1/2010 to 9/1/2013. Validation was performed by electronic medical record (EMR) review of 10% of admissions (N = 32). Classification of diabetes type (Type 1 vs. Type 2) and DKA clinical status were compared between the EMR review and EDW data. RESULTS Key findings included incorrect classification of diabetes type in 5 of 32 (16%) admissions and indeterminable classification in 5 admissions. Based on the review, DKA was not present in 11 of 32 (34%) admissions. Based on biochemical criteria, DKA was not present in 15 of 32 (47%) admissions. CONCLUSIONS This study found that EDW data have substantial errors. Some discrepancies can be addressed by refining the EDW query code, while others, related to diabetes classification and DKA diagnosis, cannot be corrected without improving clinical coding accuracy, consistency of medical record documentation, or EMR design. These results support the need for comprehensive validation of data for complex clinical populations obtained through data repositories such as the EDW.
Affiliation(s)
- Jennifer VanderWeele
- Northwestern University Feinberg School of Medicine, Department of Medicine, Division of Endocrinology, Metabolism, and Molecular Medicine, 300 E Superior, Ste. 15-703, Chicago, IL 60611, United States
- Teresa Pollack
- Northwestern University Feinberg School of Medicine, Department of Medicine, Division of Endocrinology, Metabolism, and Molecular Medicine, 300 E Superior, Ste. 15-703, Chicago, IL 60611, United States
- Diana Johnson Oakes
- Northwestern University Feinberg School of Medicine, Department of Medicine, Division of Endocrinology, Metabolism, and Molecular Medicine, 300 E Superior, Ste. 15-703, Chicago, IL 60611, United States
- Colleen Smyrniotis
- Northwestern University Feinberg School of Medicine, Department of Medicine, Division of Endocrinology, Metabolism, and Molecular Medicine, 300 E Superior, Ste. 15-703, Chicago, IL 60611, United States
- Vidhya Illuri
- Northwestern University Feinberg School of Medicine, Department of Medicine, Division of Endocrinology, Metabolism, and Molecular Medicine, 300 E Superior, Ste. 15-703, Chicago, IL 60611, United States
- Priyathama Vellanki
- Northwestern University Feinberg School of Medicine, Department of Medicine, Division of Endocrinology, Metabolism, and Molecular Medicine, 300 E Superior, Ste. 15-703, Chicago, IL 60611, United States
- Kevin O'Leary
- Northwestern University Feinberg School of Medicine, Department of Medicine, Division of Hospital Medicine, 211 E Ontario, Ste. 700, Chicago, IL 60611, United States
- Jane Holl
- Northwestern University Feinberg School of Medicine, Center for Healthcare Studies, Institute for Public Health and Medicine, 633 N Saint Clair, Ste. 2000, Chicago, IL 60611, United States
- Grazia Aleppo
- Northwestern University Feinberg School of Medicine, Department of Medicine, Division of Endocrinology, Metabolism, and Molecular Medicine, 300 E Superior, Ste. 15-703, Chicago, IL 60611, United States
- Mark E Molitch
- Northwestern University Feinberg School of Medicine, Department of Medicine, Division of Endocrinology, Metabolism, and Molecular Medicine, 300 E Superior, Ste. 15-703, Chicago, IL 60611, United States
- Amisha Wallia
- Northwestern University Feinberg School of Medicine, Department of Medicine, Division of Endocrinology, Metabolism, and Molecular Medicine, 300 E Superior, Ste. 15-703, Chicago, IL 60611, United States; Northwestern University Feinberg School of Medicine, Center for Healthcare Studies, Institute for Public Health and Medicine, 633 N Saint Clair, Ste. 2000, Chicago, IL 60611, United States.
37
Abstract
Reported values for concentrations of micronutrients in human milk form the basis of the majority of micronutrient intake recommendations for infants and the additional maternal requirements for lactation. The infant recommendations may also be extrapolated to provide estimates for young children. The purpose of this review is to evaluate the adequacy of the milk micronutrient concentration data used by the Institute of Medicine to set recommendations for the United States and Canada, by FAO/WHO, the United Kingdom, and the European Food Safety Authority. The concentrations accepted by each agency are presented for each micronutrient accompanied by the source of information and comments on the number, location, status, and stage of lactation of the sample population, where known. These summaries show the small number of participants from which samples were collected in most studies, the wide range of concentrations within studies, the lack of longitudinal data, and the variability in collection methods. These factors contribute to the variability in nutrient intake recommendations among committees, although this variability is reduced by some committees that accept milk-composition values proposed by others. Values are also summarized from milk collected in studies in which mothers or infants were known to be deficient on the basis of clinical symptoms, biomarkers of inadequacy, or both, to show the extent to which milk micronutrients can be reduced by poor maternal nutritional status. We conclude that a new, multicenter study is needed to establish reference values for milk constituents across lactation.
Affiliation(s)
- Lindsay H Allen
- US Department of Agriculture, Agricultural Research Service, Western Human Nutrition Research Center, Davis, CA
- Juliana A Donohue
- US Department of Agriculture, Agricultural Research Service, Western Human Nutrition Research Center, Davis, CA
- Daphna K Dror
- US Department of Agriculture, Agricultural Research Service, Western Human Nutrition Research Center, Davis, CA
38
Lee K, Weiskopf N, Pathak J. A Framework for Data Quality Assessment in Clinical Research Datasets. AMIA Annu Symp Proc 2018; 2017:1080-1089. [PMID: 29854176 PMCID: PMC5977591] [Indexed: 06/08/2023]
Abstract
The wide availability of electronic health record (EHR) data for multi-institutional clinical research relies on accurately defined patient cohorts to ensure validity, especially when used in conjunction with open-access research data. There is a growing need to utilize a consensus-driven approach to assess data quality. To achieve this goal, we modified an existing data quality assessment (DQA) framework by re-operationalizing dimensions of quality for a clinical domain of interest - heart failure. We then created an inventory of common phenotype data elements (CPDEs) derived from open-access datasets and evaluated it against the modified DQA framework. We measured our inventory of CPDEs for Conformance, Completeness, and Plausibility. DQA scores were high on Completeness, Value Conformance, and Atemporal and Temporal Plausibility. Our work exhibits a generalizable approach to DQA for clinical research. Future work will 1) map datasets to standard terminologies and 2) create a quantitative DQA tool for research datasets.
39
Ventura SJ. The U.S. National Vital Statistics System: Transitioning Into the 21st Century, 1990-2017. Vital Health Stat 1 2018:1-84. [PMID: 30248018] [Indexed: 06/08/2023]
Abstract
This report describes the history of the National Vital Statistics System, with a focus on the period 1990-2017. The vital statistics system is the country's most enduring program of data collection on the health of the population. It is based on information reported on the certificates of births and deaths and reports of fetal deaths, collected in each of the states and independent registration areas. Over the last two decades, the vital statistics system has experienced far-reaching changes, and has shifted in important ways to emphasize data quality, timeliness, and analysis. The changes underlying these areas are described.
40
Alfaro-Almagro F, Jenkinson M, Bangerter NK, Andersson JLR, Griffanti L, Douaud G, Sotiropoulos SN, Jbabdi S, Hernandez-Fernandez M, Vallee E, Vidaurre D, Webster M, McCarthy P, Rorden C, Daducci A, Alexander DC, Zhang H, Dragonu I, Matthews PM, Miller KL, Smith SM. Image processing and Quality Control for the first 10,000 brain imaging datasets from UK Biobank. Neuroimage 2018; 166:400-424. [PMID: 29079522 PMCID: PMC5770339 DOI: 10.1016/j.neuroimage.2017.10.034] [Received: 04/23/2017] [Revised: 10/16/2017] [Accepted: 10/16/2017] [Indexed: 12/22/2022]
Abstract
UK Biobank is a large-scale prospective epidemiological study with all data accessible to researchers worldwide. It is currently in the process of bringing back 100,000 of the original participants for brain, heart and body MRI, carotid ultrasound and low-dose bone/fat x-ray. The brain imaging component covers 6 modalities (T1, T2 FLAIR, susceptibility weighted MRI, Resting fMRI, Task fMRI and Diffusion MRI). Raw and processed data from the first 10,000 imaged subjects has recently been released for general research access. To help convert this data into useful summary information we have developed an automated processing and QC (Quality Control) pipeline that is available for use by other researchers. In this paper we describe the pipeline in detail, following a brief overview of UK Biobank brain imaging and the acquisition protocol. We also describe several quantitative investigations carried out as part of the development of both the imaging protocol and the processing pipeline.
Affiliation(s)
- Fidel Alfaro-Almagro
- Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, UK
- Mark Jenkinson
- Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, UK
- Neal K Bangerter
- Electrical and Computer Engineering, Brigham Young University, UT, USA
- Jesper L R Andersson
- Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, UK
- Ludovica Griffanti
- Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, UK
- Gwenaëlle Douaud
- Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, UK
- Stamatios N Sotiropoulos
- Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, UK; Sir Peter Mansfield Imaging Centre, School of Medicine, University of Nottingham, UK
- Saad Jbabdi
- Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, UK
- Moises Hernandez-Fernandez
- Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, UK
- Emmanuel Vallee
- Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, UK
- Diego Vidaurre
- Oxford Centre for Human Brain Activity, Wellcome Centre for Integrative Neuroimaging, Department of Psychiatry, University of Oxford, UK
- Matthew Webster
- Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, UK
- Paul McCarthy
- Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, UK
- Christopher Rorden
- Department of Psychology and McCausland Center for Brain Imaging, University of South Carolina, SC, USA
- Alessandro Daducci
- Computer Science Department, University of Verona, Italy; Radiology Department, University Hospital Center, Switzerland
- Daniel C Alexander
- Centre for Medical Image Computing, Department of Computer Science, University College London, UK
- Hui Zhang
- Centre for Medical Image Computing, Department of Computer Science, University College London, UK
- Paul M Matthews
- Division of Brain Sciences, Imperial College, London, UK; UK Dementia Research Institute, London, UK
- Karla L Miller
- Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, UK
- Stephen M Smith
- Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, UK
41
Wasnik V. Issues in data expansion in understanding criticality in biological systems. Eur Phys J E Soft Matter 2018; 41:13. [PMID: 29380087 DOI: 10.1140/epje/i2018-11621-0] [Received: 08/05/2017] [Accepted: 01/09/2018] [Indexed: 06/07/2023]
Abstract
At a second-order phase transition, also termed a critical point, systems display long-range order, and their macroscopic behaviors are independent of the microscopic details of the system. Because of these properties, it has long been speculated that biological systems that show similar behavior despite having very different microscopic details may be operating near a critical point. Recent methods in neuroscience are making it possible to explore whether criticality exists in neural networks. Despite being large, many datasets are only a minute sample of the neural system, so methods have to be developed to expand these datasets in order to study criticality. In this work, we develop an analytical method of expanding a dataset to the large-N limit to make statements about the critical nature of the dataset. We show that different ways of expanding the dataset while keeping its variance and mean fixed yield different results regarding criticality. This casts doubt on the established procedures for deducing the criticality of biological systems through expansion of finite-sized datasets.
Affiliation(s)
- Vaibhav Wasnik
- Department of Biochemistry, University of Geneva, Geneva, Switzerland.
42
Abstract
IMPORTANCE Publicly available data sets hold much potential, but their unique design may require specific analytic approaches. OBJECTIVE To determine adherence to appropriate research practices for a frequently used large public database, the National Inpatient Sample (NIS) of the Agency for Healthcare Research and Quality (AHRQ). DESIGN, SETTING, AND PARTICIPANTS In this observational study of the 1082 studies published using the NIS from January 2015 through December 2016, a representative sample of 120 studies was systematically evaluated for adherence to practices required by AHRQ for the design and conduct of research using the NIS. EXPOSURES None. MAIN OUTCOMES AND MEASURES All studies were evaluated on 7 required research practices based on AHRQ's recommendations and compiled under 3 domains: (1) data interpretation (interpreting data as hospitalization records rather than unique patients); (2) research design (avoiding use in performing state-, hospital-, and physician-level assessments where inappropriate; not using nonspecific administrative secondary diagnosis codes to study in-hospital events); and (3) data analysis (accounting for the complex survey design of the NIS and changes in data structure over time). RESULTS Of 120 published studies, 85% (n = 102) did not adhere to 1 or more required practices and 62% (n = 74) did not adhere to 2 or more required practices. An estimated 925 (95% CI, 852-998) NIS publications did not adhere to 1 or more required practices and 696 (95% CI, 596-796) NIS publications did not adhere to 2 or more required practices. A total of 79 sampled studies (68.3% [95% CI, 59.3%-77.3%]) among the 1082 NIS studies screened for eligibility did not account for the effects of sampling error, clustering, and stratification; 62 (54.4% [95% CI, 44.7%-64.0%]) extrapolated nonspecific secondary diagnoses to infer in-hospital events; 45 (40.4% [95% CI, 30.9%-50.0%]) miscategorized hospitalizations as individual patients; 10 (7.1% [95% CI, 2.1%-12.1%]) performed state-level analyses; and 3 (2.9% [95% CI, 0.0%-6.2%]) reported physician-level volume estimates. Of 27 studies (weighted; 218 studies [95% CI, 134-303]) spanning periods of major changes in the data structure of the NIS, 21 (79.7% [95% CI, 62.5%-97.0%]) did not account for the changes. Among the 24 studies published in journals with an impact factor of 10 or greater, 16 (67%) did not adhere to 1 or more practices, and 9 (38%) did not adhere to 2 or more practices. CONCLUSIONS AND RELEVANCE In this study of 120 recent publications that used data from the NIS, the majority did not adhere to required practices. Further research is needed to identify strategies to improve the quality of research using the NIS and assess whether there are similar problems with use of other publicly available data sets.
Affiliation(s)
- Rohan Khera
- Division of Cardiology, University of Texas Southwestern Medical Center, Dallas, Texas
- Suveen Angraal
- Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, Connecticut
- Tyler Couch
- Division of Cardiology, University of Texas Southwestern Medical Center, Dallas, Texas
- John W. Welsh
- Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, Connecticut
- Brahmajee K. Nallamothu
- Division of Cardiology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
- Saket Girotra
- Division of Cardiovascular Medicine, Department of Internal Medicine, University of Iowa Carver College of Medicine, Iowa City, Iowa
- Paul S. Chan
- Saint Luke's Mid America Heart and Vascular Institute and the University of Missouri-Kansas City, Kansas City, Missouri
- Harlan M. Krumholz
- Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, Connecticut
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine and the Department of Health Policy and Management, Yale School of Public Health, New Haven, Connecticut
43
Abstract
Objectives: To summarize recent research and emerging trends in the area of secondary use of healthcare data, and to present the best papers published in this field, selected to appear in the 2017 edition of the IMIA Yearbook. Methods: A literature review of articles published in 2016 and related to secondary use of healthcare data was performed using two bibliographic databases. From this search, 941 papers were identified. The section editors independently reviewed the papers for relevancy and impact, resulting in a consensus list of 14 candidate best papers. External reviewers examined each of the candidate best papers and the final selection was made by the editorial board of the Yearbook. Results: From the 941 retrieved papers, the selection process resulted in four best papers. These papers discuss data quality concerns, issues in preserving privacy of patients in shared datasets, and methods of decision support when consuming large amounts of raw electronic health record (EHR) data. Conclusion: In 2016, a significant effort was put into the development of new systems which aim to avoid significant human understanding and pre-processing of healthcare data, though this is still only an emerging area of research. The value of temporal relationships between data received significant study, as did effective information sharing while preserving patient privacy.
44
Pinborg A. [The International Committee of Medical Journal Editors is coming to Copenhagen]. Ugeskr Laeger 2017; 179:V69397. [PMID: 28874247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
45
Peyvandi F, Makris M, Collins P, Lillicrap D, Pipe SW, Iorio A, Rosendaal FR. Minimal dataset for post-registration surveillance of new drugs in hemophilia: communication from the SSC of the ISTH. J Thromb Haemost 2017; 15:1878-1881. [PMID: 28767195 DOI: 10.1111/jth.13762] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 04/26/2017] [Indexed: 11/27/2022]
Affiliation(s)
- F Peyvandi
- Angelo Bianchi Bonomi Hemophilia and Thrombosis Center, Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Luigi Villa Foundation, Milan, Italy
- Department of Pathophysiology and Transplantation, Università degli Studi di Milano, Milan, Italy
- M Makris
- Sheffield Haemophilia and Thrombosis Centre, Royal Hallamshire Hospital, Sheffield, UK
- P Collins
- Arthur Bloom Haemophilia Centre, School of Medicine, Cardiff University, Cardiff, UK
- D Lillicrap
- Department of Pathology and Molecular Medicine, Queen's University, Kingston, Canada
- S W Pipe
- Pediatrics and Pathology, University of Michigan, Ann Arbor, MI, USA
- A Iorio
- Department of Health Research Methods, Evidence, and Impact, and Department of Medicine, McMaster University, Hamilton, Canada
- F R Rosendaal
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, the Netherlands
46
Demirci MDS, Allmer J. Improving the Quality of Positive Datasets for the Establishment of Machine Learning Models for pre-microRNA Detection. J Integr Bioinform 2017; 14:/j/jib.2017.14.issue-2/jib-2017-0032/jib-2017-0032.xml. [PMID: 28753538 PMCID: PMC6042829 DOI: 10.1515/jib-2017-0032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/12/2017] [Revised: 05/28/2017] [Accepted: 05/02/2017] [Indexed: 12/31/2022]
Abstract
MicroRNAs (miRNAs) are involved in the post-transcriptional regulation of protein abundance and thus have a great impact on the resulting phenotype. It is, therefore, no wonder that they have been implicated in many diseases ranging from virus infections to cancer. This impact on the phenotype leads to great interest in establishing the miRNAs of an organism. Experimental methods are complicated, which has led to the development of computational methods for pre-miRNA detection. Such methods generally employ machine learning to establish models that discriminate between miRNAs and other sequences. Positive training data for model establishment, for the most part, stem from miRBase, the miRNA registry. The quality of the entries in miRBase has been questioned, though. This unknown quality led to the development of filtering strategies that attempt to produce high-quality positive datasets, which can lead to a scarcity of positive data. To analyze the quality of filtered data, we developed a machine learning model and found that it is well able to establish data quality based on intrinsic measures. Additionally, we analyzed which features describing pre-miRNAs could discriminate between low- and high-quality data. Both models are applicable to data from miRBase and can be used for establishing high-quality positive data. This will facilitate the development of better miRNA detection tools, which will make the prediction of miRNAs in disease states more accurate. Finally, we applied both models to all miRBase data and provide the list of high-quality hairpins.
Affiliation(s)
- Jens Allmer
- Molecular Biology and Genetics, Izmir Institute of Technology, Urla, Izmir, Turkey
47
Taichman DB, Sahni P, Pinborg A, Peiperl L, Laine C, James A, Hong ST, Haileamlak A, Gollogly L, Godlee F, Frizelle FA, Florenzano F, Drazen JM, Bauchner H, Baethge C, Backus J. Data Sharing Statements for Clinical Trials: A Requirement of the International Committee of Medical Journal Editors. J Korean Med Sci 2017; 32:1051-1053. [PMID: 28581257 PMCID: PMC5461304 DOI: 10.3346/jkms.2017.32.7.1051] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/29/2017] [Accepted: 05/01/2017] [Indexed: 11/20/2022]
Affiliation(s)
- Darren B Taichman
- Secretary, ICMJE, Executive Deputy Editor, Annals of Internal Medicine
- Peush Sahni
- Representative and Past President, World Association of Medical Editors
- Anja Pinborg
- Scientific Editor-in-Chief, Ugeskrift for Laeger (Danish Medical Journal)
- Laragh Gollogly
- Editor, Bulletin of the World Health Organization, Coordinator, WHO Press
- Fiona Godlee
- Editor-in-Chief, The British Medical Journal (BMJ)
- Howard Bauchner
- Editor-in-Chief, Journal of the American Medical Association (JAMA) and the JAMA Network
- Christopher Baethge
- Chief Scientific Editor, Deutsches Ärzteblatt (German Medical Journal) & Deutsches Ärzteblatt International
- Joyce Backus
- Representative and Associate Director for Library Operations, National Library of Medicine
48
Simovski B, Vodák D, Gundersen S, Domanska D, Azab A, Holden L, Holden M, Grytten I, Rand K, Drabløs F, Johansen M, Mora A, Lund-Andersen C, Fromm B, Eskeland R, Gabrielsen OS, Ferkingstad E, Nakken S, Bengtsen M, Nederbragt AJ, Thorarensen HS, Akse JA, Glad I, Hovig E, Sandve GK. GSuite HyperBrowser: integrative analysis of dataset collections across the genome and epigenome. Gigascience 2017; 6:1-12. [PMID: 28459977 PMCID: PMC5493745 DOI: 10.1093/gigascience/gix032] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 09/20/2016] [Revised: 01/17/2017] [Accepted: 04/24/2017] [Indexed: 12/01/2022]
Abstract
Background: Recent large-scale undertakings such as ENCODE and Roadmap Epigenomics have generated experimental data mapped to the human reference genome (as genomic tracks) representing a variety of functional elements across a large number of cell types. Despite the high potential value of these publicly available data for a broad variety of investigations, little attention has been given to the analytical methodology necessary for their widespread utilisation. Findings: Here, we present a first principled treatment of the analysis of collections of genomic tracks. We have developed novel computational and statistical methodology to permit comparative and confirmatory analyses across multiple and disparate data sources. We delineate a set of generic questions that are useful across a broad range of investigations and discuss the implications of choosing different statistical measures and null models. Examples include contrasting analyses across different tissues or diseases. The methodology has been implemented in a comprehensive open-source software system, the GSuite HyperBrowser. To make the functionality accessible to biologists, and to facilitate reproducible analysis, we have also developed a web-based interface providing an expertly guided and customizable way of utilizing the methodology. With this system, many novel biological questions can flexibly be posed and rapidly answered. Conclusions: Through a combination of streamlined data acquisition, interoperable representation of dataset collections, and customizable statistical analysis with guided setup and interpretation, the GSuite HyperBrowser represents a first comprehensive solution for integrative analysis of track collections across the genome and epigenome. The software is available at: https://hyperbrowser.uio.no.
Affiliation(s)
- Boris Simovski
- Department of Informatics, University of Oslo, Oslo, Norway
- Daniel Vodák
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Diana Domanska
- Department of Informatics, University of Oslo, Oslo, Norway
- Abdulrahman Azab
- Department of Informatics, University of Oslo, Oslo, Norway
- Research Support Services Group, University Center for Information Technology, Oslo, Norway
- Lars Holden
- Statistics For Innovation, Norwegian Computing Center, Oslo, Norway
- Marit Holden
- Statistics For Innovation, Norwegian Computing Center, Oslo, Norway
- Ivar Grytten
- Department of Informatics, University of Oslo, Oslo, Norway
- Knut Rand
- Department of Mathematics, University of Oslo, Oslo, Norway
- Finn Drabløs
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Morten Johansen
- Institute for Medical Informatics, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
- Antonio Mora
- Department of Informatics, University of Oslo, Oslo, Norway
- Department of Biosciences, University of Oslo, Oslo, Norway
- Christin Lund-Andersen
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Bastian Fromm
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Ragnhild Eskeland
- Department of Biosciences, University of Oslo, Oslo, Norway
- Norwegian Center for Stem Cell Research, Department of Immunology, Oslo University Hospital, Oslo, Norway
- Sigve Nakken
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Mads Bengtsen
- Department of Biosciences, University of Oslo, Oslo, Norway
- Alexander Johan Nederbragt
- Department of Informatics, University of Oslo, Oslo, Norway
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Oslo, Norway
- Ingrid Glad
- Department of Mathematics, University of Oslo, Oslo, Norway
- Eivind Hovig
- Department of Informatics, University of Oslo, Oslo, Norway
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Statistics For Innovation, Norwegian Computing Center, Oslo, Norway
- Institute for Medical Informatics, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
49
Taichman DB, Sahni P, Pinborg A, Peiperl L, Laine C, James A, Hong ST, Haileamlak A, Gollogly L, Godlee F, Frizelle FA, Florenzano F, Drazen JM, Bauchner H, Baethge C, Backus J. Data Sharing Statements for Clinical Trials: A Requirement of the International Committee of Medical Journal Editors. JAMA 2017; 317:2491-2492. [PMID: 28586895 DOI: 10.1001/jama.2017.6514] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Indexed: 11/14/2022]
Affiliation(s)
- Darren B Taichman
- Secretary, ICMJE, and Executive Deputy Editor, Annals of Internal Medicine
- Peush Sahni
- Representative and Past President, World Association of Medical Editors
- Anja Pinborg
- Scientific Editor-in-Chief, Ugeskrift for Laeger (Danish Medical Journal)
- Laragh Gollogly
- Editor, Bulletin of the World Health Organization, and Coordinator, WHO Press
- Fiona Godlee
- Editor-in-Chief, The BMJ (British Medical Journal)
- Christopher Baethge
- Chief Scientific Editor, Deutsches Ärzteblatt (German Medical Journal) and Deutsches Ärzteblatt International
- Joyce Backus
- Representative and Associate Director for Library Operations, National Library of Medicine
50
Taichman DB, Sahni P, Pinborg A, Peiperl L, Laine C, James A, Hong ST, Haileamlak A, Gollogly L, Godlee F, Frizelle FA, Florenzano F, Drazen JM, Bauchner H, Baethge C, Backus J. Data sharing statements for clinical trials: a requirement of the International Committee of Medical Journal Editors. N Z Med J 2017; 130:7-10. [PMID: 28617782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Affiliation(s)
- Darren B Taichman
- M.D., Ph.D., Secretary, ICMJE, Executive Deputy Editor, Annals of Internal Medicine
- Peush Sahni
- M.B., B.S., M.S., Ph.D., Representative and Past President, World Association of Medical Editors
- Anja Pinborg
- M.D., Scientific Editor-in-Chief, Ugeskrift for Laeger (Danish Medical Journal)
- Sung-Tae Hong
- M.D., Ph.D., Editor-in-Chief, Journal of Korean Medical Science
- Laragh Gollogly
- M.D., M.P.H., Editor, Bulletin of the World Health Organization, Coordinator, WHO Press
- Fiona Godlee
- F.R.C.P., Editor-in-Chief, The BMJ (British Medical Journal)
- Frank A Frizelle
- M.B., Ch.B., F.R.A.C.S., Editor-in-Chief, New Zealand Medical Journal
- Howard Bauchner
- M.D., Editor-in-Chief, JAMA (Journal of the American Medical Association) and the JAMA Network
- Christopher Baethge
- M.D., Chief Scientific Editor, Deutsches Ärzteblatt (German Medical Journal) & Deutsches Ärzteblatt International
- Joyce Backus
- M.S.L.S., Representative and Associate Director for Library Operations, National Library of Medicine