Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Jones KH, Ford EM, Lea N, Griffiths LJ, Hassan L, Heys S, Squires E, Nenadic G. Toward the Development of Data Governance Standards for Using Clinical Free-Text Data in Health Research: Position Paper. J Med Internet Res 2020;22:e16760. [PMID: 32597785 PMCID: PMC7367542 DOI: 10.2196/16760] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/22/2019] [Revised: 03/06/2020] [Accepted: 03/23/2020] [Indexed: 01/17/2023] Open

Cited by Other Article(s)

Sedlakova J, Daniore P, Horn Wintsch A, Wolf M, Stanikic M, Haag C, Sieber C, Schneider G, Staub K, Alois Ettlin D, Grübner O, Rinaldi F, von Wyl V. Challenges and best practices for digital unstructured data enrichment in health research: A systematic narrative review. PLOS DIGITAL HEALTH 2023;2:e0000347. [PMID: 37819910 PMCID: PMC10566734 DOI: 10.1371/journal.pdig.0000347] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/22/2022] [Accepted: 08/14/2023] [Indexed: 10/13/2023]

Abstract

Digital data play an increasingly important role in advancing health research and care. However, most digital data in healthcare are in an unstructured and often not readily accessible format for research. Unstructured data are often found in a format that lacks standardization and needs significant preprocessing and feature extraction efforts. This poses challenges when combining such data with other data sources to enhance the existing knowledge base, which we refer to as digital unstructured data enrichment. Overcoming these methodological challenges requires significant resources and may limit the ability to fully leverage their potential for advancing health research and, ultimately, prevention and patient care delivery. While prevalent challenges associated with unstructured data use in health research are widely reported across the literature, a comprehensive interdisciplinary summary of such challenges and possible solutions to facilitate their use in combination with structured data sources is missing. In this study, we report findings from a systematic narrative review on the seven most prevalent challenge areas connected with digital unstructured data enrichment in the fields of cardiology, neurology and mental health, along with possible solutions to address these challenges. Based on these findings, we developed a checklist that follows the standard data flow in health research studies. This checklist aims to provide initial systematic guidance to inform early planning and feasibility assessments for health research studies aiming to combine unstructured data with existing data sources. Overall, the generality of reported unstructured data enrichment methods in the studies included in this review calls for more systematic reporting of such methods to achieve greater reproducibility in future studies.

Affiliation(s)

Jana Sedlakova Digital Society Initiative, University of Zurich, Zurich, Switzerland Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland Institute of Biomedical Ethics and History of Medicine, University of Zurich, Zurich, Switzerland
Paola Daniore Digital Society Initiative, University of Zurich, Zurich, Switzerland Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland
Andrea Horn Wintsch Digital Society Initiative, University of Zurich, Zurich, Switzerland Center for Gerontology, University of Zurich, Zurich, Switzerland CoupleSense: Health and Interpersonal Emotion Regulation Group, University Research Priority Program (URPP) Dynamics of Healthy Aging, University of Zurich, Zurich, Switzerland
Markus Wolf Digital Society Initiative, University of Zurich, Zurich, Switzerland Department of Psychology, University of Zurich, Zurich, Switzerland
Mina Stanikic Digital Society Initiative, University of Zurich, Zurich, Switzerland Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland
Christina Haag Digital Society Initiative, University of Zurich, Zurich, Switzerland Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland
Chloé Sieber Digital Society Initiative, University of Zurich, Zurich, Switzerland Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland
Gerold Schneider Digital Society Initiative, University of Zurich, Zurich, Switzerland Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
Kaspar Staub Digital Society Initiative, University of Zurich, Zurich, Switzerland Institute of Evolutionary Medicine, University of Zurich, Zurich, Switzerland
Dominik Alois Ettlin Digital Society Initiative, University of Zurich, Zurich, Switzerland Center of Dental Medicine, University of Zurich, Zurich, Switzerland
Oliver Grübner Digital Society Initiative, University of Zurich, Zurich, Switzerland Department of Geography, University of Zurich, Zurich, Switzerland
Fabio Rinaldi Digital Society Initiative, University of Zurich, Zurich, Switzerland Dalle Molle Institute for Artificial Intelligence (IDSIA), Switzerland Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland Fondazione Bruno Kessler, Trento, Italy Swiss Institute of Bioinformatics, Switzerland
Viktor von Wyl Digital Society Initiative, University of Zurich, Zurich, Switzerland Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland
for the University of Zurich Digital Society Initiative (UZH-DSI) Health Community

Fitzpatrick NK, Dobson R, Roberts A, Jones K, Shah AD, Nenadic G, Ford E. Understanding stakeholder views around the creation of a consented donated databank of clinical free text to develop and train natural language processing models for research: an exploratory study (Preprint). JMIR Med Inform 2023;11:e45534. [PMID: 37133927 DOI: 10.2196/45534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 02/24/2023] [Accepted: 03/19/2023] [Indexed: 03/21/2023] Open

Abstract

BACKGROUND

Information stored within electronic health records is often recorded as unstructured text. Special computerized natural language processing (NLP) tools are needed to process this text; however, complex governance arrangements make such data in the National Health Service hard to access, and therefore, it is difficult to use for research in improving NLP methods. The creation of a donated databank of clinical free text could provide an important opportunity for researchers to develop NLP methods and tools and may circumvent delays in accessing the data needed to train the models. However, to date, there has been little or no engagement with stakeholders on the acceptability and design considerations of establishing a free-text databank for this purpose.

OBJECTIVE

This study aimed to ascertain stakeholder views around the creation of a consented, donated databank of clinical free text to help create, train, and evaluate NLP for clinical research and to inform the potential next steps for adopting a partner-led approach to establish a national, funded databank of free text for use by the research community.

METHODS

Web-based in-depth focus group interviews were conducted with 4 stakeholder groups (patients and members of the public, clinicians, information governance leads and research ethics members, and NLP researchers).

RESULTS

All stakeholder groups were strongly in favor of the databank and saw great value in creating an environment where NLP tools can be tested and trained to improve their accuracy. Participants highlighted a range of complex issues for consideration as the databank is developed, including communicating the intended purpose, the approach to access and safeguarding the data, who should have access, and how to fund the databank. Participants recommended that a small-scale, gradual approach be adopted to start to gather donations and encouraged further engagement with stakeholders to develop a road map and set of standards for the databank.

CONCLUSIONS

These findings provide a clear mandate to begin developing the databank and a framework for stakeholder expectations, which we would aim to meet with the databank delivery.

Wu H, Wang M, Wu J, Francis F, Chang YH, Shavick A, Dong H, Poon MTC, Fitzpatrick N, Levine AP, Slater LT, Handy A, Karwath A, Gkoutos GV, Chelala C, Shah AD, Stewart R, Collier N, Alex B, Whiteley W, Sudlow C, Roberts A, Dobson RJB. A survey on clinical natural language processing in the United Kingdom from 2007 to 2022. NPJ Digit Med 2022;5:186. [PMID: 36544046 PMCID: PMC9770568 DOI: 10.1038/s41746-022-00730-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 11/29/2022] [Indexed: 12/24/2022] Open

Affiliation(s)

Honghan Wu Institute of Health Informatics, University College London, London, UK
Minhong Wang Institute of Health Informatics, University College London, London, UK
Jinge Wu Institute of Health Informatics, University College London, London, UK Usher Institute, University of Edinburgh, Edinburgh, UK
Farah Francis Usher Institute, University of Edinburgh, Edinburgh, UK
Yun-Hsuan Chang Institute of Health Informatics, University College London, London, UK
Alex Shavick Research Department of Pathology, UCL Cancer Institute, University College London, London, UK
Hang Dong Usher Institute, University of Edinburgh, Edinburgh, UK Department of Computer Science, University of Oxford, Oxford, UK
Michael T C Poon Usher Institute, University of Edinburgh, Edinburgh, UK
Natalie Fitzpatrick Institute of Health Informatics, University College London, London, UK
Adam P Levine Research Department of Pathology, UCL Cancer Institute, University College London, London, UK
Luke T Slater Institute of Cancer and Genomics, University of Birmingham, Birmingham, UK
Alex Handy Institute of Health Informatics, University College London, London, UK University College London Hospitals NHS Trust, London, UK
Andreas Karwath Institute of Cancer and Genomics, University of Birmingham, Birmingham, UK
Georgios V Gkoutos Institute of Cancer and Genomics, University of Birmingham, Birmingham, UK
Claude Chelala Centre for Tumour Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
Anoop Dinesh Shah Institute of Health Informatics, University College London, London, UK
Robert Stewart Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King's College London, London, UK South London and Maudsley NHS Foundation Trust, London, UK
Nigel Collier Theoretical and Applied Linguistics, Faculty of Modern & Medieval Languages & Linguistics, University of Cambridge, Cambridge, UK
Beatrice Alex Edinburgh Futures Institute, University of Edinburgh, Edinburgh, UK
William Whiteley Usher Institute, University of Edinburgh, Edinburgh, UK
Cathie Sudlow Usher Institute, University of Edinburgh, Edinburgh, UK
Angus Roberts Department of Biostatistics & Health Informatics, King's College London, London, UK
Richard J B Dobson Institute of Health Informatics, University College London, London, UK Department of Biostatistics & Health Informatics, King's College London, London, UK

Cumyn A, Ménard JF, Barton A, Dault R, Lévesque F, Ethier JF. Patients and Members of the Public’s Wishes Regarding Transparency in the Context of Secondary Use of Health Data: A Scoping Review (Preprint). J Med Internet Res 2022;25:e45002. [PMID: 37052967 PMCID: PMC10141314 DOI: 10.2196/45002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 02/09/2023] [Accepted: 03/03/2023] [Indexed: 03/06/2023] Open

Abstract

BACKGROUND

Secondary use of health data has reached unequaled potential to improve health systems governance, knowledge, and clinical care. Transparency regarding this secondary use is frequently cited as necessary to address deficits in trust and conditional support and to increase patient awareness.

OBJECTIVE

We aimed to review the current published literature to identify different stakeholders' perspectives and recommendations on what information patients and members of the public want to learn about the secondary use of health data for research purposes and how and in which situations.

METHODS

Using PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines, we conducted a scoping review using Medline, CINAHL, PsycINFO, Scopus, Cochrane Library, and PubMed databases to locate a broad range of studies published in English or French until November 2022. We included articles reporting a stakeholder's perspective or recommendations of what information patients and members of the public want to learn about the secondary use of health data for research purposes and how or in which situations. Data were collected and analyzed with an iterative thematic approach using NVivo.

RESULTS

Overall, 178 articles were included in this scoping review. The type of information can be divided into generic and specific content. Generic content includes information on governance and regulatory frameworks, technical aspects, and scientific aims. Specific content includes updates on the use of one's data, return of results from individual tests, information on global results, information on data sharing, and how to access one's data. Recommendations on how to communicate the information focused on frequency, use of various supports, formats, and wording. Methods for communication generally favored broad approaches such as nationwide publicity campaigns, mainstream and social media for generic content, and mixed approaches for specific content including websites, patient portals, and face-to-face encounters. Content should be tailored to the individual as much as possible with regard to length, avoidance of technical terms, cultural competence, and level of detail. Finally, the review outlined 4 major situations where communication was deemed necessary: before a new use of data, when new test results became available, when global research results were released, and in the event of a breach in confidentiality.

CONCLUSIONS

This review highlights how different types of information and approaches to communication efforts may serve as the basis for achieving greater transparency. Governing bodies could use the results: to elaborate or evaluate strategies to educate on the potential benefits; to provide some knowledge and control over data use as a form of reciprocity; and as a condition to engage citizens and build and maintain trust. Future work is needed to assess which strategies achieve the greatest outreach while striking a balance between meeting information needs and use of resources.

Mercorelli L, Nguyen H, Gartell N, Brookes M, Morris J, Tam CS. A framework for de-identification of free-text data in electronic medical records enabling secondary use. AUST HEALTH REV 2022;46:289-293. [PMID: 35546422 DOI: 10.1071/ah21361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Accepted: 03/18/2022] [Indexed: 11/23/2022]

Ford E, Curlewis K, Squires E, Griffiths LJ, Stewart R, Jones KH. The Potential of Research Drawing on Clinical Free Text to Bring Benefits to Patients in the United Kingdom: A Systematic Review of the Literature. Front Digit Health 2021;3:606599. [PMID: 34713089 PMCID: PMC8521813 DOI: 10.3389/fdgth.2021.606599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2020] [Accepted: 01/15/2021] [Indexed: 11/13/2022] Open

Abstract

Background: The analysis of clinical free text from patient records for research has potential to contribute to the medical evidence base but access to clinical free text is frequently denied by data custodians who perceive that the privacy risks of data-sharing are too high. Engagement activities with patients and regulators, where views on the sharing of clinical free text data for research have been discussed, have identified that stakeholders would like to understand the potential clinical benefits that could be achieved if access to free text for clinical research were improved. We aimed to systematically review all UK research studies which used clinical free text and report direct or potential benefits to patients, synthesizing possible benefits into an easy to communicate taxonomy for public engagement and policy discussions. Methods: We conducted a systematic search for articles which reported primary research using clinical free text, drawn from UK health record databases, which reported a benefit or potential benefit for patients, actionable in a clinical environment or health service, and not solely methods development or data quality improvement. We screened eligible papers and thematically analyzed information about clinical benefits reported in the paper to create a taxonomy of benefits. Results: We identified 43 papers and derived five themes of benefits: health-care quality or services improvement, observational risk factor-outcome research, drug prescribing safety, case-finding for clinical trials, and development of clinical decision support. Five papers compared study quality with and without free text and found an improvement of accuracy when free text was included in analytical models. Conclusions: Findings will help stakeholders weigh the potential benefits of free text research against perceived risks to patient privacy. The taxonomy can be used to aid public and policy discussions, and identified studies could form a public-facing repository which will help the health-care text analysis research community better communicate the impact of their work.

Ford E, Sheppard J, Oliver S, Rooney P, Banerjee S, Cassell JA. Automated detection of patients with dementia whose symptoms have been identified in primary care but have no formal diagnosis: a retrospective case-control study using electronic primary care records. BMJ Open 2021;11:e039248. [PMID: 33483436 PMCID: PMC7831719 DOI: 10.1136/bmjopen-2020-039248] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 01/14/2023] Open

Abstract

OBJECTIVES

UK statistics suggest only two-thirds of patients with dementia get a diagnosis recorded in primary care. General practitioners (GPs) report barriers to formally diagnosing dementia, so some patients may be known by GPs to have dementia but may be missing a diagnosis in their patient record. We aimed to produce a method to identify these 'known but unlabelled' patients with dementia using data from primary care patient records.

DESIGN

Retrospective case-control study using routinely collected primary care patient records from Clinical Practice Research Datalink.

SETTING

UK general practice.

PARTICIPANTS

English patients aged >65 years, with a coded diagnosis of dementia recorded in 2000-2012 (cases), matched 1:1 with patients with no diagnosis code for dementia (controls).

INTERVENTIONS

Eight coded and nine keyword concepts indicating symptoms, screening tests, referrals and care for dementia recorded in the 5 years before diagnosis. We trialled machine learning classifiers to discriminate between cases and controls (logistic regression, naïve Bayes, random forest).

PRIMARY AND SECONDARY OUTCOMES

The outcome variable was dementia diagnosis code; the accuracy of classifiers was assessed using area under the receiver operating characteristic curve (AUC); the order of features contributing to discrimination was examined.

RESULTS

93 426 patients were included; the median age was 83 years (64.8% women). Three classifiers achieved high discrimination and performed very similarly. AUCs were 0.87-0.90 with coded variables, rising to 0.90-0.94 with keywords added. Feature prioritisation was different for each classifier; commonly prioritised features were Alzheimer's prescription, dementia annual review, memory loss and dementia keywords.

CONCLUSIONS

It is possible to detect patients with dementia who are known to GPs but unlabelled with a diagnostic code, with a high degree of accuracy in electronic primary care record data. Using keywords from clinic notes and letters improves accuracy compared with coded data alone. This approach could improve identification of dementia cases for record-keeping, service planning and delivery of good quality care.
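
The AUC comparison reported in this abstract (coded variables alone versus coded variables plus keywords) can be illustrated with a minimal sketch. The function name `auc` and all labels and scores below are hypothetical toy values chosen for illustration; they are not data or code from the study, and the pairwise formula shown is one standard way to compute AUC, not necessarily the one the authors used.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen case scores higher than a randomly chosen control.
    Ties between a case and a control count as half a win."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data only (assumption): 1 = coded dementia diagnosis, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical classifier scores from coded features alone ...
coded_only = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
# ... and after hypothetical keyword features are added.
with_keywords = [0.9, 0.7, 0.6, 0.5, 0.3, 0.2]

print("coded only:   ", round(auc(labels, coded_only), 3))
print("with keywords:", round(auc(labels, with_keywords), 3))
```

The pairwise form is quadratic in the number of patients, so for a cohort of the size reported here a rank-based or library implementation would be the practical choice.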

Kirkham EJ, Crompton CJ, Iveson MH, Beange I, McIntosh AM, Fletcher-Watson S. Co-development of a Best Practice Checklist for Mental Health Data Science: A Delphi Study. Front Psychiatry 2021;12:643914. [PMID: 34177644 PMCID: PMC8222615 DOI: 10.3389/fpsyt.2021.643914] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/19/2020] [Accepted: 05/14/2021] [Indexed: 12/20/2022] Open

Abstract

Background: Mental health research is commonly affected by difficulties in recruiting and retaining participants, resulting in findings which are based on a sub-sample of those actually living with mental illness. Increasing the use of Big Data for mental health research, especially routinely-collected data, could improve this situation. However, steps to facilitate this must be enacted in collaboration with those who would provide the data - people with mental health conditions. Methods: We used the Delphi method to create a best practice checklist for mental health data science. Twenty participants with both expertise in data science and personal experience of mental illness worked together over three phases. In Phase 1, participants rated a list of 63 statements and added any statements or topics that were missing. Statements receiving a mean score of 5 or more (out of 7) were retained. These were then combined with the results of a rapid thematic analysis of participants' comments to produce a 14-item draft checklist, with each item split into two components: best practice now and best practice in the future. In Phase 2, participants indicated whether or not each item should remain in the checklist, and items that scored more than 50% endorsement were retained. In Phase 3 participants rated their satisfaction with the final checklist. Results: The final checklist was made up of 14 "best practice" items, with each item covering best practice now and best practice in the future. At the end of the three phases, 85% of participants were (very) satisfied with the two best practice checklists, with no participants expressing dissatisfaction. Conclusions: Increased stakeholder involvement is essential at every stage of mental health data science. The checklist produced through this work represents the views of people with experience of mental illness, and it is hoped that it will be used to facilitate trustworthy and innovative research which is inclusive of a wider range of individuals.
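
The two retention rules described in this abstract (a mean score of 5 or more out of 7 in Phase 1, and more than 50% endorsement in Phase 2) can be sketched as below. The function names, statement labels, and ratings are hypothetical and purely illustrative; they are not taken from the study, and the thematic analysis step between the phases is not modelled.

```python
from statistics import mean


def retain_phase1(ratings, threshold=5.0):
    """Keep statements whose mean rating on the 1-7 scale is at least
    the threshold (the abstract states 'a mean score of 5 or more')."""
    return [name for name, scores in ratings.items() if mean(scores) >= threshold]


def retain_phase2(votes):
    """Keep items endorsed by strictly more than 50% of participants."""
    return [name for name, ballots in votes.items()
            if sum(ballots) / len(ballots) > 0.5]


# Toy ratings (assumption): three participants rating three statements.
phase1_ratings = {
    "statement A": [5, 6, 7],   # mean 6.0, retained
    "statement B": [4, 4, 5],   # mean about 4.33, dropped
    "statement C": [5, 5, 5],   # mean exactly 5.0, retained
}

# Toy endorsement votes (assumption): True means "keep this item".
phase2_votes = {
    "item 1": [True, True, False],   # two of three endorse, retained
    "item 2": [True, False],         # exactly 50%, not retained
}

print(retain_phase1(phase1_ratings))
print(retain_phase2(phase2_votes))
```

Note the asymmetry in the thresholds as worded in the abstract: the Phase 1 cut-off is inclusive ("5 or more"), while the Phase 2 cut-off is strict ("more than 50%").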
