1
Sedlakova J, Daniore P, Horn Wintsch A, Wolf M, Stanikic M, Haag C, Sieber C, Schneider G, Staub K, Alois Ettlin D, Grübner O, Rinaldi F, von Wyl V. Challenges and best practices for digital unstructured data enrichment in health research: A systematic narrative review. PLOS Digit Health 2023; 2:e0000347. [PMID: 37819910 PMCID: PMC10566734 DOI: 10.1371/journal.pdig.0000347] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/22/2022] [Accepted: 08/14/2023] [Indexed: 10/13/2023]
Abstract
Digital data play an increasingly important role in advancing health research and care. However, most digital data in healthcare are in an unstructured and often not readily accessible format for research. Unstructured data are often found in a format that lacks standardization and needs significant preprocessing and feature extraction efforts. This poses challenges when combining such data with other data sources to enhance the existing knowledge base, which we refer to as digital unstructured data enrichment. Overcoming these methodological challenges requires significant resources and may limit the ability to fully leverage the potential of these data for advancing health research and, ultimately, prevention and patient care delivery. While prevalent challenges associated with unstructured data use in health research are widely reported across the literature, a comprehensive interdisciplinary summary of such challenges and possible solutions to facilitate their use in combination with structured data sources is missing. In this study, we report findings from a systematic narrative review on the seven most prevalent challenge areas connected with digital unstructured data enrichment in the fields of cardiology, neurology and mental health, along with possible solutions to address these challenges. Based on these findings, we developed a checklist that follows the standard data flow in health research studies. This checklist aims to provide initial systematic guidance to inform early planning and feasibility assessments for health research studies aiming to combine unstructured data with existing data sources. Overall, the generality of reported unstructured data enrichment methods in the studies included in this review calls for more systematic reporting of such methods to achieve greater reproducibility in future studies.
Affiliation(s)
- Jana Sedlakova
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland
  - Institute of Biomedical Ethics and History of Medicine, University of Zurich, Zurich, Switzerland
- Paola Daniore
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland
- Andrea Horn Wintsch
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Center for Gerontology, University of Zurich, Zurich, Switzerland
  - CoupleSense: Health and Interpersonal Emotion Regulation Group, University Research Priority Program (URPP) Dynamics of Healthy Aging, University of Zurich, Zurich, Switzerland
- Markus Wolf
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Department of Psychology, University of Zurich, Zurich, Switzerland
- Mina Stanikic
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland
  - Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland
- Christina Haag
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland
  - Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland
- Chloé Sieber
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland
  - Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland
- Gerold Schneider
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
- Kaspar Staub
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Evolutionary Medicine, University of Zurich, Zurich, Switzerland
- Dominik Alois Ettlin
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Center of Dental Medicine, University of Zurich, Zurich, Switzerland
- Oliver Grübner
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Department of Geography, University of Zurich, Zurich, Switzerland
- Fabio Rinaldi
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Dalle Molle Institute for Artificial Intelligence (IDSIA), Switzerland
  - Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
  - Fondazione Bruno Kessler, Trento, Italy
  - Swiss Institute of Bioinformatics, Switzerland
- Viktor von Wyl
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland
  - Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland
2
Fitzpatrick NK, Dobson R, Roberts A, Jones K, Shah AD, Nenadic G, Ford E. Understanding stakeholder views around the creation of a consented donated databank of clinical free text to develop and train natural language processing models for research: an exploratory study (Preprint). JMIR Med Inform 2023; 11:e45534. [PMID: 37133927 DOI: 10.2196/45534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 02/24/2023] [Accepted: 03/19/2023] [Indexed: 03/21/2023]
Abstract
BACKGROUND Information stored within electronic health records is often recorded as unstructured text. Special computerized natural language processing (NLP) tools are needed to process this text; however, complex governance arrangements make such data in the National Health Service hard to access, and therefore, it is difficult to use for research in improving NLP methods. The creation of a donated databank of clinical free text could provide an important opportunity for researchers to develop NLP methods and tools and may circumvent delays in accessing the data needed to train the models. However, to date, there has been little or no engagement with stakeholders on the acceptability and design considerations of establishing a free-text databank for this purpose. OBJECTIVE This study aimed to ascertain stakeholder views around the creation of a consented, donated databank of clinical free text to help create, train, and evaluate NLP for clinical research and to inform the potential next steps for adopting a partner-led approach to establish a national, funded databank of free text for use by the research community. METHODS Web-based in-depth focus group interviews were conducted with 4 stakeholder groups (patients and members of the public, clinicians, information governance leads and research ethics members, and NLP researchers). RESULTS All stakeholder groups were strongly in favor of the databank and saw great value in creating an environment where NLP tools can be tested and trained to improve their accuracy. Participants highlighted a range of complex issues for consideration as the databank is developed, including communicating the intended purpose, the approach to access and safeguarding the data, who should have access, and how to fund the databank. 
Participants recommended that a small-scale, gradual approach be adopted to start to gather donations and encouraged further engagement with stakeholders to develop a road map and set of standards for the databank. CONCLUSIONS These findings provide a clear mandate to begin developing the databank and a framework for stakeholder expectations, which we would aim to meet with the databank delivery.
Affiliation(s)
- Richard Dobson
  - Department of Biostatistics and Health Informatics, King's College London, London, United Kingdom
- Angus Roberts
  - Department of Biostatistics and Health Informatics, King's College London, London, United Kingdom
- Kerina Jones
  - Department of Population Data Science, Swansea University Medical School, Swansea, United Kingdom
- Anoop D Shah
  - Institute of Health Informatics, University College London, London, United Kingdom
  - University College London Hospitals NHS Foundation Trust, London, United Kingdom
- Goran Nenadic
  - Department of Computer Science, University of Manchester, Manchester, United Kingdom
- Elizabeth Ford
  - Brighton and Sussex Medical School, Brighton, United Kingdom
3
Wu H, Wang M, Wu J, Francis F, Chang YH, Shavick A, Dong H, Poon MTC, Fitzpatrick N, Levine AP, Slater LT, Handy A, Karwath A, Gkoutos GV, Chelala C, Shah AD, Stewart R, Collier N, Alex B, Whiteley W, Sudlow C, Roberts A, Dobson RJB. A survey on clinical natural language processing in the United Kingdom from 2007 to 2022. NPJ Digit Med 2022; 5:186. [PMID: 36544046 PMCID: PMC9770568 DOI: 10.1038/s41746-022-00730-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 11/29/2022] [Indexed: 12/24/2022]
Abstract
Much of the knowledge and information needed for enabling high-quality clinical research is stored in free-text format. Natural language processing (NLP) has been used to extract information from these sources at scale for several decades. This paper aims to present a comprehensive review of clinical NLP for the past 15 years in the UK to identify the community, depict its evolution, analyse methodologies and applications, and identify the main barriers. We collect a dataset of clinical NLP projects (n = 94; £ = 41.97 m) funded by UK funders or the European Union's funding programmes. Additionally, we extract details on 9 funders, 137 organisations, 139 persons and 431 research papers. Networks are created from timestamped data interlinking all entities, and network analysis is subsequently applied to generate insights. 431 publications are identified as part of a literature review, of which 107 are eligible for final analysis. Results show that, not surprisingly, clinical NLP in the UK has increased substantially in the last 15 years: the total budget in the period of 2019-2022 was 80 times that of 2007-2010. However, effort is required to deepen areas such as disease (sub-)phenotyping and broaden application domains. There is also a need to improve links between academia and industry and enable deployments in real-world settings for the realisation of clinical NLP's great potential in care delivery. The major barriers include research and development access to hospital data, lack of capable computational resources in the right places, the scarcity of labelled data and barriers to sharing of pretrained models.
Affiliation(s)
- Honghan Wu
  - Institute of Health Informatics, University College London, London, UK
- Minhong Wang
  - Institute of Health Informatics, University College London, London, UK
- Jinge Wu
  - Institute of Health Informatics, University College London, London, UK
  - Usher Institute, University of Edinburgh, Edinburgh, UK
- Farah Francis
  - Usher Institute, University of Edinburgh, Edinburgh, UK
- Yun-Hsuan Chang
  - Institute of Health Informatics, University College London, London, UK
- Alex Shavick
  - Research Department of Pathology, UCL Cancer Institute, University College London, London, UK
- Hang Dong
  - Usher Institute, University of Edinburgh, Edinburgh, UK
  - Department of Computer Science, University of Oxford, Oxford, UK
- Adam P Levine
  - Research Department of Pathology, UCL Cancer Institute, University College London, London, UK
- Luke T Slater
  - Institute of Cancer and Genomics, University of Birmingham, Birmingham, UK
- Alex Handy
  - Institute of Health Informatics, University College London, London, UK
  - University College London Hospitals NHS Trust, London, UK
- Andreas Karwath
  - Institute of Cancer and Genomics, University of Birmingham, Birmingham, UK
- Georgios V Gkoutos
  - Institute of Cancer and Genomics, University of Birmingham, Birmingham, UK
- Claude Chelala
  - Centre for Tumour Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
- Anoop Dinesh Shah
  - Institute of Health Informatics, University College London, London, UK
- Robert Stewart
  - Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King's College London, London, UK
  - South London and Maudsley NHS Foundation Trust, London, UK
- Nigel Collier
  - Theoretical and Applied Linguistics, Faculty of Modern & Medieval Languages & Linguistics, University of Cambridge, Cambridge, UK
- Beatrice Alex
  - Edinburgh Futures Institute, University of Edinburgh, Edinburgh, UK
- Cathie Sudlow
  - Usher Institute, University of Edinburgh, Edinburgh, UK
- Angus Roberts
  - Department of Biostatistics & Health Informatics, King's College London, London, UK
- Richard J B Dobson
  - Institute of Health Informatics, University College London, London, UK
  - Department of Biostatistics & Health Informatics, King's College London, London, UK
4
Cumyn A, Ménard JF, Barton A, Dault R, Lévesque F, Ethier JF. Patients and Members of the Public’s Wishes Regarding Transparency in the Context of Secondary Use of Health Data: A Scoping Review (Preprint). J Med Internet Res 2022; 25:e45002. [PMID: 37052967 PMCID: PMC10141314 DOI: 10.2196/45002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 02/09/2023] [Accepted: 03/03/2023] [Indexed: 03/06/2023]
Abstract
BACKGROUND Secondary use of health data has reached unequaled potential to improve health systems governance, knowledge, and clinical care. Transparency regarding this secondary use is frequently cited as necessary to address deficits in trust and conditional support and to increase patient awareness. OBJECTIVE We aimed to review the current published literature to identify different stakeholders' perspectives and recommendations on what information patients and members of the public want to learn about the secondary use of health data for research purposes and how and in which situations. METHODS Using PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines, we conducted a scoping review using Medline, CINAHL, PsycINFO, Scopus, Cochrane Library, and PubMed databases to locate a broad range of studies published in English or French until November 2022. We included articles reporting a stakeholder's perspective or recommendations of what information patients and members of the public want to learn about the secondary use of health data for research purposes and how or in which situations. Data were collected and analyzed with an iterative thematic approach using NVivo. RESULTS Overall, 178 articles were included in this scoping review. The type of information can be divided into generic and specific content. Generic content includes information on governance and regulatory frameworks, technical aspects, and scientific aims. Specific content includes updates on the use of one's data, return of results from individual tests, information on global results, information on data sharing, and how to access one's data. Recommendations on how to communicate the information focused on frequency, use of various supports, formats, and wording. 
Methods for communication generally favored broad approaches such as nationwide publicity campaigns, mainstream and social media for generic content, and mixed approaches for specific content including websites, patient portals, and face-to-face encounters. Content should be tailored to the individual as much as possible with regard to length, avoidance of technical terms, cultural competence, and level of detail. Finally, the review outlined 4 major situations where communication was deemed necessary: before a new use of data, when new test results became available, when global research results were released, and in the event of a breach in confidentiality. CONCLUSIONS This review highlights how different types of information and approaches to communication efforts may serve as the basis for achieving greater transparency. Governing bodies could use the results to elaborate or evaluate strategies to educate on the potential benefits; to provide some knowledge and control over data use as a form of reciprocity; and as a condition to engage citizens and build and maintain trust. Future work is needed to assess which strategies achieve the greatest outreach while striking a balance between meeting information needs and use of resources.
Affiliation(s)
- Annabelle Cumyn
  - Département de médecine, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC, Canada
  - Groupe de recherche interdisciplinaire en informatique de la santé, Faculté des sciences/Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC, Canada
- Jean-Frédéric Ménard
  - Groupe de recherche interdisciplinaire en informatique de la santé, Faculté des sciences/Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC, Canada
  - Faculté de droit, Université de Sherbrooke, Sherbrooke, QC, Canada
- Adrien Barton
  - Groupe de recherche interdisciplinaire en informatique de la santé, Faculté des sciences/Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC, Canada
  - Institut de recherche en informatique de Toulouse, Toulouse, France
- Roxanne Dault
  - Groupe de recherche interdisciplinaire en informatique de la santé, Faculté des sciences/Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC, Canada
- Frédérique Lévesque
  - Groupe de recherche interdisciplinaire en informatique de la santé, Faculté des sciences/Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC, Canada
- Jean-François Ethier
  - Département de médecine, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC, Canada
  - Groupe de recherche interdisciplinaire en informatique de la santé, Faculté des sciences/Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC, Canada
5
Mercorelli L, Nguyen H, Gartell N, Brookes M, Morris J, Tam CS. A framework for de-identification of free-text data in electronic medical records enabling secondary use. Aust Health Rev 2022; 46:289-293. [PMID: 35546422 DOI: 10.1071/ah21361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Accepted: 03/18/2022] [Indexed: 11/23/2022]
Abstract
Clinical free-text data represent a vast, untapped source of rich information. If made more accessible for research, these data would supplement information captured in structured fields. Data need to be de-identified prior to being reused for research. However, a lack of transparency with existing de-identification software tools makes it difficult for data custodians to assess potential risks associated with the release of de-identified clinical free-text data. This case study describes the development of a framework for releasing de-identified clinical free-text data in two local health districts in NSW, Australia. A sample of clinical documents (n = 14 768 965), including progress notes, nursing and medical assessments and discharge summaries, was used for development. An algorithm was designed to identify and mask patient names without damaging data utility. For each note, the algorithm output (i) the note length before and after de-identification, (ii) the number of patient names and (iii) the number of common words. These outputs were used to iteratively refine the algorithm performance. This was followed by manual review of a random subset of records by a health information manager. Notes that were not correctly de-identified were fixed, and performance was reassessed until resolution. All notes in this sample were suitably de-identified using this method. Developing a transparent method for de-identifying clinical free-text data enables informed decision-making by data custodians and the safe re-use of clinical free-text data for research and public benefit.
Affiliation(s)
- Louis Mercorelli
  - Sydney Informatics Hub, University of Sydney, NSW, Australia; and Clinical Informatics Unit, Northern Sydney Local Health District, NSW, Australia
- Harrison Nguyen
  - Performance and Analytics, Northern Sydney Local Health District, NSW, Australia; and Faculty of Medicine and Health, University of Sydney, Office 543, Level 5, School of Computer Science (J12), NSW 2006, Australia
- Nicole Gartell
  - Health Information Services, Northern Sydney Local Health District, NSW, Australia
- Martyn Brookes
  - Performance and Analytics, Northern Sydney Local Health District, NSW, Australia
- Charmaine S Tam
  - Performance and Analytics, Northern Sydney Local Health District, NSW, Australia; and Faculty of Medicine and Health, University of Sydney, Office 543, Level 5, School of Computer Science (J12), NSW 2006, Australia
6
Ford E, Curlewis K, Squires E, Griffiths LJ, Stewart R, Jones KH. The Potential of Research Drawing on Clinical Free Text to Bring Benefits to Patients in the United Kingdom: A Systematic Review of the Literature. Front Digit Health 2021; 3:606599. [PMID: 34713089 PMCID: PMC8521813 DOI: 10.3389/fdgth.2021.606599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2020] [Accepted: 01/15/2021] [Indexed: 11/13/2022]
Abstract
Background: The analysis of clinical free text from patient records for research has potential to contribute to the medical evidence base but access to clinical free text is frequently denied by data custodians who perceive that the privacy risks of data-sharing are too high. Engagement activities with patients and regulators, where views on the sharing of clinical free text data for research have been discussed, have identified that stakeholders would like to understand the potential clinical benefits that could be achieved if access to free text for clinical research were improved. We aimed to systematically review all UK research studies which used clinical free text and report direct or potential benefits to patients, synthesizing possible benefits into an easy to communicate taxonomy for public engagement and policy discussions. Methods: We conducted a systematic search for articles which reported primary research using clinical free text, drawn from UK health record databases, which reported a benefit or potential benefit for patients, actionable in a clinical environment or health service, and not solely methods development or data quality improvement. We screened eligible papers and thematically analyzed information about clinical benefits reported in the paper to create a taxonomy of benefits. Results: We identified 43 papers and derived five themes of benefits: health-care quality or services improvement, observational risk factor-outcome research, drug prescribing safety, case-finding for clinical trials, and development of clinical decision support. Five papers compared study quality with and without free text and found an improvement of accuracy when free text was included in analytical models. Conclusions: Findings will help stakeholders weigh the potential benefits of free text research against perceived risks to patient privacy. 
The taxonomy can be used to aid public and policy discussions, and identified studies could form a public-facing repository which will help the health-care text analysis research community better communicate the impact of their work.
Affiliation(s)
- Elizabeth Ford
  - Department of Primary Care and Public Health, Brighton and Sussex Medical School, Brighton, United Kingdom
- Keegan Curlewis
  - Department of Primary Care and Public Health, Brighton and Sussex Medical School, Brighton, United Kingdom
- Emma Squires
  - Swansea Medical School, University of Swansea, Swansea, United Kingdom
- Lucy J. Griffiths
  - Swansea Medical School, University of Swansea, Swansea, United Kingdom
- Robert Stewart
  - King's College London, London, United Kingdom
  - South London and Maudsley NHS Foundation Trust, London, United Kingdom
- Kerina H. Jones
  - Swansea Medical School, University of Swansea, Swansea, United Kingdom
7
Ford E, Sheppard J, Oliver S, Rooney P, Banerjee S, Cassell JA. Automated detection of patients with dementia whose symptoms have been identified in primary care but have no formal diagnosis: a retrospective case-control study using electronic primary care records. BMJ Open 2021; 11:e039248. [PMID: 33483436 PMCID: PMC7831719 DOI: 10.1136/bmjopen-2020-039248] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 01/14/2023]
Abstract
OBJECTIVES UK statistics suggest only two-thirds of patients with dementia get a diagnosis recorded in primary care. General practitioners (GPs) report barriers to formally diagnosing dementia, so some patients may be known by GPs to have dementia but may be missing a diagnosis in their patient record. We aimed to produce a method to identify these 'known but unlabelled' patients with dementia using data from primary care patient records. DESIGN Retrospective case-control study using routinely collected primary care patient records from Clinical Practice Research Datalink. SETTING UK general practice. PARTICIPANTS English patients aged >65 years, with a coded diagnosis of dementia recorded in 2000-2012 (cases), matched 1:1 with patients with no diagnosis code for dementia (controls). INTERVENTIONS Eight coded and nine keyword concepts indicating symptoms, screening tests, referrals and care for dementia recorded in the 5 years before diagnosis. We trialled machine learning classifiers to discriminate between cases and controls (logistic regression, naïve Bayes, random forest). PRIMARY AND SECONDARY OUTCOMES The outcome variable was dementia diagnosis code; the accuracy of classifiers was assessed using area under the receiver operating characteristic curve (AUC); the order of features contributing to discrimination was examined. RESULTS 93 426 patients were included; the median age was 83 years (64.8% women). Three classifiers achieved high discrimination and performed very similarly. AUCs were 0.87-0.90 with coded variables, rising to 0.90-0.94 with keywords added. Feature prioritisation was different for each classifier; commonly prioritised features were Alzheimer's prescription, dementia annual review, memory loss and dementia keywords. CONCLUSIONS It is possible to detect patients with dementia who are known to GPs but unlabelled with a diagnostic code, with a high degree of accuracy in electronic primary care record data. 
Using keywords from clinic notes and letters improves accuracy compared with coded data alone. This approach could improve identification of dementia cases for record-keeping, service planning and delivery of good quality care.
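The AUC reported in this abstract can be read as the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control. The following is a minimal sketch of that reading only; the function name `auc` and all scores are hypothetical and invented for illustration, and this is not the study's code or data.

```python
def auc(case_scores, control_scores):
    """Share of case/control pairs in which the case scores higher (ties count half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical classifier scores, invented for illustration only.
cases = [0.9, 0.8, 0.6, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]
example_auc = auc(cases, controls)
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from controls.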
Affiliation(s)
- Elizabeth Ford
  - Department of Primary Care and Public Health, Brighton and Sussex Medical School, Brighton, Brighton and Hove, UK
- Joanne Sheppard
  - Department of Physics and Astronomy, University of Sussex School of Mathematical and Physical Sciences, Brighton, Brighton and Hove, UK
  - Medical Physics and Biomedical Engineering, UCL, London, UK
- Seb Oliver
  - Department of Physics and Astronomy, University of Sussex School of Mathematical and Physical Sciences, Brighton, Brighton and Hove, UK
- Philip Rooney
  - Department of Physics and Astronomy, University of Sussex School of Mathematical and Physical Sciences, Brighton, Brighton and Hove, UK
- Sube Banerjee
  - Faculty of Health, University of Plymouth, Plymouth, Devon, UK
- Jackie A Cassell
  - Department of Primary Care and Public Health, Brighton and Sussex Medical School, Brighton, Brighton and Hove, UK
8
Kirkham EJ, Crompton CJ, Iveson MH, Beange I, McIntosh AM, Fletcher-Watson S. Co-development of a Best Practice Checklist for Mental Health Data Science: A Delphi Study. Front Psychiatry 2021; 12:643914. [PMID: 34177644 PMCID: PMC8222615 DOI: 10.3389/fpsyt.2021.643914] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/19/2020] [Accepted: 05/14/2021] [Indexed: 12/20/2022]
Abstract
Background: Mental health research is commonly affected by difficulties in recruiting and retaining participants, resulting in findings which are based on a sub-sample of those actually living with mental illness. Increasing the use of Big Data for mental health research, especially routinely-collected data, could improve this situation. However, steps to facilitate this must be enacted in collaboration with those who would provide the data - people with mental health conditions. Methods: We used the Delphi method to create a best practice checklist for mental health data science. Twenty participants with both expertise in data science and personal experience of mental illness worked together over three phases. In Phase 1, participants rated a list of 63 statements and added any statements or topics that were missing. Statements receiving a mean score of 5 or more (out of 7) were retained. These were then combined with the results of a rapid thematic analysis of participants' comments to produce a 14-item draft checklist, with each item split into two components: best practice now and best practice in the future. In Phase 2, participants indicated whether or not each item should remain in the checklist, and items that scored more than 50% endorsement were retained. In Phase 3 participants rated their satisfaction with the final checklist. Results: The final checklist was made up of 14 "best practice" items, with each item covering best practice now and best practice in the future. At the end of the three phases, 85% of participants were (very) satisfied with the two best practice checklists, with no participants expressing dissatisfaction. Conclusions: Increased stakeholder involvement is essential at every stage of mental health data science. 
The checklist produced through this work represents the views of people with experience of mental illness, and it is hoped that it will be used to facilitate trustworthy and innovative research which is inclusive of a wider range of individuals.
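The two retention rules stated in this abstract (a mean score of 5 or more out of 7 in Phase 1, and more than 50% endorsement in Phase 2) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, statement labels, ratings and votes are all hypothetical, and this is not the authors' analysis code.

```python
from statistics import mean


def phase1_retained(ratings, threshold=5.0):
    """Keep statements whose mean rating on the 1-7 scale is at least the threshold."""
    return [name for name, scores in ratings.items() if mean(scores) >= threshold]


def phase2_retained(votes, cutoff=0.5):
    """Keep items endorsed by strictly more than the cutoff share of participants."""
    return [name for name, ballots in votes.items() if sum(ballots) / len(ballots) > cutoff]


# Hypothetical ratings and endorsement votes, invented for illustration only.
ratings = {"statement_a": [6, 7, 5, 4], "statement_b": [3, 4, 5, 4]}
votes = {"item_1": [True, True, False, True], "item_2": [True, False, False, True]}

kept_statements = phase1_retained(ratings)
kept_items = phase2_retained(votes)
```

Note the asymmetry between the two rules as worded in the abstract: the Phase 1 threshold is inclusive ("5 or more"), while the Phase 2 cutoff is strict ("more than 50%"), so an item with exactly half the votes is dropped in this sketch.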
Affiliation(s)
- Elizabeth J Kirkham
  - Division of Psychiatry, Centre for Clinical Brain Sciences, Kennedy Tower, Royal Edinburgh Hospital, University of Edinburgh, Edinburgh, United Kingdom
- Catherine J Crompton
  - Division of Psychiatry, Centre for Clinical Brain Sciences, Kennedy Tower, Royal Edinburgh Hospital, University of Edinburgh, Edinburgh, United Kingdom
- Matthew H Iveson
  - Division of Psychiatry, Centre for Clinical Brain Sciences, Kennedy Tower, Royal Edinburgh Hospital, University of Edinburgh, Edinburgh, United Kingdom
- Iona Beange
  - Division of Psychiatry, Centre for Clinical Brain Sciences, Kennedy Tower, Royal Edinburgh Hospital, University of Edinburgh, Edinburgh, United Kingdom
- Andrew M McIntosh
  - Division of Psychiatry, Centre for Clinical Brain Sciences, Kennedy Tower, Royal Edinburgh Hospital, University of Edinburgh, Edinburgh, United Kingdom
- Sue Fletcher-Watson
  - Division of Psychiatry, Centre for Clinical Brain Sciences, Kennedy Tower, Royal Edinburgh Hospital, University of Edinburgh, Edinburgh, United Kingdom