1. Tibble H, Sheikh A, Tsanas A. Development and validation of a machine learning risk prediction model for asthma attacks in adults in primary care. NPJ Prim Care Respir Med 2025; 35:24. [PMID: 40268974 PMCID: PMC12019439 DOI: 10.1038/s41533-025-00428-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2024] [Accepted: 04/07/2025] [Indexed: 04/25/2025] Open
Abstract
Primary care consultations provide an opportunity for patients and clinicians to assess asthma attack risk. Using a data-driven risk prediction tool with routinely collected health records may be an efficient way to aid promotion of effective self-management and to support clinical decision making. Longitudinal Scottish primary care data for 21,250 asthma patients were used to predict the risk of asthma attacks in the following year. A selection of machine learning algorithms (i.e., Naïve Bayes Classifier, Logistic Regression, Random Forests, and Extreme Gradient Boosting), hyperparameters, and training data enrichment methods were explored and validated in a random unseen data partition. Our final Logistic Regression model achieved the best performance when no training data enrichment was applied. Around 1 in 3 (36.2%) predicted high-risk patients had an attack within one year of consultation, compared to approximately 1 in 16 in the predicted low-risk group (6.7%). The model was well calibrated, with a calibration slope of 1.02 and an intercept of 0.004, and the Area under the Curve was 0.75. This model has the potential to increase the efficiency of routine asthma care by creating new personalized care pathways mapped to predicted risk of asthma attacks, such as priority ranking patients for scheduled consultations and interventions. Furthermore, it could be used to educate patients about their individual risk and risk factors, and to promote healthier lifestyle changes, use of self-management plans, and early emergency care seeking following rapid symptom deterioration.
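As a rough illustration of the two kinds of figures reported in this abstract (attack rates in the predicted high- and low-risk groups, and the area under the curve), here is a minimal Python sketch. The scores, outcomes, and the 0.3 threshold are invented toy values, not the study's data, and the function names are hypothetical. The abstract does not say how the AUC or the risk groups were computed, so the pairwise (Mann-Whitney) formulation below is an assumption.

```python
from typing import Sequence, Tuple


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Probability that a random patient with an attack scores higher than
    a random patient without one (ties count as half)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative outcome")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def event_rates(scores: Sequence[float], outcomes: Sequence[int],
                threshold: float) -> Tuple[float, float]:
    """Observed attack rate in the predicted high-risk (score >= threshold)
    and low-risk (score < threshold) groups."""
    high = [y for s, y in zip(scores, outcomes) if s >= threshold]
    low = [y for s, y in zip(scores, outcomes) if s < threshold]
    if not high or not low:
        raise ValueError("threshold leaves one risk group empty")
    return sum(high) / len(high), sum(low) / len(low)


# Toy data (hypothetical): predicted risk and whether an attack occurred.
toy_scores = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80]
toy_outcomes = [0, 0, 1, 0, 1, 1]

toy_auc = auc(toy_scores, toy_outcomes)
high_rate, low_rate = event_rates(toy_scores, toy_outcomes, threshold=0.3)
print(f"AUC={toy_auc:.3f}, high-risk rate={high_rate:.3f}, low-risk rate={low_rate:.3f}")
```

The sketch does not cover the calibration slope and intercept, which would need a regression of observed outcomes on the predicted log-odds.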
Affiliation(s)
- Holly Tibble
  - Usher Institute, The University of Edinburgh, Edinburgh, UK
  - Asthma UK Centre for Applied Research, Edinburgh, UK
- Aziz Sheikh
  - Usher Institute, The University of Edinburgh, Edinburgh, UK
  - Asthma UK Centre for Applied Research, Edinburgh, UK
- Athanasios Tsanas
  - Usher Institute, The University of Edinburgh, Edinburgh, UK
  - Asthma UK Centre for Applied Research, Edinburgh, UK
2. Doe G, Wathall S, Clanchy J, Edwards S, Evans H, Steiner MC, Evans RA. Comparing research recruitment strategies to prospectively identify patients presenting with breathlessness in primary care. NPJ Prim Care Respir Med 2022; 32:49. [PMCID: PMC9646257 DOI: 10.1038/s41533-022-00308-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Accepted: 10/05/2022] [Indexed: 11/11/2022] Open
Abstract
Two recruitment strategies for research were compared to prospectively identify patients with breathlessness who are awaiting a diagnosis in primary care. The first method utilised searches of the electronic patient record (EPR), the second method involved an electronic template triggered during a consultation. Using an electronic template triggered at the point of consultation increased recruitment to prospective research approximately nine-fold compared with searching for symptom codes and study mailouts.
Affiliation(s)
- Gillian Doe
  - Department of Respiratory Science, University of Leicester, Leicester, UK
- Simon Wathall
  - Clinical Trials Unit, Keele University, Newcastle-under-Lyme, UK
- Jill Clanchy
  - Clinical Trials Unit, University of Leicester, Leicester, UK
- Sarah Edwards
  - NIHR Biomedical Research Centre - Respiratory theme, University Hospitals of Leicester NHS Trust, Leicester, UK
- Helen Evans
  - NIHR Biomedical Research Centre - Respiratory theme, University Hospitals of Leicester NHS Trust, Leicester, UK
- Michael C. Steiner
  - Department of Respiratory Science, University of Leicester, Leicester, UK
  - NIHR Biomedical Research Centre - Respiratory theme, University Hospitals of Leicester NHS Trust, Leicester, UK
- Rachael A. Evans
  - Department of Respiratory Science, University of Leicester, Leicester, UK
  - NIHR Biomedical Research Centre - Respiratory theme, University Hospitals of Leicester NHS Trust, Leicester, UK
3. Jones KH, Ford EM, Lea N, Griffiths LJ, Hassan L, Heys S, Squires E, Nenadic G. Toward the Development of Data Governance Standards for Using Clinical Free-Text Data in Health Research: Position Paper. J Med Internet Res 2020; 22:e16760. [PMID: 32597785 PMCID: PMC7367542 DOI: 10.2196/16760] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/22/2019] [Revised: 03/06/2020] [Accepted: 03/23/2020] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Clinical free-text data (eg, outpatient letters or nursing notes) represent a vast, untapped source of rich information that, if more accessible for research, would clarify and supplement information coded in structured data fields. Data usually need to be deidentified or anonymized before they can be reused for research, but there is a lack of established guidelines to govern effective deidentification and use of free-text information and avoid damaging data utility as a by-product. OBJECTIVE This study aimed to develop recommendations for the creation of data governance standards to integrate with existing frameworks for personal data use, to enable free-text data to be used safely for research for patient and public benefit. METHODS We outlined data protection legislation and regulations relating to the United Kingdom for context and conducted a rapid literature review and UK-based case studies to explore data governance models used in working with free-text data. We also engaged with stakeholders, including text-mining researchers and the general public, to explore perceived barriers and solutions in working with clinical free-text. RESULTS We proposed a set of recommendations, including the need for authoritative guidance on data governance for the reuse of free-text data, to ensure public transparency in data flows and uses, to treat deidentified free-text data as potentially identifiable with use limited to accredited data safe havens, and to commit to a culture of continuous improvement to understand the relationships between the efficacy of deidentification and reidentification risks, so this can be communicated to all stakeholders. CONCLUSIONS By drawing together the findings of a combination of activities, we present a position paper to contribute to the development of data governance standards for the reuse of clinical free-text data for secondary purposes. 
While working in accordance with existing data governance frameworks, there is a need for further work to take forward the recommendations we have proposed, with commitment and investment, to assure and expand the safe reuse of clinical free-text data for public benefit.
Affiliation(s)
- Kerina H Jones
  - Population Data Science, Medical School, Swansea University, Swansea, United Kingdom
- Nathan Lea
  - Institute of Health Informatics, University College London, London, United Kingdom
- Lucy J Griffiths
  - Population Data Science, Medical School, Swansea University, Swansea, United Kingdom
- Lamiece Hassan
  - Division of Informatics, Imaging & Data Sciences, University of Manchester, Manchester, United Kingdom
- Sharon Heys
  - Population Data Science, Medical School, Swansea University, Swansea, United Kingdom
- Emma Squires
  - Population Data Science, Medical School, Swansea University, Swansea, United Kingdom
- Goran Nenadic
  - Department of Computer Science, University of Manchester & The Alan Turing Institute, Manchester, United Kingdom
4. Thickett D, Voorham J, Ryan R, Jones R, Coker R, Wilson AM, Yang S, Ow MY, Raju P, Chaudhry I, Hardjojo A, Carter V, Price DB. Historical database cohort study addressing the clinical patterns prior to idiopathic pulmonary fibrosis (IPF) diagnosis in UK primary care. BMJ Open 2020; 10:e034428. [PMID: 32474425 PMCID: PMC7264834 DOI: 10.1136/bmjopen-2019-034428] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/03/2022] Open
Abstract
OBJECTIVE To explore the clinical pathways, including signs and symptoms, and symptom progression patterns preceding idiopathic pulmonary fibrosis (IPF) diagnosis. DESIGN AND SETTING A historical cohort study was conducted using primary care patient records from the Optimum Patient Care Research Database. PARTICIPANTS Included patients were at least 30 years old, had an IPF diagnosis identified via clinical coding and free-text records, and had a consultation with a chest specialist prior to IPF diagnosis. OUTCOME MEASURES The signs and symptoms in the year prior to IPF diagnosis from clinical codes and free text in primary care electronic records included: cough, dyspnoea, dry cough, weight loss, fatigue/malaise, loss of appetite, crackles and clubbed fingers. The time course of presentations of clinical features and investigations in the years prior to IPF diagnosis was mapped. RESULTS Of the 462 patients identified, the majority (77.9%) had a respiratory consultation within 365 days prior to the chest specialist visit preceding the IPF diagnosis recorded in their primary care records. The most common symptoms recorded in the 1 year prior to IPF diagnosis were dyspnoea (48.7%) and cough (40.9%); other signs and symptoms were rarely recorded (<5%). The majority of patients with cough (58.0%) and dyspnoea (55.0%) in the 1 year before IPF diagnosis had multiple recordings of the respective symptoms. Both cough and dyspnoea were recorded in 23.4% of patients in the year prior to diagnosis. Consultation rates for cough, dyspnoea and both, but not other signs or symptoms, began to increase 4 to 5 years prior to diagnosis, with the sharpest increase in the last year. Cough and dyspnoea were often preceded by a reduction in measured weight over the 5 years leading up to IPF diagnosis. CONCLUSION Prolonged cough and/or progressive dyspnoea, especially if accompanied by weight loss, should prompt referral for specialist assessment at the earliest opportunity.
Affiliation(s)
- David Thickett
  - Institute of Inflammation, University of Birmingham, Birmingham, UK
- Jaco Voorham
  - Observational and Pragmatic Research Institute Pte Ltd, Singapore
- Ronan Ryan
  - Observational and Pragmatic Research Institute Pte Ltd, Singapore
- Rupert Jones
  - Peninsula Medical School, University of Plymouth, Plymouth, Devon, UK
- Andrew M Wilson
  - Respiratory and Airways Group, Norwich Medical School, University of East Anglia, Norwich, UK
- Sen Yang
  - Observational and Pragmatic Research Institute Pte Ltd, Singapore
- Mandy Yl Ow
  - Observational and Pragmatic Research Institute Pte Ltd, Singapore
- Priyanka Raju
  - Observational and Pragmatic Research Institute Pte Ltd, Singapore
- Isha Chaudhry
  - Observational and Pragmatic Research Institute Pte Ltd, Singapore
- Antony Hardjojo
  - Observational and Pragmatic Research Institute Pte Ltd, Singapore
- Victoria Carter
  - Observational and Pragmatic Research Institute Pte Ltd, Singapore
- David B Price
  - Observational and Pragmatic Research Institute Pte Ltd, Singapore
  - Centre of Academic Primary Care, Division of Applied Health Sciences, University of Aberdeen, Aberdeen, UK
5. Prognostic value of first-recorded breathlessness for future chronic respiratory and heart disease: a cohort study using a UK national primary care database. Br J Gen Pract 2020; 70:e264-e273. [PMID: 32041768 DOI: 10.3399/bjgp20x708221] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/18/2019] [Accepted: 06/21/2019] [Indexed: 10/31/2022] Open
Abstract
BACKGROUND Breathlessness is a common presentation in primary care. AIM To assess the long-term risk of diagnosed chronic obstructive pulmonary disease (COPD), asthma, ischaemic heart disease (IHD), and early mortality in patients with undiagnosed breathlessness. DESIGN AND SETTING Matched cohort study using data from the UK Clinical Practice Research Datalink. METHOD Adults with first-recorded breathlessness between 1997 and 2010 and no prior diagnostic or prescription record for IHD or a respiratory disease ('exposed' cohort) were matched to individuals with no record of breathlessness ('unexposed' cohort). Analyses were adjusted for sociodemographic and comorbidity characteristics. RESULTS In total, 75 698 patients (the exposed cohort) were followed for a median of 6.1 years, and more than one-third subsequently received a diagnosis of COPD, asthma, or IHD. In those who remained undiagnosed after 6 months, there were increased long-term risks of all three diagnoses compared with those in the unexposed cohort. Adjusted hazard ratios for COPD ranged from 8.6 (95% confidence interval [CI] = 6.8 to 11.0) for >6-12 months after the index date to 2.8 (95% CI = 2.6 to 3.0) for >36 months after the index date; asthma, 11.7 (CI = 9.4 to 14.6) to 4.3 (CI = 3.9 to 4.6); and IHD, 3.0 (CI = 2.7 to 3.4) to 1.6 (CI = 1.5 to 1.7). Risk of a longer time to diagnosis remained higher in members of the exposed cohort who had no relevant prescription in the first 6 months; approximately half of all future diagnoses were made for such patients. Risk of early mortality (all cause and disease specific) was higher in members of the exposed cohort. CONCLUSION Breathlessness can be an indicator of developing COPD, asthma, and IHD, and is associated with early mortality. 
With careful assessment, appropriate intervention, and proactive follow-up and monitoring, there is the potential to improve identification at first presentation in primary care in those at high risk of future disease who present with this symptom.
6. Neville DM, Rupani H, Kalra PR, Adeniji K, Quint M, De Vos R, Begum S, Mottershaw M, Fogg C, Jones TL, Lanning E, Bassett P, Chauhan AJ. Exploring the Waveform Characteristics of Tidal Breathing Carbon Dioxide, Measured Using the N-Tidal C Device in Different Breathing Conditions (The General Breathing Record Study): Protocol for an Observational, Longitudinal Study. JMIR Res Protoc 2018; 7:e140. [PMID: 29798833 PMCID: PMC5992452 DOI: 10.2196/resprot.9767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2018] [Revised: 02/22/2018] [Accepted: 02/23/2018] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND In an increasingly comorbid population, there are significant challenges to diagnosing the cause of breathlessness, and once diagnosed, considerable difficulty in detecting deterioration early enough to provide effective intervention. The burden of the breathless patient on the health care economy is substantial, with asthma, chronic heart failure, and pneumonia affecting over 6 million people in the United Kingdom alone. Furthermore, these patients often have more than one contributory factor to their breathlessness symptoms, with conditions such as dysfunctional breathing pattern disorders-an under-recognized component. Current methods of diagnosing and monitoring breathless conditions can be extensive and difficult to perform. As a consequence, home monitoring is poorly complied with. In contrast, capnography (the measurement of tidal breath carbon dioxide) is performed during normal breathing. There is a need for a simple, easy-to-use, personal device that can aid in the diagnosis and monitoring of respiratory and cardiac causes of breathlessness. OBJECTIVE The aim of this study was to explore the use of a new, handheld capnometer (called the N-Tidal C) in different conditions that cause breathlessness. We will study whether the tidal breath carbon dioxide (TBCO2) waveform, as measured by the N-Tidal C, has different characteristics in a range of respiratory and cardiac conditions. METHODS We will perform a longitudinal, observational study of the TBCO2 waveform (capnogram) as measured by the N-Tidal C capnometer. Participants with a confirmed diagnosis of asthma, breathing pattern disorders, chronic heart failure, motor neurone disease, pneumonia, as well as volunteers with no history of lung disease will be asked to provide twice daily, 75-second TBCO2 collection via the N-Tidal C device for 6 months duration. 
The collated capnograms will be correlated with the underlying diagnosis and disease state (stable or exacerbation) to determine if there are different TBCO2 characteristics that can distinguish different respiratory and cardiac causes of breathlessness. RESULTS This study's recruitment is ongoing. It is anticipated that the results will be available in late 2018. CONCLUSIONS The General Breathing Record Study will provide an evaluation of the use of capnography as a diagnostic and home-monitoring tool for various diseases. REGISTERED REPORT IDENTIFIER RR1-10.2196/9767.
Affiliation(s)
- Hitasha Rupani
  - Portsmouth Hospitals NHS Trust, Portsmouth, United Kingdom
- Paul R Kalra
  - Portsmouth Hospitals NHS Trust, Portsmouth, United Kingdom
- Kayode Adeniji
  - Portsmouth Hospitals NHS Trust, Portsmouth, United Kingdom
- Matthew Quint
  - Portsmouth Hospitals NHS Trust, Portsmouth, United Kingdom
- Ruth De Vos
  - Portsmouth Hospitals NHS Trust, Portsmouth, United Kingdom
- Selina Begum
  - Portsmouth Hospitals NHS Trust, Portsmouth, United Kingdom
- Carole Fogg
  - Portsmouth Hospitals NHS Trust, Portsmouth, United Kingdom
- Thomas L Jones
  - Portsmouth Hospitals NHS Trust, Portsmouth, United Kingdom
7. Taylor CJ, Hobbs FDR, Marshall T, Leyva-Leon F, Gale N. From breathless to failure: symptom onset and diagnostic meaning in patients with heart failure-a qualitative study. BMJ Open 2017; 7:e013648. [PMID: 28283487 PMCID: PMC5353318 DOI: 10.1136/bmjopen-2016-013648] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 12/30/2022] Open
Abstract
OBJECTIVES To explore 2 key points in the heart failure diagnostic pathway-symptom onset and diagnostic meaning-from the patient perspective. DESIGN Qualitative interview study. SETTING Participants were recruited from a secondary care clinic in central England following referral from primary care. PARTICIPANTS Over age 55 years with a recent (<1 year) diagnosis of heart failure confirmed by a cardiologist following initial presentation to primary care. METHODS Semistructured interviews were carried out with 16 participants (11 men and 5 women, median age 78.5 years) in their own homes. Data were audio-recorded and transcribed. Participants were asked to describe their diagnostic journey from when they first noticed something wrong up to and including the point of diagnosis. Data were analysed using the framework method. RESULTS Participants initially normalised symptoms and only sought medical help when daily activities were affected. Failure to realise that anything was wrong led to a delay in help-seeking. Participants' understanding of the term 'heart failure' was variable and 1 participant did not know he had the condition. The term itself caused great anxiety initially but participants learnt to cope with and accept their diagnosis over time. CONCLUSIONS Greater public awareness of symptoms and adequate explanation of 'heart failure' as a diagnostic label, or reconsideration of its use, are potential areas of service improvement.
Affiliation(s)
- C J Taylor
  - Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- F D R Hobbs
  - Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- T Marshall
  - Institute of Applied Health Research, University of Birmingham, Birmingham, UK
- F Leyva-Leon
  - Aston Medical Research Institute, Aston Medical School, Birmingham, UK
- N Gale
  - Health Services Management Centre, University of Birmingham, Birmingham, UK
8. Mukherjee M, Wyatt JC, Simpson CR, Sheikh A. Usage of allergy codes in primary care electronic health records: a national evaluation in Scotland. Allergy 2016; 71:1594-1602. [PMID: 27146325 DOI: 10.1111/all.12928] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Accepted: 04/30/2016] [Indexed: 02/06/2023]
Abstract
BACKGROUND The UK's NHS intends to move from the current Read code system to the international, detailed Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) to facilitate more clinically appropriate coding of conditions and associated risk factors and outcomes. Given concerns about coding behaviour of general practitioners, we sought to study the current coding patterns in allergies and identify lessons for the future migration to SNOMED-CT. METHODS Data from 2 014 551 primary care consultations in over 100 000 patients with one or more of 11 potentially allergic diseases (anaphylaxis, angioedema, asthma, conjunctivitis, drug allergies, eczema, food allergy, rhinitis, urticaria, venom allergy and other probable allergic disorders) from the Scottish Primary Care Clinical Informatics Unit Research (PCCIU-R) database were descriptively analysed and visualized to understand Read code usage patterns. RESULTS We identified 352 Read codes for these allergic diseases, but only 36 codes (10%) were used in 95% of consultations; 73 codes (21%) were never used. Half of all usage was for Quality and Outcomes Framework codes for asthma. Despite 149 detailed codes (42%) being available for allergic triggers, these were infrequently used. CONCLUSIONS This analysis of Read codes use suggests that introduction of the more detailed SNOMED-CT, in isolation, will not improve the quality of allergy coding in Scottish primary care. The introduction of SNOMED-CT should be accompanied by initiatives aimed at improving coding quality, such as the definition of terms/codes, the availability of terminology browsers, a recommended list of codes and mechanisms to incentivize detailed coding of the condition and the underlying allergic trigger.
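The usage-concentration figures in this abstract (a small share of codes accounting for 95% of consultations, and a block of codes never used) can be reproduced on any table of per-code counts. The following Python sketch is illustrative only: the function name and the toy counts are hypothetical, and the abstract does not describe how the authors performed the calculation. The last lines simply re-derive the rounded percentages quoted in the abstract from its own counts.

```python
from typing import Dict


def codes_needed_for_coverage(usage_counts: Dict[str, int],
                              target: float = 0.95) -> int:
    """Smallest number of codes, taken from most to least used, whose
    combined usage reaches the target share of all recorded usage."""
    total = sum(usage_counts.values())
    if total == 0:
        raise ValueError("no code usage recorded")
    running = 0
    ranked = sorted(usage_counts.values(), reverse=True)
    for n_codes, count in enumerate(ranked, start=1):
        running += count
        if running / total >= target:
            return n_codes
    return len(ranked)


# Toy usage table (hypothetical codes and counts, not PCCIU-R data).
toy_usage = {"A": 90, "B": 6, "C": 3, "D": 1, "E": 0}

needed = codes_needed_for_coverage(toy_usage)
never_used = sum(1 for count in toy_usage.values() if count == 0)
print(f"{needed} of {len(toy_usage)} codes cover 95% of usage; {never_used} never used")

# Re-deriving the abstract's rounded shares of the 352 identified codes.
shares = {n: round(100 * n / 352) for n in (36, 73, 149)}
print(shares)
```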
Affiliation(s)
- M. Mukherjee
  - Edinburgh Clinical Trials Unit (ECTU); The University of Edinburgh; Edinburgh UK
  - Asthma UK Centre for Applied Research; Centre for Medical Informatics; Usher Institute of Population Health Sciences and Informatics; The University of Edinburgh; Edinburgh UK
- J. C. Wyatt
  - Faculty of Medicine; Wessex Institute of Health & Research; University of Southampton; Southampton UK
- C. R. Simpson
  - Asthma UK Centre for Applied Research; Centre for Medical Informatics; Usher Institute of Population Health Sciences and Informatics; The University of Edinburgh; Edinburgh UK
- A. Sheikh
  - Asthma UK Centre for Applied Research; Centre for Medical Informatics; Usher Institute of Population Health Sciences and Informatics; The University of Edinburgh; Edinburgh UK
9. Price SJ, Stapley SA, Shephard E, Barraclough K, Hamilton WT. Is omission of free text records a possible source of data loss and bias in Clinical Practice Research Datalink studies? A case-control study. BMJ Open 2016; 6:e011664. [PMID: 27178981 PMCID: PMC4874123 DOI: 10.1136/bmjopen-2016-011664] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Indexed: 01/27/2023] Open
Abstract
OBJECTIVES To estimate data loss and bias in studies of Clinical Practice Research Datalink (CPRD) data that restrict analyses to Read codes, omitting anything recorded as text. DESIGN Matched case-control study. SETTING Patients contributing data to the CPRD. PARTICIPANTS 4915 bladder and 3635 pancreatic cancer cases diagnosed between 1 January 2000 and 31 December 2009, matched on age, sex and general practitioner practice to up to 5 controls (bladder: n=21 718; pancreas: n=16 459). The analysis period was the year before cancer diagnosis. PRIMARY AND SECONDARY OUTCOME MEASURES Frequency of haematuria, jaundice and abdominal pain, grouped by recording style: Read code or text-only (ie, hidden text). The association between recording style and case-control status (χ² test). For each feature, the odds ratio (OR; conditional logistic regression) and positive predictive value (PPV; Bayes' theorem) for cancer, before and after addition of hidden text records. RESULTS Of the 20 958 total records of the features, 7951 (38%) were recorded in hidden text. Hidden text recording was more strongly associated with controls than with cases for haematuria (140/336=42% vs 556/3147=18%) in bladder cancer (χ² test, p<0.001), and for jaundice (21/31=67% vs 463/1565=30%, p<0.0001) and abdominal pain (323/1126=29% vs 397/1789=22%, p<0.001) in pancreatic cancer. Adding hidden text records corrected PPVs of haematuria for bladder cancer from 4.0% (95% CI 3.5% to 4.6%) to 2.9% (2.6% to 3.2%), and of jaundice for pancreatic cancer from 12.8% (7.3% to 21.6%) to 6.3% (4.5% to 8.7%). Adding hidden text records did not alter the PPV of abdominal pain for bladder (codes: 0.14%, 0.13% to 0.16% vs codes plus hidden text: 0.14%, 0.13% to 0.15%) or pancreatic (0.23%, 0.21% to 0.25% vs 0.21%, 0.20% to 0.22%) cancer. CONCLUSIONS Omission of text records from CPRD studies introduces bias that inflates outcome measures for recognised alarm symptoms.
This potentially reinforces clinicians' views of the known importance of these symptoms, marginalising the significance of 'low-risk but not no-risk' symptoms.
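The abstract states that PPVs were obtained with Bayes' theorem. One common way to do this in a case-control design is to convert the likelihood ratio of a feature into a posterior probability using an externally supplied prior (the prevalence of the cancer in the consulting population). The Python sketch below follows that approach as an assumption; it is not taken from the paper, the function name is hypothetical, and the input proportions are invented round numbers chosen only to show the direction of the effect.

```python
def ppv_from_case_control(p_feature_in_cases: float,
                          p_feature_in_controls: float,
                          prevalence: float) -> float:
    """Posterior probability of disease given the feature, via Bayes' theorem
    in odds form: posterior odds = likelihood ratio * prior odds."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must be strictly between 0 and 1")
    if p_feature_in_controls <= 0:
        raise ValueError("feature must occur in at least some controls")
    likelihood_ratio = p_feature_in_cases / p_feature_in_controls
    prior_odds = prevalence / (1 - prevalence)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)


# Hypothetical inputs: a symptom seen in 50% of cases, with an assumed
# prior (prevalence) of 0.1%. Counting only coded records, 1% of controls
# have the symptom; counting text records as well, 2% do.
ppv_codes_only = ppv_from_case_control(0.50, 0.01, 0.001)
ppv_with_text = ppv_from_case_control(0.50, 0.02, 0.001)
print(f"codes only: {ppv_codes_only:.4f}, codes plus text: {ppv_with_text:.4f}")
```

In this toy setting, doubling the symptom frequency among controls while leaving cases unchanged roughly halves the PPV, which is the same direction of change the abstract reports when text records were added for haematuria and jaundice.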
Affiliation(s)
- Sarah J Price
  - Medical School, University of Exeter, College House, Exeter, UK
- Sal A Stapley
  - Medical School, University of Exeter, College House, Exeter, UK