1. Artificial intelligence-assisted automated heart failure detection and classification from electronic health records. ESC Heart Fail 2024. [PMID: 38700133 DOI: 10.1002/ehf2.14828]
Abstract
AIMS Electronic health records (EHR) linked to Digital Imaging and Communications in Medicine (DICOM) images, biological specimens, and deep learning (DL) algorithms could potentially improve patient care through automated case detection and surveillance. We hypothesized that by applying keyword searches to routinely stored EHR, in conjunction with AI-powered automated reading of DICOM echocardiography images and analysis of biomarkers from routinely stored plasma samples, we could identify heart failure (HF) patients. METHODS AND RESULTS We used EHR data collected between 1993 and 2021 from Tayside and Fife (~20% of the Scottish population). We applied a keyword search strategy, complemented by filtering on International Classification of Diseases (ICD) codes and prescription data, to the EHR data set. We then applied DL for the automated interpretation of echocardiographic DICOM images. These methods were integrated with the analysis of routinely stored plasma samples to identify and categorize patients into HF with reduced ejection fraction (HFrEF), HF with preserved ejection fraction (HFpEF), and controls without HF. The final diagnosis was verified through manual review of medical records, natriuretic peptides measured in stored blood samples, and comparison of clinical outcomes among groups. We selected the patient cohort through an algorithmic workflow: starting from 60 850 EHR records, and after excluding individuals with mismatched data or significant valvular heart disease, we arrived at a final cohort of 578 patients, comprising 186 controls, 236 with HFpEF, and 156 with HFrEF. Analysis of baseline characteristics revealed that, compared with controls, patients with HFrEF and HFpEF were generally older, had higher BMI, and showed a greater prevalence of co-morbidities such as diabetes, COPD, and CKD.
Echocardiographic analysis, enhanced by DL, provided high coverage and detailed insights into cardiac function, showing significant differences in parameters such as left ventricular diameter, ejection fraction, and myocardial strain among the groups. Clinical outcomes highlighted a higher risk of hospitalization and mortality for HF patients compared with controls, with particularly elevated risk ratios in both the HFrEF and HFpEF groups. The concordance between algorithmic selection of patients and manual validation demonstrated high accuracy, supporting the effectiveness of our approach in identifying and classifying HF subtypes, which could significantly affect future HF diagnosis and management strategies. CONCLUSIONS Our study highlights the feasibility of combining keyword searches in EHR, DL-automated echocardiographic interpretation, and biobank resources to identify HF subtypes.
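The multi-source case-detection logic described above (keyword search plus ICD-code and prescription filtering) can be sketched as follows. The record fields, keywords, code prefixes, and drug names here are illustrative assumptions, not the study's actual criteria.

```python
import re

# Hypothetical EHR records; the schema is an assumption for this sketch.
records = [
    {"id": 1, "text": "echo shows severe LV systolic dysfunction, heart failure",
     "icd10": ["I50.1"], "prescriptions": ["furosemide"]},
    {"id": 2, "text": "routine review, no cardiac symptoms",
     "icd10": ["J45"], "prescriptions": ["salbutamol"]},
]

HF_KEYWORDS = re.compile(r"heart failure|lv systolic dysfunction|cardiomyopathy", re.I)
HF_ICD10 = {"I50", "I42"}      # illustrative HF code prefixes
HF_DRUGS = {"furosemide", "sacubitril/valsartan"}  # illustrative HF-related drugs

def flag_possible_hf(rec):
    """Flag a record when keywords plus corroborating codes or drugs suggest HF."""
    kw = bool(HF_KEYWORDS.search(rec["text"]))
    icd = any(code.startswith(tuple(HF_ICD10)) for code in rec["icd10"])
    rx = any(d in HF_DRUGS for d in rec["prescriptions"])
    return kw and (icd or rx)

candidates = [r["id"] for r in records if flag_possible_hf(r)]
print(candidates)  # → [1]
```

In the study, flagged candidates were then passed on to automated echocardiogram reading and biomarker analysis before manual verification; this sketch covers only the first filtering stage.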
2. Association of adrenal steroids with metabolomic profiles in patients with primary and endocrine hypertension. Front Endocrinol (Lausanne) 2024; 15:1370525. [PMID: 38596218 PMCID: PMC11002274 DOI: 10.3389/fendo.2024.1370525]
Abstract
Introduction Endocrine hypertension (EHT) due to pheochromocytoma/paraganglioma (PPGL), Cushing's syndrome (CS), or primary aldosteronism (PA) is linked to a variety of metabolic alterations and comorbidities. Accordingly, patients with EHT and primary hypertension (PHT) are characterized by distinct metabolic profiles. However, it remains unclear whether the metabolomic differences relate solely to the disease-defining hormonal parameters. Therefore, our objective was to study the association of disease-defining hormonal excess and concomitant adrenal steroids with metabolomic alterations in patients with EHT. Methods Retrospective European multicenter study of 263 patients (mean age 49 years, 50% female; 58 PHT, 69 PPGL, 37 CS, 99 PA) in whom targeted metabolomic and adrenal steroid profiling was available. The association of 13 adrenal steroids with differences in 79 metabolites between PPGL, CS, PA, and PHT was examined after correction for age, sex, BMI, and presence of diabetes mellitus. Results After adjustment for BMI and diabetes mellitus, significant associations between adrenal steroids and metabolites were revealed: 18 in PPGL, 15 in CS, and 23 in PA. In PPGL, the majority of metabolite associations were linked to catecholamine excess, whereas in PA, only one metabolite was associated with aldosterone; instead, cortisone (16 metabolites), cortisol (6 metabolites), and DHEA (8 metabolites) had the highest numbers of associated metabolites in PA. In CS, 18-hydroxycortisol significantly influenced 5 metabolites, cortisol affected 4, and cortisone, 11-deoxycortisol, and DHEA were each linked to 3 metabolites. Discussion Our study indicates that cortisol, cortisone, and catecholamine excess are significantly associated with metabolomic variances in EHT versus PHT patients. Notably, catecholamine excess is key to the metabolomic changes in PPGL, whereas in PA, other non-defining adrenal steroids mainly account for the metabolomic differences.
In CS, cortisol, alongside other non-defining adrenal hormones, contributes to these differences, suggesting that metabolic disorders and cardiovascular morbidity in these conditions could also be affected by various adrenal steroids.
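A covariate-adjusted steroid-metabolite association of the kind examined above can be estimated with multiple linear regression. The sketch below uses entirely synthetic data (the coefficients, variable names, and effect sizes are assumptions), regressing one metabolite on one steroid while adjusting for age, sex, BMI, and diabetes as in the study design.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic illustration (not study data): one steroid, one metabolite, and
# the adjustment covariates named in the abstract.
age = rng.normal(49, 10, n)
sex = rng.integers(0, 2, n)
bmi = rng.normal(27, 4, n)
dm = rng.integers(0, 2, n)
steroid = rng.normal(0, 1, n)
metabolite = 0.5 * steroid + 0.02 * age + 0.1 * dm + rng.normal(0, 1, n)

# Multiple linear regression: intercept + covariates + the steroid of interest.
X = np.column_stack([np.ones(n), age, sex, bmi, dm, steroid])
beta, *_ = np.linalg.lstsq(X, metabolite, rcond=None)
print(round(float(beta[-1]), 2))  # adjusted steroid coefficient, near the simulated 0.5
```

The last element of `beta` is the steroid's association with the metabolite after adjustment; in the study this would be repeated over each steroid-metabolite pair with appropriate multiple-testing correction.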
3. The Scottish Medical Imaging Archive: 57.3 Million Radiology Studies Linked to Their Medical Records. Radiol Artif Intell 2024; 6:e220266. [PMID: 38166330 PMCID: PMC10831519 DOI: 10.1148/ryai.220266]
Abstract
Keywords: MRI, Imaging Sequences, Ultrasound, Mammography, CT, Angiography, Conventional Radiography Published under a CC BY 4.0 license. See also the commentary by Whitman and Vining in this issue.
4. Machine learning models in trusted research environments - understanding operational risks. Int J Popul Data Sci 2023; 8:2165. [PMID: 38414545 PMCID: PMC10898318 DOI: 10.23889/ijpds.v8i1.2165]
Abstract
Introduction Trusted research environments (TREs) provide secure access to very sensitive data for research. All TREs operate manual checks on outputs to ensure there is no residual disclosure risk. Machine learning (ML) models require very large amounts of data; if these data are personal, the TRE is a well-established data management solution. However, ML models present novel disclosure risks, in both type and scale. Objectives As part of a series on ML disclosure risk in TREs, this article introduces TRE managers to the conceptual problems and the work being done to address them. Methods We demonstrate how ML models present a qualitatively different type of disclosure risk compared with traditional statistical outputs, arising from both the nature and the scale of ML modelling. Results We show that a large number of issues remain unresolved, although there is progress in many areas. We identify where areas of uncertainty remain, as well as remedial responses available to TREs. Conclusions At this stage, disclosure checking of ML models is very much a specialist activity. However, TRE managers need a basic awareness of the potential risks in ML models to enable them to make sensible decisions on using TREs for ML model development.
5. Impact of data source choice on multimorbidity measurement: a comparison study of 2.3 million individuals in the Welsh National Health Service. BMC Med 2023; 21:309. [PMID: 37582755 PMCID: PMC10426056 DOI: 10.1186/s12916-023-02970-z]
Abstract
BACKGROUND Measurement of multimorbidity in research is variable, including the choice of the data source used to ascertain conditions. We compared the estimated prevalence of multimorbidity and associations with mortality using different data sources. METHODS A cross-sectional study of SAIL Databank data including 2,340,027 individuals of all ages living in Wales on 1 January 2019. We compared the prevalence of multimorbidity and its 47 constituent conditions using data from primary care (PC), hospital inpatient (HI), and linked PC-HI data sources, and examined associations between condition count and 12-month mortality. RESULTS Using linked PC-HI data compared with HI data alone, multimorbidity was more prevalent (32.2% versus 16.5%), and the population identified as having multimorbidity was younger (mean age 62.5 versus 66.8 years) and included more women (54.2% versus 52.6%). Individuals with multimorbidity in both PC and HI data had stronger associations with mortality than those with multimorbidity only in HI data (adjusted odds ratio 8.34 [95% CI 8.02-8.68] versus 6.95 [95% CI 6.79-7.12] in people with ≥4 conditions). The prevalence of conditions identified using only PC versus only HI data was significantly higher for 37/47 conditions and significantly lower for 10/47: the highest PC/HI ratio was for depression (14.2 [95% CI 14.1-14.4]) and the lowest for aneurysm (0.51 [95% CI 0.5-0.5]). Agreement in ascertainment of conditions between the two data sources varied considerably, being slight for five (kappa < 0.20), fair for 12 (kappa 0.21-0.40), moderate for 16 (kappa 0.41-0.60), and substantial for 12 (kappa 0.61-0.80) conditions, and by body system was lowest for mental and behavioural disorders. The percentage agreement (individuals with a condition identified in both PC and HI data) was lowest for anxiety (4.6%) and highest for coronary artery disease (62.9%).
CONCLUSIONS The use of single data sources may underestimate prevalence when measuring multimorbidity and many important conditions (especially mental and behavioural disorders). Caution should be used when interpreting findings of research examining individual and multiple long-term conditions using single data sources. Where available, researchers using electronic health data should link primary care and hospital inpatient data to generate more robust evidence to support evidence-based healthcare planning decisions for people with multimorbidity.
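Cohen's kappa, the agreement statistic used above, can be computed for a condition flagged in two data sources as follows; this is a minimal two-rater binary implementation with toy flags, not the study's data.

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two binary raters (e.g. a condition flagged in PC vs HI data)."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n                  # observed agreement
    pe = (sum(a) / n) * (sum(b) / n) \
         + ((n - sum(a)) / n) * ((n - sum(b)) / n)              # chance agreement
    return (po - pe) / (1 - pe)

# Toy example: presence of one condition in 10 individuals, from two sources.
pc = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]   # primary care
hi = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]   # hospital inpatient
print(round(cohen_kappa(pc, hi), 2))  # → 0.55, "moderate" on the bands quoted above
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why the abstract reports both statistics: two sources can agree often in absolute terms while still showing only slight chance-corrected agreement for rare conditions.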
6. The impact of varying the number and selection of conditions on estimated multimorbidity prevalence: A cross-sectional study using a large, primary care population dataset. PLoS Med 2023; 20:e1004208. [PMID: 37014910 PMCID: PMC10072475 DOI: 10.1371/journal.pmed.1004208]
Abstract
BACKGROUND Multimorbidity prevalence rates vary considerably depending on the conditions considered in the morbidity count, but there is no standardised approach to the number or selection of conditions to include. METHODS AND FINDINGS We conducted a cross-sectional study using English primary care data for 1,168,260 participants, comprising all people alive and permanently registered with 149 included general practices. The outcome measures were prevalence estimates of multimorbidity (defined as ≥2 conditions) when varying the number and selection of conditions considered, for 80 conditions. Included conditions featured in ≥1 of the 9 published condition-lists examined in the study and/or phenotyping algorithms in the Health Data Research UK (HDR-UK) Phenotype Library. First, multimorbidity prevalence was calculated when considering the 2 most common conditions, the 3 most common, and so on, up to all 80 conditions. Second, prevalence was calculated using 9 condition-lists from published studies. Analyses were stratified by age, socioeconomic position, and sex. Prevalence when only the 2 commonest conditions were considered was 4.6% (95% CI [4.6, 4.6], p < 0.001), rising to 29.5% (95% CI [29.5, 29.6], p < 0.001) considering the 10 commonest, 35.2% (95% CI [35.1, 35.3], p < 0.001) considering the 20 commonest, and 40.5% (95% CI [40.4, 40.6], p < 0.001) when considering all 80 conditions. The threshold number of conditions at which multimorbidity prevalence was >99% of that measured when considering all 80 conditions was 52 for the whole population, but was lower in older people (29 in those aged >80 years) and higher in younger people (71 in 0- to 9-year-olds). The nine published condition-lists examined were either recommended for measuring multimorbidity, used in previous highly cited studies of multimorbidity prevalence, or widely applied measures of "comorbidity"; multimorbidity prevalence using these lists varied from 11.1% to 36.4%.
A limitation of the study is that, to improve comparability across condition-lists, conditions were not always replicated using the same ascertainment rules as previous studies, which itself highlights further variability in prevalence estimates across studies. CONCLUSIONS In this study, we observed that varying the number and selection of conditions results in very large differences in multimorbidity prevalence, and different numbers of conditions are needed to reach ceiling rates of multimorbidity prevalence in certain groups of people. These findings imply a need for a standardised approach to defining multimorbidity; to facilitate this, researchers can use existing condition-lists associated with the highest multimorbidity prevalence.
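The study's core measurement, multimorbidity prevalence as a function of how many of the commonest conditions are counted, can be sketched on toy data as follows; the condition names and person counts are illustrative only.

```python
from collections import Counter

# Toy data: each person's set of conditions (condition names are illustrative).
people = [
    {"htn", "depr"},
    {"htn", "depr", "asthma"},
    {"htn", "depr", "ckd"},
    {"htn", "asthma"},
    {"htn", "depr"},
    {"asthma", "ckd"},
    {"diab"},
    set(),
]

# Rank conditions by how many people have them (counts here are all distinct,
# so the ranking is unambiguous).
counts = Counter(c for p in people for c in p)
ranked = [c for c, _ in counts.most_common()]

def mm_prevalence(top_n):
    """Percent of people with >=2 conditions, counting only the top_n commonest."""
    keep = set(ranked[:top_n])
    return 100 * sum(len(p & keep) >= 2 for p in people) / len(people)

print([round(mm_prevalence(n), 1) for n in (2, 3, 5)])  # → [50.0, 62.5, 75.0]
```

As in the study, prevalence rises monotonically as more conditions are counted and eventually plateaus once the added conditions are too rare to change anyone's multimorbidity status.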
7. Disclosure control of machine learning models from trusted research environments (TRE): New challenges and opportunities. Heliyon 2023; 9:e15143. [PMID: 37123891 PMCID: PMC10130764 DOI: 10.1016/j.heliyon.2023.e15143]
Abstract
Introduction Artificial intelligence (AI) applications in healthcare and medicine have increased in recent years. To enable access to personal data, Trusted Research Environments (TREs) (otherwise known as Safe Havens) provide safe and secure environments in which researchers can access sensitive personal data and develop AI (in particular, machine learning (ML)) models. However, few TREs currently support the training of ML models, in part because of a gap in the practical decision-making guidance for TREs in handling model disclosure. Specifically, the training of ML models creates a need to disclose new types of outputs from TREs. Although TREs have clear policies for the disclosure of statistical outputs, the extent to which trained models can leak personal training data once released is not well understood. Background We review, for a general audience, different types of ML models and their applicability within healthcare. We explain the outputs generated by training an ML model and how trained ML models can be vulnerable to external attacks that seek to discover the personal data encoded within them. Risks We present the challenges for disclosure control of trained ML models in the context of training and exporting models from TREs. We provide insights and analyse methods that could be introduced within TREs to mitigate the risk of privacy breaches when disclosing trained models. Discussion Although specific guidelines and policies exist for statistical disclosure control in TREs, they do not satisfactorily address these new types of output request, i.e., trained ML models. There is significant potential for new interdisciplinary research in developing and adapting policies and tools for safely disclosing ML outputs from TREs.
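One concrete way a trained model can leak training data, of the kind this review discusses, is membership inference against an overfitted model. The toy below is entirely illustrative (not a method from the paper): a model that memorizes its training points makes membership trivially detectable from per-record loss.

```python
import numpy as np

rng = np.random.default_rng(2)
train = rng.normal(0, 1, 30)     # records used to fit the model ("members")
holdout = rng.normal(0, 1, 30)   # records never seen by the model

# Overfitted "model": a 1-nearest-neighbour memorizer of the training set.
def model_loss(x):
    return float(np.min(np.abs(train - x)))  # exactly 0 on every training record

# Loss-threshold attack: guess "member" when the model fits a record unusually well.
THRESHOLD = 1e-9
member_rate = sum(model_loss(x) < THRESHOLD for x in train) / len(train)
false_rate = sum(model_loss(x) < THRESHOLD for x in holdout) / len(holdout)
print(member_rate)  # → 1.0: every memorized record is identified as a member
```

Real attacks on realistic models are far less clean than this memorizer, but the principle is the same: systematically lower loss on training records is a disclosure channel, which is why exporting a trained model from a TRE is not equivalent to exporting an aggregate statistic.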
8. Age, sex, and socioeconomic differences in multimorbidity measured in four ways: UK primary care cross-sectional analysis. Br J Gen Pract 2023; 73:e249-e256. [PMID: 36997222 PMCID: PMC9923763 DOI: 10.3399/bjgp.2022.0405]
Abstract
BACKGROUND Multimorbidity poses major challenges to healthcare systems worldwide. Definitions with cut-offs higher than ≥2 long-term conditions (LTCs) might better capture populations with complexity but are not standardised. AIM To examine variation in prevalence using different definitions of multimorbidity. DESIGN AND SETTING Cross-sectional study of 1 168 620 people in England. METHOD Comparison of multimorbidity (MM) prevalence using four definitions: MM2+ (≥2 LTCs), MM3+ (≥3 LTCs), MM3+ from 3+ (≥3 LTCs from ≥3 International Classification of Diseases, 10th revision chapters), and mental-physical MM (≥2 LTCs where ≥1 mental health LTC and ≥1 physical health LTC are recorded). Logistic regression was used to examine patient characteristics associated with multimorbidity under all four definitions. RESULTS MM2+ was most common (40.4%), followed by MM3+ (27.5%), MM3+ from 3+ (22.6%), and mental-physical MM (18.9%). MM2+, MM3+, and MM3+ from 3+ were strongly associated with the oldest age group (adjusted odds ratio [aOR] 58.09, 95% confidence interval [CI] = 56.13 to 60.14; aOR 77.69, 95% CI = 75.33 to 80.12; and aOR 102.06, 95% CI = 98.61 to 105.65, respectively), but mental-physical MM was much less strongly associated (aOR 4.32, 95% CI = 4.21 to 4.43). People in the most deprived decile had equivalent rates of multimorbidity at a younger age than those in the least deprived decile. This was most marked for mental-physical MM at 40-45 years younger, followed by MM2+ at 15-20 years younger, and MM3+ and MM3+ from 3+ at 10-15 years younger. Females had a higher prevalence of multimorbidity under all definitions, most markedly for mental-physical MM. CONCLUSION Estimated prevalence of multimorbidity depends on the definition used, and associations with age, sex, and socioeconomic position vary between definitions. Multimorbidity research therefore requires consistency of definitions across studies.
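Three of the four definitions above reduce to simple set operations on a person's recorded conditions; the sketch below uses toy data, and the condition names and mental/physical split are assumptions. The fourth definition (MM3+ from 3+) would additionally need a condition-to-ICD-chapter map.

```python
# Which conditions count as "mental" is an assumption for this sketch.
MENTAL = {"depression", "anxiety"}

people = [
    {"htn", "diabetes"},            # MM2+ only
    {"htn", "diabetes", "ckd"},     # MM2+ and MM3+
    {"depression", "htn"},          # MM2+ and mental-physical MM
    {"anxiety"},                    # no multimorbidity
]

def mm2(p): return len(p) >= 2
def mm3(p): return len(p) >= 3
def mental_physical(p):
    return len(p) >= 2 and bool(p & MENTAL) and bool(p - MENTAL)

definitions = {"MM2+": mm2, "MM3+": mm3, "mental-physical": mental_physical}
prevalence = {name: 100 * sum(f(p) for p in people) / len(people)
              for name, f in definitions.items()}
print(prevalence)  # → {'MM2+': 75.0, 'MM3+': 25.0, 'mental-physical': 25.0}
```

Even on four toy records, the estimated prevalence varies three-fold with the definition chosen, which is the paper's central point at population scale.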
9. A Hybrid Architecture (CO-CONNECT) to Facilitate Rapid Discovery and Access to Data Across the United Kingdom in Response to the COVID-19 Pandemic: Development Study. J Med Internet Res 2022; 24:e40035. [PMID: 36322788 PMCID: PMC9822177 DOI: 10.2196/40035]
Abstract
BACKGROUND COVID-19 data have been generated across the United Kingdom as a by-product of clinical care and public health provision, as well as numerous bespoke and repurposed research endeavors. Analysis of these data has underpinned the United Kingdom's response to the pandemic and informed public health policies and clinical guidelines. However, these data are held by different organizations, and this fragmented landscape has presented challenges for public health agencies and researchers as they struggle to find, access, and interrogate the data they need to inform the pandemic response at pace. OBJECTIVE We aimed to transform UK COVID-19 diagnostic data sets to be findable, accessible, interoperable, and reusable (FAIR). METHODS A federated infrastructure model (COVID - Curated and Open Analysis and Research Platform [CO-CONNECT]) was rapidly built to enable the automated and reproducible mapping of health data partners' pseudonymized data to the Observational Medical Outcomes Partnership Common Data Model, without the need for any data to leave the data controllers' secure environments, and to support federated cohort discovery queries and meta-analysis. RESULTS A total of 56 data sets from 19 organizations are being connected to the federated network. The data include research cohorts and COVID-19 data collected through routine health care provision, linked to longitudinal health care records and demographics. The infrastructure is live, supporting aggregate-level querying of data across the United Kingdom. CONCLUSIONS CO-CONNECT was developed by a multidisciplinary team. It enables rapid COVID-19 data discovery and instantaneous meta-analysis across data sources, and it is developing streamlined data extraction for use in a Trusted Research Environment for research and public health analysis.
CO-CONNECT has the potential to make UK health data more interconnected and better able to answer national-level research questions while maintaining patient confidentiality and local governance procedures.
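The aggregate-level querying pattern described above can be sketched as follows. The site data, query shape, and small-count suppression rule are assumptions for illustration, not CO-CONNECT's actual protocol; the key property shown is that only counts, never row-level records, leave each site.

```python
# Each "site" answers a cohort-discovery query with an aggregate count only.
SUPPRESS_BELOW = 5  # illustrative small-count suppression threshold

site_a = [{"age": 34, "covid_positive": False}] + \
         [{"age": 60 + i, "covid_positive": True} for i in range(7)]
site_b = [{"age": 55, "covid_positive": True}, {"age": 81, "covid_positive": True}]

def local_count(rows, query):
    """Count matching rows locally; suppress counts below the disclosure threshold."""
    n = sum(all(row[k] == v for k, v in query.items()) for row in rows)
    return n if n >= SUPPRESS_BELOW else 0

query = {"covid_positive": True}
counts = [local_count(site, query) for site in (site_a, site_b)]
print(sum(counts))  # → 7 (site_b's true count of 2 is suppressed)
```

Suppressing or rounding small counts is one common disclosure-control choice for federated counts; real deployments layer further governance on top, as the related CO-CONNECT governance paper in this list describes.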
10. Whole blood methylome-derived features to discriminate endocrine hypertension. Clin Epigenetics 2022; 14:142. [PMCID: PMC9635165 DOI: 10.1186/s13148-022-01347-y]
Abstract
Background Arterial hypertension represents a worldwide health burden and a major risk factor for cardiovascular morbidity and mortality. Hypertension can be primary (primary hypertension, PHT) or secondary to endocrine disorders (endocrine hypertension, EHT), such as Cushing's syndrome (CS), primary aldosteronism (PA), and pheochromocytoma/paraganglioma (PPGL). Diagnosis of EHT is currently based on hormone assays. Efficient detection remains challenging but is crucial to properly orientate patients for diagnostic confirmation and specific treatment, and more accurate biomarkers would help in the diagnostic pathway. We hypothesized that each type of endocrine hypertension could be associated with a specific blood DNA methylation signature, which could be used for disease discrimination. To identify such markers, we aimed to explore the methylome profiles in a cohort of 255 patients with hypertension, either PHT (n = 42) or EHT (n = 213), and to identify specific discriminating signatures using machine learning approaches. Results Unsupervised classification of samples showed discrimination of PHT from EHT. CS patients clustered separately from all other patients, whereas PA and PPGL showed an overall overlap. Global methylation was decreased in the CS group compared with PHT. Supervised comparison with PHT identified differentially methylated CpG sites for each type of endocrine hypertension, with a diffuse genomic location. Among the most differentially methylated genes, FKBP5 was identified in the CS group. Using four different machine learning methods (Lasso (Least Absolute Shrinkage and Selection Operator), Logistic Regression, Random Forest, and Support Vector Machine), predictive models for each type of endocrine hypertension were built on training cohorts (80% of samples for each hypertension type) and evaluated on validation cohorts (the remaining 20%).
Balanced accuracies ranged from 0.55 to 0.74 for predicting EHT, 0.85 to 0.95 for predicting CS, 0.66 to 0.88 for predicting PA, and 0.70 to 0.83 for predicting PPGL. Conclusions The blood DNA methylome can discriminate endocrine hypertension, with methylation signatures for each type of endocrine disorder. Supplementary Information The online version contains supplementary material available at 10.1186/s13148-022-01347-y.
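The evaluation protocol described (an 80/20 split scored by balanced accuracy) can be illustrated with a deliberately minimal threshold classifier on synthetic methylation-like values; the data, effect size, and classifier here are assumptions, not the study's models.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for one methylation feature: cases are shifted upward.
n = 100
y = np.array([0] * 50 + [1] * 50)            # 0 = PHT, 1 = one EHT subtype
x = rng.normal(0.4, 0.1, n) + 0.15 * y       # methylation beta-like values

# 80/20 train/validation split, as in the study design.
idx = rng.permutation(n)
train, test = idx[:80], idx[80:]

# Trivial classifier: threshold midway between the training-set class means.
thr = (x[train][y[train] == 0].mean() + x[train][y[train] == 1].mean()) / 2
pred = (x[test] > thr).astype(int)

# Balanced accuracy = mean of sensitivity and specificity.
sens = (pred[y[test] == 1] == 1).mean()
spec = (pred[y[test] == 0] == 0).mean()
bal_acc = (sens + spec) / 2
print(round(bal_acc, 2))  # well above the 0.5 chance level for this separation
```

Balanced accuracy is the natural choice here because the class sizes (42 PHT versus subtype groups of varying size) are unequal, so raw accuracy would reward always predicting the majority class.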
11. Machine learning for classification of hypertension subtypes using multi-omics: A multi-centre, retrospective, data-driven study. EBioMedicine 2022; 84:104276. [PMID: 36179553 PMCID: PMC9520210 DOI: 10.1016/j.ebiom.2022.104276]
Abstract
Background Arterial hypertension is a major cardiovascular risk factor. Identification of secondary hypertension in its various forms is key to preventing and targeting treatment of cardiovascular complications. Simplified diagnostic tests are urgently required to distinguish primary and secondary hypertension to address the current underdiagnosis of the latter. Methods This study uses machine learning (ML) to classify subtypes of endocrine hypertension (EHT) in a large cohort of hypertensive patients using multidimensional omics analysis of plasma and urine samples. We measured 409 multi-omics (MOmics) features, including plasma miRNAs (PmiRNA: 173), plasma catechol O-methylated metabolites (PMetas: 4), plasma steroids (PSteroids: 16), urinary steroid metabolites (USteroids: 27), and plasma small metabolites (PSmallMB: 189), in primary hypertension (PHT) patients, EHT patients with either primary aldosteronism (PA), pheochromocytoma/functional paraganglioma (PPGL) or Cushing syndrome (CS), and normotensive volunteers (NV). Biomarker discovery involved selection of disease combinations, outlier handling, feature reduction, 8 ML classifiers, class balancing, and consideration of different age- and sex-based scenarios. Classifications were evaluated using balanced accuracy, sensitivity, specificity, AUC, F1, and Kappa score. Findings Complete clinical and biological datasets were generated from 307 subjects (PA=113, PPGL=88, CS=41 and PHT=112). The random forest classifier provided ∼92% balanced accuracy (an ∼11% improvement on the best mono-omics classifier), with 96% specificity and 0.95 AUC, to distinguish one of the four conditions in multi-class ALL-ALL comparisons (PPGL vs PA vs CS vs PHT) on an unseen test set, using 57 MOmics features. For discrimination of EHT (PA + PPGL + CS) vs PHT, the simple logistic classifier achieved 0.96 AUC with 90% sensitivity and ∼86% specificity, using 37 MOmics features.
One PmiRNA (hsa-miR-15a-5p) and two PSmallMB (C9 and PC ae C38:1) features were found to be most discriminating for all disease combinations. Overall, the MOmics-based classifiers were able to provide better classification performance in comparison to mono-omics classifiers. Interpretation We have developed a ML pipeline to distinguish different EHT subtypes from PHT using multi-omics data. This innovative approach to stratification is an advancement towards the development of a diagnostic tool for EHT patients, significantly increasing testing throughput and accelerating administration of appropriate treatment. Funding European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement No. 633983, Clinical Research Priority Program of the University of Zurich for the CRPP HYRENE (to Z.E. and F.B.), and Deutsche Forschungsgemeinschaft (CRC/Transregio 205/1).
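The metrics reported above (balanced accuracy, sensitivity, specificity, F1) follow directly from a confusion matrix; the counts below are illustrative, not the study's results.

```python
# Illustrative 2x2 confusion matrix for EHT vs PHT; positives = EHT.
tp, fn, fp, tn = 90, 10, 18, 82

sensitivity = tp / (tp + fn)                       # recall for EHT
specificity = tn / (tn + fp)
balanced_accuracy = (sensitivity + specificity) / 2
precision = tp / (tp + fp)
f1 = 2 * precision * sensitivity / (precision + sensitivity)

print(round(balanced_accuracy, 2), round(f1, 2))   # → 0.86 0.87
```

Reporting several of these metrics together, as the study does, guards against any single one being flattered by class imbalance: balanced accuracy averages the per-class recalls, while F1 trades precision against sensitivity.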
12. Next-Generation Capabilities in Trusted Research Environments: Interview Study. J Med Internet Res 2022; 24:e33720. [PMID: 36125859 PMCID: PMC9533202 DOI: 10.2196/33720]
Abstract
BACKGROUND A Trusted Research Environment (TRE; also known as a Safe Haven) is an environment, supported by trained staff and agreed processes (principles and standards), that provides access to data for research while protecting patient confidentiality. Accessing sensitive data without compromising the privacy and security of the data is a complex process. OBJECTIVE This paper presents the security measures, administrative procedures, and technical approaches adopted by TREs. METHODS We contacted 73 TRE operators, in the United Kingdom and internationally, of whom 22 (30%) agreed to be interviewed remotely under a nondisclosure agreement and to complete a questionnaire about their TRE. RESULTS We observed many similar processes and standards that TREs follow to adhere to the Seven Safes principles. The security processes and TRE capabilities for supporting observational studies using classical statistical methods were mature, and the requirements were well understood. However, we identified limitations in the security measures and capabilities of TREs to support "next-generation" requirements such as wide ranges of data types, the ability to develop artificial intelligence algorithms and software within the environment, handling of big data, and timely import and export of data. CONCLUSIONS We found a lack of software or other automation tools to support the community and limited knowledge of how to meet the next-generation requirements of the research community. Disclosure control for exporting artificial intelligence algorithms and software was found to be particularly challenging, and there is a clear need for additional controls to support this capability within TREs.
13. Developing a new Governance Approval Process to support federated discovery and meta-analysis of data across the UK through the CO-CONNECT project. Int J Popul Data Sci 2022. [DOI: 10.23889/ijpds.v7i3.1799]
Abstract
Objectives To develop a new approval process for federated data custodians to install and support a new platform which enables researchers to run from one website, instantaneous, aggregate-level queries to determine the number of patients in each dataset which meet their research criteria.
To agree security controls across data custodians which protect patient confidentiality whilst also providing this new automated capability for researchers and reducing the burden on each data custodian to manually provide the information.
ApproachThe COVID - Curated and Open aNalysis aNd rEsearCh plaTform (CO-CONNECT) has integrated a Cohort Discovery Tool into the Health Data Research (HDR) UK Innovation Gateway website and is connecting >50 different federated datasets. The underpinning architecture is novel, without precedent at such a scale in the UK. We found that although each data custodian recognised the benefits of the platform, many were unclear of the process to formally approve this new model. We have worked across data custodians to co-develop the required new processes and document the security controls.
ResultsWe found vast differences in technical knowledge and infrastructures across different data custodians, especially across small research groups hosting data on consented research cohorts verses larger organisations who host and manage routinely collected data. A model for approvals evolved for these 2 separate groups:
Consented research cohorts: a 2-stage process of a pre-assessment of the need for a Data Protection Impact Assessment (DPIA) and/or a completed DPIA. All returned a positive outcome, concluding that no personally identifiable information was being used.
Unconsented population-level data: 4 documents were required, each approved by a different committee within each data custodian: DPIA, Data Access Application, Security Risk Assessment, and Disclosure Control Assessment.
As the model was novel to many data custodians, we developed many explainer videos and detailed step-by-step instructions.
Conclusion We recommend that a new approvals process for new technologies/models be developed to support initiatives not covered by the traditional data access request process. Increased investment would be welcomed in the data governance and IT security approval teams, which have been overwhelmed by the increased demand for their services to review COVID-19-related projects.
14
GRAIMatter: Guidelines and Resources for AI Model Access from TrusTEd Research environments (GRAIMatter). Int J Popul Data Sci 2022. [DOI: 10.23889/ijpds.v7i3.2005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/14/2022] Open
Abstract
Objectives To assess a range of tools and methods to support Trusted Research Environments (TREs) in assessing output from AI methods for potentially identifiable information, to investigate the legal and ethical implications and controls, and to produce a set of guidelines and recommendations to support all TREs with export controls of AI algorithms.
Approach TREs provide secure facilities to analyse confidential personal data, with staff checking outputs for disclosure risk before publication. Artificial intelligence (AI) has high potential to improve the linking and analysis of population data, and TREs are well suited to supporting AI modelling. However, TRE governance focuses on classical statistical data analysis. The size and complexity of AI models present significant challenges for the disclosure-checking process. Models may also be susceptible to attacks that reverse engineer the learning process to reveal information about the data used for training, with more potential to lead to re-identification than conventional statistical outputs.
Results GRAIMatter is:
Quantitatively assessing the risk of disclosure from AI models, exploring different model types, hyper-parameter settings, and training algorithms over common data types
Evaluating a range of tools to determine effectiveness for disclosure control
Assessing the legal and ethical implications of TREs supporting AI development and identifying aspects of existing legal and regulatory frameworks requiring reform.
Running 4 patient and public involvement and engagement (PPIE) workshops to understand public priorities and beliefs around safeguarding and securing data
Developing a set of recommendations including
suggested open-source toolsets for TREs to use to measure and reduce disclosure risk
descriptions of the technical and legal controls and policies TREs should implement across the 5 Safes to support AI algorithm disclosure control
training implications for TRE staff and for how they validate researchers
Conclusion GRAIMatter is developing a set of usable recommendations for TREs to guard against the additional risks of disclosing trained AI models from TREs.
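One of the disclosure risks this work quantifies is membership inference: an overfitted model tends to be more confident on records it was trained on than on unseen records, which an attacker can exploit to re-identify training data. A minimal, stdlib-only sketch of that idea; the `membership_advantage` helper, the threshold, and the confidence values are illustrative assumptions, not GRAIMatter's actual tooling:

```python
def membership_advantage(train_conf, holdout_conf, threshold=0.9):
    """Crude membership-inference check: flag a record as a suspected
    training-set member when the model's confidence meets the threshold."""
    # True-positive rate: training records correctly flagged as members
    tpr = sum(c >= threshold for c in train_conf) / len(train_conf)
    # False-positive rate: unseen records wrongly flagged as members
    fpr = sum(c >= threshold for c in holdout_conf) / len(holdout_conf)
    # Advantage near 0 -> little membership leakage; near 1 -> disclosive model
    return tpr - fpr

# Hypothetical confidences from an overfitted model
print(membership_advantage([0.99, 0.95, 0.97, 0.60], [0.50, 0.40, 0.92, 0.30]))
```

An advantage near 0 suggests the model's outputs reveal little about training membership; values approaching 1 indicate a disclosive model that a TRE's output-checking process should not release.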
15
The COVID - Curated and Open aNalysis aNd rEsearCh plaTform (CO-CONNECT). Int J Popul Data Sci 2022. [DOI: 10.23889/ijpds.v7i3.1792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/14/2022] Open
Abstract
Objectives CO-CONNECT is making UK COVID-19 data Findable, Accessible, Interoperable and Reusable (FAIR) through a federated platform, which supports secure, anonymised research at scale and pace. This interdisciplinary project, spanning 22 organisations, is connecting data from >50 large research cohorts and data collected through routine healthcare provision across the UK.
Approach Across the UK, data have been collected that can help answer key questions about COVID-19. Because the data are held in many places under many different processes, it is difficult for public health groups, researchers, policymakers, and government to find and access large volumes of high-quality data quickly and efficiently to make decisions. In collaboration with Health Data Research UK, CO-CONNECT is streamlining the processes of accessing data for research.
Results 1) Discovering data and meta-analysis: CO-CONNECT enables researchers to determine how many people meet their research criteria within the various datasets across the UK through the Health Data Research Innovation Gateway Cohort Discovery tool, e.g., “How many people in each dataset have had a PCR test which was positive and were under the age of 40?” Only summary-level, anonymous data are provided, so researchers can answer such questions rapidly without requiring multiple data governance permissions or contacting each data source directly. The tool also supports aggregate-level meta-analysis of the data.
2) Detailed analysis: With data governance approvals, researchers can analyse detailed, standardised, linked, pseudonymised data in a Trusted Research Environment. The common format reduces the effort for each research project, supporting rapid research.
Conclusion Providing data in this de-identified, safe way enables rapid, robust research: e.g., COVID-19 results from a test centre can be linked to hospital records and pharmacy prescriptions, enabling researchers to understand whether people with different existing health conditions are more or less susceptible to COVID-19. To learn more, visit https://co-connect.ac.uk.
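The aggregate-only query model described above can be sketched as a count query with low-number suppression, so no disclosive small counts ever leave a custodian. The `cohort_count` helper, the threshold of 10, and the fabricated records are illustrative assumptions, not the platform's actual implementation:

```python
def cohort_count(records, predicate, suppression_threshold=10):
    """Answer an aggregate cohort-discovery query with a count only,
    suppressing small counts so no disclosive numbers are returned."""
    n = sum(1 for r in records if predicate(r))
    return n if n >= suppression_threshold else None  # None = "fewer than threshold"

# Fabricated records standing in for one custodian's dataset
records = [{"age": 30 + i, "pcr_positive": i % 2 == 0} for i in range(40)]
print(cohort_count(records, lambda r: r["pcr_positive"]))                    # large enough to report
print(cohort_count(records, lambda r: r["pcr_positive"] and r["age"] < 40))  # suppressed
```

A federated deployment would run such a query at each custodian and return only the (suppressed) counts to the central discovery tool.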
16
Towards a Scottish Safe Haven Federation. Int J Popul Data Sci 2022. [PMCID: PMC9644802 DOI: 10.23889/ijpds.v7i3.1837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022] Open
17
Augmenting laboratory COVID serology data granularity for SARS-CoV-2 reporting. Int J Popul Data Sci 2022. [PMCID: PMC9644826 DOI: 10.23889/ijpds.v7i3.1887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
18
An architecture for building cohorts of images from real-world clinical data from the whole Scottish population supporting research and AI development. Int J Popul Data Sci 2022. [PMCID: PMC9644851 DOI: 10.23889/ijpds.v7i3.1916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open
19
Scottish Medical Imaging Service - Technical and Governance controls. Int J Popul Data Sci 2022. [DOI: 10.23889/ijpds.v7i3.1869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/14/2022] Open
Abstract
Objectives The Scottish Medical Imaging (SMI) service provides linkable, population based, “research-ready” real-world medical images for researchers to develop or validate AI algorithms within the Scottish National Safe Haven. The PICTURES research programme is developing novel methods to enhance the SMI service offering through research in cybersecurity and software/data/infrastructure engineering.
Approach Additional technical and governance controls were required to enable safe access to medical images.
The researcher is isolated from the rest of the trusted research environment (TRE) using a Project Private Zone (PPZ). This enables researchers to build and install their own software stack, and protects the TRE from malicious code.
Guidelines are under development for researchers on the safe development of algorithms and the expected relationship between the size of the model and the training dataset. There is associated work on the statistical disclosure control of models to enable safe release of trained models from the TRE.
Results A policy enabling the use of “non-standard software” was developed, based on prior research, domain knowledge, and experience gained from two contrasting research studies. Additional clauses have been added to the legal control (the eDRIS User Agreement) signed by each researcher and their Head of Department. Penalties for attempting to import or use malware, to remove data within models, or to deceive or circumvent such controls are severe, and apply to both the individual and their institution. A process for building and deploying a PPZ has been developed, allowing researchers to install their own software.
No attempt has yet been made to add additional ethical controls; however, a future service development could be to validate the performance of researchers’ algorithms on our training dataset.
Conclusion The ability to conduct research using images poses new challenges and risks for those commissioning and operating TREs. The Project Private Zone and our associated governance controls are a major step towards supporting the needs of researchers in the 21st century.
20
Preanalytical Pitfalls in Untargeted Plasma Nuclear Magnetic Resonance Metabolomics of Endocrine Hypertension. Metabolites 2022; 12:metabo12080679. [PMID: 35893246 PMCID: PMC9394285 DOI: 10.3390/metabo12080679] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2022] [Revised: 06/17/2022] [Accepted: 07/11/2022] [Indexed: 11/24/2022] Open
Abstract
Despite considerable morbidity and mortality, numerous cases of endocrine hypertension (EHT), including primary aldosteronism (PA), pheochromocytoma and functional paraganglioma (PPGL), and Cushing’s syndrome (CS), remain undetected. We aimed to establish signatures for the different forms of EHT, investigate potentially confounding effects and establish unbiased disease biomarkers. Plasma samples were obtained from 13 biobanks across seven countries and analyzed using untargeted NMR metabolomics. We compared unstratified samples from 106 patients with primary hypertension (PHT) to 231 EHT patients, including 104 PA, 94 PPGL and 33 CS patients. Spectra were subjected to a multivariate statistical comparison of PHT to the EHT forms and the associated signatures were obtained. Three approaches were applied to investigate and correct for confounding effects. Though we found signatures that could separate PHT from the EHT forms, there were also key similarities with the signatures of sample center of origin and sample age. The study design restricted the applicability of the corrections employed. With the samples that were available, no biomarkers for PHT vs. EHT could be identified. The complexity of the confounding effects, evidenced by their robustness to correction approaches, highlighted the need for a consensus on how to deal with variability probably attributable to preanalytical factors in retrospective, multicenter metabolomics studies.
21
A National Network of Safe Havens: Scottish Perspective. J Med Internet Res 2022; 24:e31684. [PMID: 35262495 PMCID: PMC8943560 DOI: 10.2196/31684] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2021] [Revised: 11/18/2021] [Accepted: 12/03/2021] [Indexed: 01/22/2023] Open
Abstract
For over a decade, Scotland has implemented and operationalized a system of Safe Havens, which provides secure analytics platforms for researchers to access linked, deidentified electronic health records (EHRs) while managing the risk of unauthorized reidentification. In this paper, a perspective is provided on the state-of-the-art Scottish Safe Haven network, including its evolution, to define the key activities required to scale the Scottish Safe Haven network's capability to facilitate research and health care improvement initiatives. A set of processes related to EHR data and their delivery in Scotland have been discussed. An interview with each Safe Haven was conducted to understand their services in detail, as well as their commonalities. The results show how Safe Havens in Scotland have protected privacy while facilitating the reuse of the EHR data. This study provides a common definition of a Safe Haven and promotes a consistent understanding among the Scottish Safe Haven network and the clinical and academic research community. We conclude by identifying areas where efficiencies across the network can be made to meet the needs of population-level studies at scale.
22
PRE-PROCEDURAL RISK SCORES TO HELP IDENTIFY PATIENTS AT RISK OF CONTRAST INDUCED NEPHROPATHY AFTER CHRONIC TOTAL OCCLUSION PERCUTANEOUS CORONARY INTERVENTION FOR PERI-PROCEDURAL NEPHROPROTECTIVE THERAPIES. J Am Coll Cardiol 2022. [DOI: 10.1016/s0735-1097(22)01833-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
23
Towards nationally curated data archives for clinical radiology image analysis at scale: Learnings from national data collection in response to a pandemic. Digit Health 2021; 7:20552076211048654. [PMID: 34868617 PMCID: PMC8637703 DOI: 10.1177/20552076211048654] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2021] [Accepted: 09/07/2021] [Indexed: 12/27/2022] Open
Abstract
The prevalence of the coronavirus SARS-CoV-2 disease has resulted in the unprecedented collection of health data to support research. Historically, coordinating the collation of such datasets on a national scale has been challenging to execute for several reasons, including issues with data privacy, the lack of data reporting standards, interoperable technologies, and distribution methods. The coronavirus SARS-CoV-2 disease pandemic has highlighted the importance of collaboration between government bodies, healthcare institutions, academic researchers and commercial companies in overcoming these issues during times of urgency. The National COVID-19 Chest Imaging Database, led by NHSX, British Society of Thoracic Imaging, Royal Surrey NHS Foundation Trust and Faculty, is an example of such a national initiative. Here, we summarise the experiences and challenges of setting up the National COVID-19 Chest Imaging Database, and the implications for future ambitions of national data curation in medical imaging to advance the safe adoption of artificial intelligence in healthcare.
24
Erratum to: An overview of the National COVID-19 Chest Imaging Database: data quality and cohort analysis. Gigascience 2021; 10:giab083. [PMID: 34850874 PMCID: PMC8634578 DOI: 10.1093/gigascience/giab083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
25
An overview of the National COVID-19 Chest Imaging Database: data quality and cohort analysis. Gigascience 2021; 10:giab076. [PMID: 34849869 PMCID: PMC8633457 DOI: 10.1093/gigascience/giab076] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2021] [Revised: 08/04/2021] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND The National COVID-19 Chest Imaging Database (NCCID) is a centralized database containing mainly chest X-rays and computed tomography scans from patients across the UK. The objective of the initiative is to support a better understanding of the coronavirus SARS-CoV-2 disease (COVID-19) and the development of machine learning technologies that will improve care for patients hospitalized with a severe COVID-19 infection. This article introduces the training dataset, including a snapshot analysis covering the completeness of clinical data, and availability of image data for the various use-cases (diagnosis, prognosis, longitudinal risk). An additional cohort analysis measures how well the NCCID represents the wider COVID-19-affected UK population in terms of geographic, demographic, and temporal coverage. FINDINGS The NCCID offers high-quality DICOM images acquired across a variety of imaging machinery; multiple time points including historical images are available for a subset of patients. This volume and variety make the database well suited to development of diagnostic/prognostic models for COVID-associated respiratory conditions. Historical images and clinical data may aid long-term risk stratification, particularly as availability of comorbidity data increases through linkage to other resources. The cohort analysis revealed good alignment to general UK COVID-19 statistics for some categories, e.g., sex, whilst identifying areas for improvements to data collection methods, particularly geographic coverage. CONCLUSION The NCCID is a growing resource that provides researchers with a large, high-quality database that can be leveraged both to support the response to the COVID-19 pandemic and as a test bed for building clinically viable medical imaging models.
26
Desiderata for the development of next-generation electronic health record phenotype libraries. Gigascience 2021; 10:giab059. [PMID: 34508578 PMCID: PMC8434766 DOI: 10.1093/gigascience/giab059] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2021] [Revised: 07/15/2021] [Accepted: 08/18/2021] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling. METHODS A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as a part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices. RESULTS We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing. CONCLUSIONS There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be more effectively used in medical domains.
27
Using machine learning approaches for multi-omics data analysis: A review. Biotechnol Adv 2021; 49:107739. [PMID: 33794304 DOI: 10.1016/j.biotechadv.2021.107739] [Citation(s) in RCA: 215] [Impact Index Per Article: 71.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2020] [Revised: 03/01/2021] [Accepted: 03/25/2021] [Indexed: 02/06/2023]
Abstract
With the development of modern high-throughput omic measurement platforms, it has become essential for biomedical studies to undertake an integrative (combined) approach to fully utilise these data to gain insights into biological systems. Data from various omics sources such as genetics, proteomics, and metabolomics can be integrated to unravel the intricate working of systems biology using machine learning-based predictive algorithms. Machine learning methods offer novel techniques to integrate and analyse the various omics data enabling the discovery of new biomarkers. These biomarkers have the potential to help in accurate disease prediction, patient stratification and delivery of precision medicine. This review paper explores different integrative machine learning methods which have been used to provide an in-depth understanding of biological systems during normal physiological functioning and in the presence of a disease. It provides insight and recommendations for interdisciplinary professionals who envisage employing machine learning skills in multi-omics studies.
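The simplest of the integration strategies such reviews cover is early (concatenation-based) integration, where each sample's feature vectors from every omics block are joined into one combined vector before a single predictive model is trained. A minimal sketch; `early_integration` and the toy blocks are invented here for illustration:

```python
def early_integration(omics_blocks):
    """Early (concatenation-based) integration: join each sample's feature
    vectors from every omics block into one combined feature vector."""
    n_samples = len(omics_blocks[0])
    assert all(len(block) == n_samples for block in omics_blocks), "blocks must share samples"
    return [sum((block[i] for block in omics_blocks), []) for i in range(n_samples)]

# Two toy blocks: genomics features (2 per sample) and metabolomics (1 per sample)
genomics = [[1.0, 2.0], [3.0, 4.0]]
metabolomics = [[5.0], [6.0]]
print(early_integration([genomics, metabolomics]))
```

Intermediate and late integration strategies instead combine per-block models or their predictions, which can cope better with blocks of very different dimensionality.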
28
Abstract
CONTEXT Identification of patients with endocrine forms of hypertension (EHT) (primary hyperaldosteronism [PA], pheochromocytoma/paraganglioma [PPGL], and Cushing syndrome [CS]) provides the basis to implement individualized therapeutic strategies. Targeted metabolomics (TM) have revealed promising results in profiling cardiovascular diseases and endocrine conditions associated with hypertension. OBJECTIVE Use TM to identify distinct metabolic patterns between primary hypertension (PHT) and EHT and test its discriminating ability. METHODS Retrospective analyses of PHT and EHT patients from a European multicenter study (ENSAT-HT). TM was performed on stored blood samples using liquid chromatography mass spectrometry. To identify discriminating metabolites, a "classical approach" (CA; a series of univariate and multivariate analyses) and a "machine learning approach" (MLA; random forest) were used. The study included 282 adult patients (52% female; mean age 49 years) with proven PHT (n = 59) or EHT (n = 223, comprising 40 CS, 107 PA, and 76 PPGL). RESULTS From 155 metabolites eligible for statistical analyses, 31 were identified as discriminating between PHT and EHT using the CA and 27 using the MLA, of which 16 metabolites (C9, C16, C16:1, C18:1, C18:2, arginine, aspartate, glutamate, ornithine, spermidine, lysoPCaC16:0, lysoPCaC20:4, lysoPCaC24:0, PCaeC42:0, SM C18:1, SM C20:2) were found by both approaches. The receiver operating characteristic curve built on the top 15 metabolites from the CA provided an area under the curve (AUC) of 0.86, similar to the performance of the 15 metabolites from the MLA (AUC 0.83). CONCLUSION TM identifies distinct metabolic patterns between PHT and EHT, providing promising discriminating performance.
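The discriminating performance reported above (AUC 0.86 and 0.83) can be read via the Mann-Whitney formulation of the ROC AUC: the probability that a randomly chosen case scores higher than a randomly chosen control. A stdlib-only sketch; the `auc` function and the scores are illustrative, not the study's pipeline:

```python
def auc(scores_cases, scores_controls):
    """ROC AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen case scores above a randomly chosen control (ties count 0.5)."""
    wins = 0.0
    for case in scores_cases:
        for control in scores_controls:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))

# Hypothetical discriminant scores for EHT cases vs PHT controls
print(auc([0.9, 0.8, 0.7], [0.2, 0.1, 0.75]))
```

An AUC of 0.5 means the score is no better than chance at separating the groups; 1.0 means perfect separation.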
29
Monitoring indirect impact of COVID-19 pandemic on services for cardiovascular diseases in the UK. Heart 2020; 106:1890-1897. [PMID: 33020224 PMCID: PMC7536637 DOI: 10.1136/heartjnl-2020-317870] [Citation(s) in RCA: 68] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/29/2020] [Revised: 09/16/2020] [Accepted: 09/18/2020] [Indexed: 12/22/2022] Open
Abstract
OBJECTIVE To monitor hospital activity for presentation, diagnosis and treatment of cardiovascular diseases during the COVID-19 pandemic to inform on indirect effects. METHODS Retrospective serial cross-sectional study in nine UK hospitals using hospital activity data from 28 October 2019 (pre-COVID-19) to 10 May 2020 (pre-easing of lockdown) and for the same weeks during 2018-2019. We analysed aggregate data for selected cardiovascular diseases before and during the epidemic. We produced an online visualisation tool to enable near real-time monitoring of trends. RESULTS Across nine hospitals, total admissions and emergency department (ED) attendances decreased after lockdown (23 March 2020) by 57.9% (57.1%-58.6%) and 52.9% (52.2%-53.5%), respectively, compared with the previous year. Activity for cardiac, cerebrovascular and other vascular conditions started to decline 1-2 weeks before lockdown and fell by 31%-88% after lockdown, with the greatest reductions observed for coronary artery bypass grafts, carotid endarterectomy, aortic aneurysm repair and peripheral arterial disease procedures. Compared with before the first UK COVID-19 case (31 January 2020), activity declined across diseases and specialties between the first case and lockdown (total ED attendances relative reduction (RR) 0.94, 0.93-0.95; total hospital admissions RR 0.96, 0.95-0.97) and after lockdown (attendances RR 0.63, 0.62-0.64; admissions RR 0.59, 0.57-0.60). There was limited recovery towards usual levels of some activities from mid-April 2020. CONCLUSIONS Substantial reductions in total and cardiovascular activities are likely to contribute to a major burden of indirect effects of the pandemic, suggesting they should be monitored and mitigated urgently.
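The relative reduction (RR) figures above are ratios of activity during a period to activity in the same weeks of the baseline year, and the percentage falls are their complements. A sketch of the arithmetic; the function names and the counts are ours, not the study's:

```python
def relative_rate(activity_during, activity_baseline):
    """Activity in a pandemic period relative to the same weeks a year earlier."""
    return activity_during / activity_baseline

def percent_reduction(activity_during, activity_baseline):
    """Percentage fall from baseline, as quoted in the abstract (e.g. 57.9%)."""
    return 100.0 * (1.0 - relative_rate(activity_during, activity_baseline))

# Hypothetical weekly admission counts
print(percent_reduction(421, 1000))
```

So an RR of 0.59 for admissions corresponds to a 41% fall relative to the pre-pandemic baseline.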
30
An extensible big data software architecture managing a research resource of real-world clinical radiology data linked to other health data from the whole Scottish population. Gigascience 2020; 9:giaa095. [PMID: 32990744 PMCID: PMC7523405 DOI: 10.1093/gigascience/giaa095] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2020] [Revised: 07/28/2020] [Accepted: 08/26/2020] [Indexed: 02/06/2023] Open
Abstract
AIM To enable a world-leading research dataset of routinely collected clinical images linked to other routinely collected data from the whole Scottish national population. This includes more than 30 million different radiological examinations from a population of 5.4 million and >2 PB of data collected since 2010. METHODS Scotland has a central archive of radiological data used to directly provide clinical care to patients. We have developed an architecture and platform to securely extract a copy of those data, link it to other clinical or social datasets, remove personal data to protect privacy, and make the resulting data available to researchers in a controlled Safe Haven environment. RESULTS An extensive software platform has been developed to host, extract, and link data from cohorts to answer research questions. The platform has been tested on 5 different test cases and is currently being further enhanced to support 3 exemplar research projects. CONCLUSIONS The data available are from a range of radiological modalities and scanner types and were collected under different environmental conditions. These real-world, heterogeneous data are valuable for training algorithms to support clinical decision making, especially for deep learning where large data volumes are required. The resource is now available for international research access. The platform and data can support new health research using artificial intelligence and machine learning technologies, as well as enabling discovery science.
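A core step in such a platform is removing personal data from image headers before release into the Safe Haven, while substituting a pseudonym so linkage to other datasets remains possible. A toy sketch of DICOM-style tag redaction on plain dictionaries; the tag list and the `deidentify` helper are illustrative assumptions, not the production pipeline:

```python
# Hypothetical subset of identifying DICOM-style tags to strip before release
IDENTIFYING_TAGS = {"PatientName", "PatientID", "PatientBirthDate", "InstitutionName"}

def deidentify(header, pseudo_id):
    """Return a copy of a header dict with direct identifiers removed and a
    project-specific pseudonym substituted, so linkage stays possible."""
    clean = {tag: value for tag, value in header.items() if tag not in IDENTIFYING_TAGS}
    clean["PatientID"] = pseudo_id
    return clean

header = {"PatientName": "DOE^JANE", "PatientID": "0123456",
          "Modality": "CT", "StudyDate": "20150304"}
print(deidentify(header, "PROJ42-0001"))
```

Real DICOM de-identification is considerably more involved (private tags, burned-in annotations, dates), which is part of why a controlled Safe Haven environment is still required.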
31
Using imaging to combat a pandemic: rationale for developing the UK National COVID-19 Chest Imaging Database. Eur Respir J 2020; 56:2001809. [PMID: 32616598 PMCID: PMC7331656 DOI: 10.1183/13993003.01809-2020] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2020] [Accepted: 06/08/2020] [Indexed: 12/12/2022]
Abstract
The National COVID-19 Chest Imaging Database (NCCID) is a repository of chest radiographs, CT and MRI images and clinical data from COVID-19 patients across the UK, to support research and development of AI technology and give insight into COVID-19 disease https://bit.ly/3eQeuha
32
Knowledge Driven Phenotyping. Stud Health Technol Inform 2020; 270:1327-1328. [PMID: 32570642 DOI: 10.3233/shti200425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Extracting patient phenotypes from routinely collected health data (such as Electronic Health Records) requires translating clinically-sound phenotype definitions into queries/computations executable on the underlying data sources by clinical researchers. This requires significant knowledge and skills to deal with heterogeneous and often imperfect data. Translations are time-consuming, error-prone and, most importantly, hard to share and reproduce across different settings. This paper proposes a knowledge driven framework that (1) decouples the specification of phenotype semantics from underlying data sources; (2) can automatically populate and conduct phenotype computations on heterogeneous data spaces. We report preliminary results of deploying this framework on five Scottish health datasets.
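Decoupling a phenotype's specification from the underlying data sources can be as simple as expressing the definition as code sets that a generic engine applies to whatever records a dataset yields. A toy sketch; the code sets and the `matches` helper are invented for illustration, not the paper's framework:

```python
# Hypothetical phenotype definition: code sets only, no reference to any data source
T2D_PHENOTYPE = {"icd10": {"E11"}, "read": {"C10F"}}

def matches(record, phenotype):
    """True if any code in the record falls under the phenotype's code sets
    (prefix match, so 'E11.9' matches the 'E11' family)."""
    for system, code_prefixes in phenotype.items():
        for code in record.get(system, ()):
            if any(code.startswith(prefix) for prefix in code_prefixes):
                return True
    return False

print(matches({"icd10": ["E11.9"]}, T2D_PHENOTYPE))
print(matches({"icd10": ["I10"]}, T2D_PHENOTYPE))
```

Because the definition carries no query logic, the same specification can be executed unchanged against heterogeneous datasets, which is what makes it shareable and reproducible.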
33
Investigating the Relationship Between Type 2 Diabetes and Dementia Using Electronic Medical Records in the GoDARTS Bioresource. Diabetes Care 2019; 42:1973-1980. [PMID: 31391202 DOI: 10.2337/dc19-0380] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/22/2019] [Accepted: 07/17/2019] [Indexed: 02/03/2023]
Abstract
OBJECTIVE To investigate the impact of type 2 diabetes on incidence of major dementia subtypes, Alzheimer and vascular dementia, using electronic medical records (EMR) in the GoDARTS bioresource. RESEARCH DESIGN AND METHODS GoDARTS (Genetics of Diabetes Audit and Research in Tayside Scotland) comprises a large case-control study of type 2 diabetes with longitudinal follow-up in EMR. Dementia case subjects after recruitment were passively identified in the EMR, and using a combination of case note review, an Alzheimer-specific weighted genetic risk score (wGRS), and APOE4 genotype, we validated major dementia subtypes. We undertook a retrospective matched cohort study to determine the risk of type 2 diabetes status for incident dementia accounting for competing risk of death. RESULTS Type 2 diabetes status was associated with a significant risk of any dementia (cause-specific hazard ratio [csHR] 1.46, 95% CI 1.31-1.64), which was attenuated, but still significant, when competing risk of death was accounted for (subdistribution [sd]HR 1.26, 95% CI 1.13-1.41). The accuracy of EMR-defined cases of Alzheimer or vascular dementia was high (positive predictive value [PPV] 86.4% and 72.8%, respectively), and wGRS significantly predicted Alzheimer dementia (HR 1.23, 95% CI 1.12-1.34) but not vascular dementia (HR 1.02, 95% CI 0.91-1.15). Conversely, type 2 diabetes was strongly associated with vascular dementia (csHR 2.47, 95% CI 1.92-3.18) but not Alzheimer dementia, particularly after competing risk of death was accounted for (sdHR 1.02, 95% CI 0.87-1.18). CONCLUSIONS Our study indicates that type 2 diabetes is associated with an increased risk of vascular dementia but not with an increased risk of Alzheimer dementia and highlights the potential value of bioresources linked to EMR to study dementia.
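A weighted genetic risk score of the kind used above is typically computed as a sum of each variant's risk-allele dosage multiplied by a per-variant weight (e.g., the log odds ratio from a reference GWAS). A minimal sketch with hypothetical dosages and weights, not the study's actual variants:

```python
def weighted_grs(dosages, weights):
    """Weighted genetic risk score: risk-allele dosages (0-2 per variant)
    multiplied by per-variant weights (e.g. log odds ratios) and summed."""
    assert len(dosages) == len(weights), "one weight per variant"
    return sum(d * w for d, w in zip(dosages, weights))

# Hypothetical three-variant score for one individual
print(weighted_grs([2, 1, 0], [0.10, 0.20, 0.30]))
```

The resulting score is then entered as a covariate or predictor in survival models such as the cause-specific Cox regressions reported above.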
Collapse
|
34
|
The research data management platform (RDMP): A novel, process driven, open-source tool for the management of longitudinal cohorts of clinical data. Gigascience 2018; 7:5001426. [PMID: 29790950 PMCID: PMC6041881 DOI: 10.1093/gigascience/giy060] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2018] [Accepted: 05/14/2018] [Indexed: 11/13/2022] Open
Abstract
Background The Health Informatics Centre at the University of Dundee provides a service to securely host clinical datasets and extract relevant data for anonymized cohorts to researchers to enable them to answer key research questions. As is common in research using routine healthcare data, the service was historically delivered using ad-hoc processes, resulting in the slow provision of data whose provenance was often hidden from the researchers using it. This paper describes the development and evaluation of the Research Data Management Platform (RDMP): an open-source tool to load, manage, clean, and curate longitudinal healthcare data for research and to provide reproducible and updateable datasets for defined cohorts to researchers. Results Between 2013 and 2017, implementation of the RDMP more than tripled the productivity of data analysts producing data releases for researchers, from 7.1 to 25.3 per month, and reduced the error rate from 12.7% to 3.1%. The effort spent on data management fell from a mean of 24.6 to 3.0 hours per data release. The waiting time for researchers to receive data after agreeing a specification fell from approximately 6 months to less than 1 week. The software is scalable and currently manages 163 datasets. A total of 1,321 data extracts for research have been produced, with the largest extract linking data from 70 different datasets. Conclusions The tools and processes that encompass the RDMP not only fulfil the research data management requirements of researchers but also support the seamless collaboration of data cleaning, data transformation, data summarization, and data quality assessment activities by different research groups.
Collapse
|
35
|
Mapping Local Codes to Read Codes. Stud Health Technol Inform 2017; 234:29-36. [PMID: 28186011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
BACKGROUND AND OBJECTIVES Legacy laboratory test codes make it difficult to use clinical datasets for meaningful translational research, where populations are followed for disease risk and outcomes over many years. The Health Informatics Centre (HIC) at the University of Dundee hosts continuous biochemistry data from the clinical laboratories in Tayside and Fife dating back as far as 1987. However, the HIC-managed biochemistry dataset is encumbered by inconsistent sample types and unstandardised legacy local test codes, which complicates use of the dataset for population health research. The objective of this study was to map the legacy local test codes to the Scottish 5-byte Version 2 Read Codes using biochemistry data extracted from the repository of the Scottish Care Information (SCI) Store. METHODS Data mapping methodology was used to map legacy local test codes from clinical biochemistry laboratories within Tayside and Fife to the Scottish 5-byte Version 2 Read Codes. RESULTS The methodology resulted in the mapping of 485 legacy laboratory test codes, spanning 25 years, to 124 Read Codes. CONCLUSION The data mapping methodology not only facilitated the restructuring of the HIC-managed biochemistry dataset to support easier cohort identification and selection, but it also made it easier for the standardised local laboratory test codes, in the Scottish 5-byte Version 2 Read Codes, to be mapped to other health data standards such as Clinical Terms Version 3 (CTV3), LOINC, and SNOMED CT.
Collapse
|
36
|
Supporting clinical trials through healthcare informatics. Trials 2015. [PMCID: PMC4658697 DOI: 10.1186/1745-6215-16-s2-o67] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
|
37
|
The relationship between domain-domain interaction orientation and sequence similarity. BMC Syst Biol 2007. [DOI: 10.1186/1752-0509-1-s1-s13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
|
38
|
Noninvasive identification of materials inside USP vials with Raman spectroscopy and a Raman spectral library. J Pharm Sci 1998; 87:1-8. [PMID: 9452960 DOI: 10.1021/js970330q] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
A commercial dispersive Raman spectrometer operating at 785 nm with a CCD detector was used to acquire spectra of USP reference materials inside amber USP vials. The laser and collection beams were directed through the bottom of the vials, resulting in a 60% loss of signal. The Raman shift was calibrated with a 4-acetamidophenol standard, and spectral response was corrected with a luminescent standard. After these corrections, the Raman spectra obtained inside the USP vial and on open powders differed by less than 5%. A spectral library of 309 reference materials was constructed, with spectral acquisition times ranging from 1 to 60 s. Of these, 8% had significant fluorescent background but observable Raman features, while 3% showed only fluorescence. A blind test of 26 unknowns revealed the accuracy of the library search to be 88-96%, depending on search algorithm, and 100% if operator discretion was permitted. The tolerance of the library search to degraded signal-to-noise ratio, resolution, and Raman shift accuracy was tested, and the search proved very robust. The results demonstrate that Raman spectroscopy provides a rapid, noninvasive technique for compound identification.
Collapse
|
39
|
A study of the vacuum pyrolysis of 4-diazo-3-isochromanone with HeI ultraviolet photoelectron spectroscopy. Can J Chem 1995. [DOI: 10.1139/v95-213] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
A newly developed ultraviolet photoelectron spectrometer – CO2 laser instrument that utilizes a 50-W CW laser as a directed heat source was used to study the vacuum pyrolysis of 4-diazo-3-isochromanone (1). Analysis of the pyrolysate with ultraviolet photoelectron spectroscopy and photoionization mass spectrometry established that 1 undergoes a facile, unexpected pyrolysis at a laser power level of 26 W yielding N2, CO, and benzocyclobutenone (6). A multistep mechanism beginning with the formation of 4-carbena-3-isochromanone (2), which rearranges to oxaketene 3, can be written for the reaction. If 3 is an intermediate, it must be unusually thermally labile for it readily decarbonylates to 2-carbena-3,4-benzotetrahydrofuran (4). The ring opening of 4 into the ortho-quininoid ketene 5 and the cyclization of 5 into 6 are possible final steps in the conversion of 1 into 6. Keywords: vacuum pyrolysis, 4-diazo-3-isochromanone, HeI ultraviolet photoelectron spectroscopy.
Collapse
|
41
|
Rapid High Pressure Liquid Chromatographic Determination of Amitriptyline Hydrochloride in Tablets and Injectables: Collaborative Study. J AOAC Int 1983. [DOI: 10.1093/jaoac/66.5.1196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
A previously reported high pressure liquid chromatographic method for the determination of amitriptyline hydrochloride in dosage forms was modified to permit its use as a stability-indicating method. The modified method, employing a nitrile-bonded microparticulate column, a methanol-0.005M ammonium acetate (90 + 10) mobile phase, and photometric detection at 239 nm, was collaboratively tested by 10 laboratories. Each collaborator received samples of synthetic and commercial tablets and injections. The recovery from a synthetic injection at the 10.06 mg/mL spiking level averaged 98.6%. The amount of declared found in commercial injections averaged 103.1%. The pooled reproducibility SD (CV%) and repeatability SD (CV%) were ± 2.12 (2.15) and ± 1.81 (1.84), respectively. The recovery from a synthetic tablet composite at the 7.45% spiking level averaged 102.0%. The amount of declared found for commercial 25 mg and 100 mg tablets averaged 96.7 and 97.9%, respectively. The pooled reproducibility SD (CV%) and repeatability SD (CV%) for these 3 tablet samples were ± 1.89 (1.86) and ± 1.66 (1.64), respectively. Content uniformity analysis of commercial 25 mg and 100 mg tablets (n = 10) gave amounts of declared values averaging 100.5% (range 92.4-108.8%) and 99.3% (range 89.6-107.0%), respectively. The pooled reproducibility SD (CV%) and repeatability SD (CV%) were ± 3.23 (3.2) and ± 2.78 (2.8), respectively. A commercial injectable preparation spiked with dibenzosuberone was also collaboratively analyzed by a thin layer method. The method was adopted interim official first action.
Collapse
|
42
|
Aspirin--a national survey V: Determination of aspirin and impurities in enteric coated tablets and suppository formulations and in vitro dissolution of enteric coated tablets. J Pharm Sci 1982; 71:1049-52. [PMID: 7131273 DOI: 10.1002/jps.2600710923] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]
Abstract
The results of a national survey on the quality of enteric coated aspirin tablets and aspirin suppositories are presented. The tablets were analyzed for strength, salicylic acid content, in vitro dissolution rate, and related aspirin impurities. The suppositories were analyzed for strength and salicylic acid content. The methods of analysis and validation of data are also presented.
Collapse
|
43
|
A course for potential examiners. NURSING TIMES 1979; 75:1904-5. [PMID: 259260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
|