1
Tan ALM, Getzen EJ, Hutch MR, Strasser ZH, Gutiérrez-Sacristán A, Le TT, Dagliati A, Morris M, Hanauer DA, Moal B, Bonzel CL, Yuan W, Chiudinelli L, Das P, Zhang HG, Aronow BJ, Avillach P, Brat GA, Cai T, Hong C, La Cava WG, Hooi Will Loh H, Luo Y, Murphy SN, Yuan Hgiam K, Omenn GS, Patel LP, Jebathilagam Samayamuthu M, Shriver ER, Shakeri Hossein Abad Z, Tan BWL, Visweswaran S, Wang X, Weber GM, Xia Z, Verdy B, Long Q, Mowery DL, Holmes JH. Informative missingness: What can we learn from patterns in missing laboratory data in the electronic health record? J Biomed Inform 2023; 139:104306. [PMID: 36738870 PMCID: PMC10849195 DOI: 10.1016/j.jbi.2023.104306] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/17/2022] [Revised: 01/21/2023] [Accepted: 01/29/2023] [Indexed: 02/05/2023]
Abstract
BACKGROUND In electronic health records, patterns of missing laboratory test results can capture a patient's course of disease and reflect clinicians' concerns about possible conditions. These patterns are often understudied and overlooked. This study aims to identify informative patterns of missingness among laboratory data collected across 15 healthcare system sites in three countries for COVID-19 inpatients. METHODS We collected and analyzed demographic, diagnosis, and laboratory data for 69,939 patients with positive COVID-19 PCR tests across three countries from 1 January 2020 through 30 September 2021. We analyzed missing laboratory measurements across sites, missingness stratified by demographic variables, temporal trends of missingness, correlations between labs based on missingness indicators over time, and clustering of groups of labs based on their missingness/ordering pattern. RESULTS With these analyses, we identified mapping issues at seven of the 15 sites. We also identified nuances in data collection and variable definition across the sites. Temporal trend analyses may support the use of laboratory test result missingness patterns in identifying severe COVID-19 patients. Lastly, using missingness patterns, we determined relationships between various labs that reflect clinical behaviors. CONCLUSION In this work, we use computational approaches to relate missingness patterns to hospital treatment capacity and highlight the heterogeneity of looking at COVID-19 over time and at multiple sites, where there might be different phases, policies, etc. Changes in missingness could suggest a change in a patient's condition, and patterns of missingness among laboratory measurements could potentially identify clinical outcomes. This allows sites to treat missing data as informative in analyses and helps researchers identify which sites are better poised to study particular questions.
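As a rough illustration of what "correlations between labs based on missingness indicators" can mean in practice, the following minimal sketch builds a 0/1 missingness indicator per lab and computes pairwise Pearson correlations between those indicators. It is not the authors' pipeline: the toy records, the lab names (`crp`, `ddimer`, `troponin`), and the helper functions are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: one dict per patient-day; None marks a lab with no result.
records = [
    {"crp": 5.1, "ddimer": 0.4, "troponin": None},
    {"crp": None, "ddimer": None, "troponin": None},
    {"crp": 7.3, "ddimer": 0.9, "troponin": 0.02},
    {"crp": None, "ddimer": None, "troponin": 0.01},
    {"crp": 6.0, "ddimer": 1.2, "troponin": None},
]


def missing_indicators(rows, labs):
    """Map each lab to a list of 0/1 flags (1 = result missing in that row)."""
    return {lab: [1 if row.get(lab) is None else 0 for row in rows] for lab in labs}


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists; NaN if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def missingness_correlations(rows, labs):
    """Pairwise correlation between the missingness indicators of each lab pair."""
    indicators = missing_indicators(rows, labs)
    return {
        (a, b): pearson(indicators[a], indicators[b])
        for a, b in combinations(labs, 2)
    }


labs = ["crp", "ddimer", "troponin"]
corr = missingness_correlations(records, labs)
for pair, value in corr.items():
    print(pair, round(value, 3))
```

In this toy example the two labs that are always ordered (or skipped) together have perfectly correlated indicators, which is the kind of co-ordering signal the abstract describes; real data would additionally need stratification by site and time window.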
Affiliation(s)
- Emily J Getzen
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Trang T Le
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Priam Das
- Harvard Medical School, Cambridge, MA, USA
- Bruce J Aronow
- Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA
- Tianxi Cai
- Harvard Medical School, Cambridge, MA, USA
- Chuan Hong
- Harvard Medical School, Cambridge, MA, USA; Duke University, Durham, NC, USA
- William G La Cava
- Harvard Medical School, Cambridge, MA, USA; Boston Children's Hospital, Boston, MA, USA
- Yuan Luo
- Northwestern University, Chicago, IL, USA
- Lav P Patel
- University of Kansas Medical Center, United States
- Emily R Shriver
- University of Pennsylvania Health System, Philadelphia, PA, USA
- Xuan Wang
- Harvard Medical School, Cambridge, MA, USA
- Zongqi Xia
- University of Pittsburgh, Pittsburgh, PA, USA
- Qi Long
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Danielle L Mowery
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- John H Holmes
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
2
Ortuño FM, Loucera C, Casimiro-Soriguer CS, Lepe JA, Camacho Martinez P, Merino Diaz L, de Salazar A, Chueca N, García F, Perez-Florido J, Dopazo J. Highly accurate whole-genome imputation of SARS-CoV-2 from partial or low-quality sequences. Gigascience 2021; 10:giab078. [PMID: 34865008 PMCID: PMC8643610 DOI: 10.1093/gigascience/giab078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2021] [Revised: 10/26/2021] [Accepted: 11/12/2021] [Indexed: 11/13/2022]
Abstract
BACKGROUND The current SARS-CoV-2 pandemic has emphasized the utility of viral whole-genome sequencing in the surveillance and control of the pathogen. An unprecedented ongoing global initiative is producing hundreds of thousands of sequences worldwide. However, the complex circumstances in which viruses are sequenced, along with the demand for urgent results, cause a high rate of incomplete and, therefore, useless sequences. Viral sequences evolve in the context of a complex phylogeny, and different positions along the genome are in linkage disequilibrium. Therefore, an imputation method would be able to predict missing positions from the available sequencing data. RESULTS We have developed the impuSARS application, which takes advantage of the enormous number of SARS-CoV-2 genomes available, using a reference panel containing 239,301 sequences, to impute missing data in viral genomes. ImpuSARS was tested in a wide range of conditions (continuous fragments, amplicons or sparse individual positions missing), showing great fidelity when reconstructing the original sequences and recovering the lineage with 100% precision for almost all lineages, even in very poorly covered genomes (<20%). CONCLUSIONS Imputation can improve the pace of SARS-CoV-2 sequencing production by recovering many incomplete or low-quality sequences that would otherwise be discarded. ImpuSARS can be incorporated in any primary data processing pipeline for SARS-CoV-2 whole-genome sequencing.
Affiliation(s)
- Francisco M Ortuño
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), CDCA, Hospital Virgen del Rocio, 41013 Sevilla, Spain
- Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013 Sevilla, Spain
- Carlos Loucera
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), CDCA, Hospital Virgen del Rocio, 41013 Sevilla, Spain
- Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013 Sevilla, Spain
- Carlos S Casimiro-Soriguer
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), CDCA, Hospital Virgen del Rocio, 41013 Sevilla, Spain
- Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013 Sevilla, Spain
- Jose A Lepe
- Unidad Clínica Enfermedades Infecciosas, Microbiología y Medicina Preventiva, Hospital Universitario Virgen del Rocío, 41013 Sevilla, Spain
- Pedro Camacho Martinez
- Unidad Clínica Enfermedades Infecciosas, Microbiología y Medicina Preventiva, Hospital Universitario Virgen del Rocío, 41013 Sevilla, Spain
- Laura Merino Diaz
- Unidad Clínica Enfermedades Infecciosas, Microbiología y Medicina Preventiva, Hospital Universitario Virgen del Rocío, 41013 Sevilla, Spain
- Adolfo de Salazar
- Servicio de Microbiología, Hospital Universitario San Cecilio, 18016 Granada, Spain
- Natalia Chueca
- Servicio de Microbiología, Hospital Universitario San Cecilio, 18016 Granada, Spain
- Federico García
- Servicio de Microbiología, Hospital Universitario San Cecilio, 18016 Granada, Spain
- Javier Perez-Florido
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), CDCA, Hospital Virgen del Rocio, 41013 Sevilla, Spain
- Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013 Sevilla, Spain
- Joaquin Dopazo
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), CDCA, Hospital Virgen del Rocio, 41013 Sevilla, Spain
- Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013 Sevilla, Spain
- FPS/ELIXIR-es, Hospital Virgen del Rocío, Sevilla 42013, Spain
- CIBER de Enfermedades Infecciosas (CIBERINFEC), Hospital Universitario San Cecilio, 18016 Granada, Spain
3
Mosha NR, Aluko OS, Todd J, Machekano R, Young T. Analytical methods used in estimating the prevalence of HIV/AIDS from demographic and cross-sectional surveys with missing data: a systematic review. BMC Med Res Methodol 2020; 20:65. [PMID: 32171240 PMCID: PMC7071763 DOI: 10.1186/s12874-020-00944-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/08/2019] [Accepted: 02/28/2020] [Indexed: 11/30/2022]
Abstract
BACKGROUND Seroprevalence studies often have a problem of missing data. Few studies report the proportion of missing data and even fewer describe the methods used to adjust the results for missing data. The objective of this review was to determine the analytical methods used for analysis in HIV surveys with missing data. METHODS We searched for population, demographic and cross-sectional surveys of HIV published from January 2000 to April 2018 in PubMed/Medline, Web of Science core collection, Latin American and Caribbean Sciences Literature, Africa-Wide Information and Scopus, and by reviewing references of included articles. All potential abstracts were imported into Covidence and screened by two independent reviewers using pre-specified criteria. Disagreements were resolved through discussion. A piloted data extraction tool was used to extract data and assess the risk of bias of the eligible studies. Data were analysed through a quantitative approach; variables were presented and summarised using figures and tables. RESULTS A total of 3426 citations were identified, 194 duplicates were removed, 3232 were screened, and 69 full articles were obtained. Twenty-four studies were included. The response rate for an HIV test in the included studies ranged from 32 to 96%, with the major reason for missing data being refusal to consent to an HIV test. Complete case analysis was the primary method of analysis used; multiple imputation, 11 (46%), was the most advanced method used, followed by Heckman's selection model, 9 (38%). Single imputation and the instrumental variables method were used in only two studies each, with 13 (54%) using various other methods. Forty-two percent of the studies applied more than two methods in the analysis, with a maximum of 4 methods per study. Only 6 (25%) studies conducted a sensitivity analysis, while 11 (46%) studies had a significant change of estimates after adjusting for missing data. CONCLUSION Missing data in survey studies is still a problem in disease estimation. Our review outlined a number of methods that can be used to adjust for missing data in HIV studies; however, more information and awareness are needed to allow informed choices about which method to apply so that estimates are more reliable and representative.
Affiliation(s)
- Neema R Mosha
- Division of Epidemiology and Biostatistics, Faculty of Medicine and Health Sciences, Stellenbosch University, P.O. Box 241, Francie van Zijl Drive, 7505 Tygerberg, Cape Town, South Africa
- Mwanza Intervention Trials Unit, P.O. Box 11936, Isamilo road, Mwanza, Tanzania
- National Institute for Medical Research, Mwanza Centre, P.O. Box 1462, Isamilo road, Mwanza, Tanzania
- Omololu S Aluko
- Division of Epidemiology and Biostatistics, Faculty of Medicine and Health Sciences, Stellenbosch University, P.O. Box 241, Francie van Zijl Drive, 7505 Tygerberg, Cape Town, South Africa
- Jim Todd
- National Institute for Medical Research, Mwanza Centre, P.O. Box 1462, Isamilo road, Mwanza, Tanzania
- London School of Hygiene and Tropical Medicine, Keppel St, Bloomsbury, London, WC1E 7HT, UK
- Rhoderick Machekano
- Division of Epidemiology and Biostatistics, Faculty of Medicine and Health Sciences, Stellenbosch University, P.O. Box 241, Francie van Zijl Drive, 7505 Tygerberg, Cape Town, South Africa
- Taryn Young
- Division of Epidemiology and Biostatistics, Faculty of Medicine and Health Sciences, Stellenbosch University, P.O. Box 241, Francie van Zijl Drive, 7505 Tygerberg, Cape Town, South Africa
4
Chronic Disease Prediction Using Character-Recurrent Neural Network in The Presence of Missing Information. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9102170] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/16/2023]
Abstract
The aim of this study was to predict chronic diseases in individual patients using a character-recurrent neural network (Char-RNN), which is a deep learning model that treats data in each class as a word when a large portion of its input values is missing. An advantage of Char-RNN is that it does not require any additional imputation method because it implicitly infers missing values considering the relationship with nearby data points. We applied Char-RNN to classify cases in the Korea National Health and Nutrition Examination Survey (KNHANES) VI as normal status and five chronic diseases: hypertension, stroke, angina pectoris, myocardial infarction, and diabetes mellitus. We also employed a multilayer perceptron network for the same task for comparison. The results show higher accuracy for Char-RNN than for the conventional multilayer perceptron model. Char-RNN showed remarkable performance in finding patients with hypertension and stroke. The present study utilized the KNHANES VI data to demonstrate a practical approach to predicting and managing chronic diseases with partially observed information.
5
Mutenherwa F, Wassenaar DR, de Oliveira T. Experts' Perspectives on Key Ethical Issues Associated With HIV Phylogenetics as Applied in HIV Transmission Dynamics Research. J Empir Res Hum Res Ethics 2018; 14:61-77. [PMID: 30486713 DOI: 10.1177/1556264618809608] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/16/2022]
Abstract
The use of phylogenetics in HIV molecular epidemiology has considerably increased our ability to understand the origin, spread, and characteristics of HIV epidemics. Despite its potential to advance knowledge on HIV transmission dynamics, the ethical issues associated with HIV molecular epidemiology have received minimal attention. In-depth interviews were conducted with scientists from diverse backgrounds to explore their perspectives on ethical issues associated with phylogenetic analysis of HIV genetic data as applied to HIV transmission dynamics studies. The Emanuel framework was used as the analytical framework. Favorable risk-benefit ratio and informed consent were the most invoked ethical principles and fair participant selection the least. Fear of loss of privacy and disclosure of HIV transmission were invariably cited as key ethical concerns. As HIV sequence data become increasingly available, comprehensive guidelines should be developed to guide its access, sharing and use, cognizant of the potential harms that may result.
Affiliation(s)
- Farirai Mutenherwa
- University of KwaZulu-Natal, South Africa
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Tulio de Oliveira
- University of KwaZulu-Natal, South Africa
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Centre for the AIDS Programme of Research in South Africa, Durban, South Africa