1
Kuo TT, Jiang X, Tang H, Wang X, Harmanci A, Kim M, Post K, Bu D, Bath T, Kim J, Liu W, Chen H, Ohno-Machado L. The evolving privacy and security concerns for genomic data analysis and sharing as observed from the iDASH competition. J Am Med Inform Assoc 2022; 29:2182-2190. [PMID: 36164820 PMCID: PMC9667175 DOI: 10.1093/jamia/ocac165] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/18/2022] [Revised: 08/25/2022] [Accepted: 09/13/2022] [Indexed: 01/11/2023]
Abstract
Concerns regarding inappropriate leakage of sensitive personal information as well as unauthorized data use are increasing with the growth of genomic data repositories. Therefore, privacy and security of genomic data have become increasingly important and need to be studied. Because many protection techniques have been proposed, their applicability in support of biomedical research should be well understood. For this purpose, we have organized a community effort over the past 8 years through the integrating Data for Analysis, Anonymization, and SHaring (iDASH) consortium to address this practical challenge. In this article, we summarize our experience from these competitions, report lessons learned from the events in 2020/2021 as examples, and discuss potential future research directions in this emerging field.
Affiliation(s)
- Tsung-Ting Kuo
  - Corresponding Author: Tsung-Ting Kuo, PhD, UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093, USA
- Arif Harmanci
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Miran Kim
  - Department of Mathematics, Hanyang University, Seoul, Republic of Korea; Department of Computer Science, Hanyang University, Seoul, Republic of Korea
- Kai Post
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
- Diyue Bu
  - Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, Bloomington, Indiana, USA
- Tyler Bath
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
- Jihoon Kim
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
- Weijie Liu
  - Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, Bloomington, Indiana, USA
- Hongbo Chen
  - Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, Bloomington, Indiana, USA
- Lucila Ohno-Machado
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA; Division of Health Services Research & Development, Veterans Affairs San Diego Healthcare System, San Diego, California, USA
2
Functional genomics data: privacy risk assessment and technological mitigation. Nat Rev Genet 2022; 23:245-258. [PMID: 34759381 DOI: 10.1038/s41576-021-00428-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Accepted: 10/18/2021] [Indexed: 12/15/2022]
Abstract
The generation of functional genomics data by next-generation sequencing has increased greatly in the past decade. Broad sharing of these data is essential for research advancement but poses notable privacy challenges, some of which are analogous to those that occur when sharing genetic variant data. However, there are also unique privacy challenges that arise from cryptic information leakage during the processing and summarization of functional genomics data from raw reads to derived quantities, such as gene expression values. Here, we review these challenges and present potential solutions for mitigating privacy risks while allowing broad data dissemination and analysis.
3
Spini G, Mancini E, Attema T, Abspoel M, de Gier J, Fehr S, Veugen T, van Heesch M, Worm D, De Luca A, Cramer R, Sloot PM. New Approach to Privacy-Preserving Clinical Decision Support Systems for HIV Treatment. J Med Syst 2022; 46:84. [PMID: 36261621 PMCID: PMC9581834 DOI: 10.1007/s10916-022-01851-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2020] [Revised: 08/09/2022] [Accepted: 08/16/2022] [Indexed: 01/04/2023]
Abstract
BACKGROUND: HIV treatment prescription is a complex process. Clinical decision support systems (CDSS) are a category of health information technologies that can help clinicians choose optimal treatments based on clinical trials and expert knowledge. The usability of some CDSSs for HIV treatment would be significantly improved by using the knowledge obtained by treating other patients. This knowledge, however, is mainly contained in patient records, whose usage is restricted due to privacy and confidentiality constraints. METHODS: A treatment effectiveness measure, containing valuable information for HIV treatment prescription, was defined and a method to extract this measure from patient records was developed. This method uses an advanced cryptographic technology, known as secure Multiparty Computation (henceforth referred to as MPC), to preserve the privacy of the patient records and the confidentiality of the clinicians' decisions. FINDINGS: Our solution makes it possible to compute an effectiveness measure of an HIV treatment, the average time-to-treatment-failure, while preserving privacy. Experimental results show that our solution, although at proof-of-concept stage, has good efficiency and provides a result to a query within 24 min for a dataset of realistic size. INTERPRETATION: This paper presents a novel and efficient approach to HIV clinical decision support systems that harnesses the potential and insights acquired from treatment data, while preserving the privacy of patient records and the confidentiality of clinician decisions.
Affiliation(s)
- Gabriele Spini
  - Applied Cryptography and Quantum Algorithms, TNO, Postbus 96800, 2509 JE The Hague, The Netherlands
- Emiliano Mancini
  - Institute for Advanced Study, University of Amsterdam, Oude Turfmarkt 147, 1012 GC Amsterdam, The Netherlands; Department of Global Health, Amsterdam UMC, Location AMC, 1105 AZ Amsterdam, The Netherlands; Data Science Institute, Hasselt University, Diepenbeek, Belgium
- Thomas Attema
  - Applied Cryptography and Quantum Algorithms, TNO, Postbus 96800, 2509 JE The Hague, The Netherlands; Cryptology Group, CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands; Mathematical Institute, Leiden University, P.O. Box 9512, 2300 RA Leiden, The Netherlands
- Mark Abspoel
  - Cryptology Group, CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands; Philips Research, High Tech Campus 34, 5656 AE Eindhoven, The Netherlands
- Jan de Gier
  - Applied Cryptography and Quantum Algorithms, TNO, Postbus 96800, 2509 JE The Hague, The Netherlands
- Serge Fehr
  - Cryptology Group, CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands; Mathematical Institute, Leiden University, P.O. Box 9512, 2300 RA Leiden, The Netherlands
- Thijs Veugen
  - Applied Cryptography and Quantum Algorithms, TNO, Postbus 96800, 2509 JE The Hague, The Netherlands; Cryptology Group, CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands
- Maran van Heesch
  - Applied Cryptography and Quantum Algorithms, TNO, Postbus 96800, 2509 JE The Hague, The Netherlands
- Daniël Worm
  - Applied Cryptography and Quantum Algorithms, TNO, Postbus 96800, 2509 JE The Hague, The Netherlands
- Andrea De Luca
  - Department of Medical Biotechnologies, University of Siena and Siena University Hospital, Viale Mario Bracci 16, 53100 Siena, Italy
- Ronald Cramer
  - Cryptology Group, CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands; Mathematical Institute, Leiden University, P.O. Box 9512, 2300 RA Leiden, The Netherlands
- Peter M.A. Sloot
  - Institute for Advanced Study, University of Amsterdam, Oude Turfmarkt 147, 1012 GC Amsterdam, The Netherlands; Complexity Institute, Nanyang Technological University, Academic Building North, Level 1 Section B Unit No. 7 (ABN-01B-07), 61 Nanyang Drive, 637335 Singapore, Singapore; Advanced Computing, ITMO University, Lomonosova street 9, 191002 Saint Petersburg, Russia
4
Buchlak QD, Esmaili N, Bennett C, Farrokhi F. Natural Language Processing Applications in the Clinical Neurosciences: A Machine Learning Augmented Systematic Review. Acta Neurochir Suppl 2022; 134:277-289. [PMID: 34862552 DOI: 10.1007/978-3-030-85292-4_32] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Natural language processing (NLP), a domain of artificial intelligence (AI) that models human language, has been used in medicine to automate diagnostics, detect adverse events, support decision making and predict clinical outcomes. However, applications to the clinical neurosciences appear to be limited. NLP has matured with the implementation of deep transformer models (e.g., XLNet, BERT, T5, and RoBERTa) and transfer learning. The objectives of this study were to (1) systematically review NLP applications in the clinical neurosciences, and (2) explore NLP analysis to facilitate literature synthesis, providing clear examples to demonstrate the potential capabilities of these technologies for a clinical audience. Our NLP analysis consisted of keyword identification, text summarization and document classification. A total of 48 articles met inclusion criteria. NLP has been applied in the clinical neurosciences to facilitate literature synthesis, data extraction, patient identification, automated clinical reporting and outcome prediction. The number of publications applying NLP has increased rapidly over the past five years. Document classifiers trained to differentiate included and excluded articles demonstrated moderate performance (XLNet AUC = 0.66, BERT AUC = 0.59, RoBERTa AUC = 0.62). The T5 transformer model generated acceptable abstract summaries. The application of NLP has the potential to enhance research and practice in the clinical neurosciences.
Affiliation(s)
- Quinlan D Buchlak
  - School of Medicine, The University of Notre Dame Australia, Sydney, NSW, Australia
- Nazanin Esmaili
  - School of Medicine, The University of Notre Dame Australia, Sydney, NSW, Australia
  - Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW, Australia
- Christine Bennett
  - School of Medicine, The University of Notre Dame Australia, Sydney, NSW, Australia
- Farrokh Farrokhi
  - Neuroscience Institute, Virginia Mason Medical Center, Seattle, WA, USA
5
Kuo TT, Bath T, Ma S, Pattengale N, Yang M, Cao Y, Hudson CM, Kim J, Post K, Xiong L, Ohno-Machado L. Benchmarking blockchain-based gene-drug interaction data sharing methods: A case study from the iDASH 2019 secure genome analysis competition blockchain track. Int J Med Inform 2021; 154:104559. [PMID: 34474309 PMCID: PMC9933142 DOI: 10.1016/j.ijmedinf.2021.104559] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/13/2020] [Revised: 07/24/2021] [Accepted: 07/27/2021] [Indexed: 01/11/2023]
Abstract
BACKGROUND: Blockchain distributed ledger technology is just starting to be adopted in genomics and healthcare applications. Despite its increased prevalence in biomedical research applications, skepticism regarding the practicality of blockchain technology for real-world problems is still strong and there are few implementations beyond proof-of-concept. We focus on benchmarking blockchain strategies applied to distributed methods for sharing records of gene-drug interactions. We expect this type of sharing will expedite personalized medicine. BASIC PROCEDURES: We generated gene-drug interaction test datasets using the Clinical Pharmacogenetics Implementation Consortium (CPIC) resource. We developed three blockchain-based methods to share patient records on gene-drug interactions: Query Index, Index Everything, and Dual-Scenario Indexing. MAIN FINDINGS: We achieved a runtime of about 60 s for importing 4,000 gene-drug interaction records from four sites, and about 0.5 s for a data retrieval query. Our results demonstrated that it is feasible to leverage blockchain as a new platform to share data among institutions. PRINCIPAL CONCLUSIONS: We show the benchmarking results of novel blockchain-based methods for institutions to share patient outcomes related to gene-drug interactions. Our findings support blockchain utilization in healthcare, genomic and biomedical applications. The source code is publicly available at https://github.com/tsungtingkuo/genedrug.
Affiliation(s)
- Tsung-Ting Kuo
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Tyler Bath
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Shuaicheng Ma
  - Department of Computer Science, Emory University, Atlanta, GA, USA
- Meng Yang
  - BGI-Shenzhen, Shenzhen, Guangdong, China; Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Yao Cao
  - Department of Social Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto, Japan
- Jihoon Kim
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Kai Post
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Li Xiong
  - Department of Computer Science, Emory University, Atlanta, GA, USA
- Lucila Ohno-Machado
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA; Division of Health Services Research & Development, VA San Diego Healthcare System, San Diego, CA, USA
6
Callahan A, Polony V, Posada JD, Banda JM, Gombar S, Shah NH. ACE: the Advanced Cohort Engine for searching longitudinal patient records. J Am Med Inform Assoc 2021; 28:1468-1479. [PMID: 33712854 PMCID: PMC8279796 DOI: 10.1093/jamia/ocab027] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/08/2020] [Accepted: 02/23/2021] [Indexed: 01/02/2023]
Abstract
OBJECTIVE: To propose a paradigm for a scalable time-aware clinical data search, and to describe the design, implementation and use of a search engine realizing this paradigm. MATERIALS AND METHODS: The Advanced Cohort Engine (ACE) uses a temporal query language and in-memory datastore of patient objects to provide a fast, scalable, and expressive time-aware search. ACE accepts data in the Observational Medical Outcomes Partnership Common Data Model, and is configurable to balance performance with compute cost. ACE's temporal query language supports automatic query expansion using clinical knowledge graphs. The ACE API can be used with R, Python, Java, HTTP, and a Web UI. RESULTS: ACE offers an expressive query language for complex temporal search across many clinical data types with multiple output options. ACE enables electronic phenotyping and cohort-building with subsecond response times in searching the data of millions of patients for a variety of use cases. DISCUSSION: ACE enables fast, time-aware search using a patient object-centric datastore, thereby overcoming many technical and design shortcomings of relational algebra-based querying. Integrating electronic phenotype development with cohort-building enables a variety of high-value uses for a learning health system. Tradeoffs include the need to learn a new query language and the technical setup burden. CONCLUSION: ACE is a tool that combines a unique query language for time-aware search of longitudinal patient records with a patient object datastore for rapid electronic phenotyping, cohort extraction, and exploratory data analyses.
Affiliation(s)
- Alison Callahan
  - Center for Biomedical Informatics Research, School of Medicine, Stanford University, Stanford, California, USA
- Vladimir Polony
  - Center for Biomedical Informatics Research, School of Medicine, Stanford University, Stanford, California, USA
- José D Posada
  - Center for Biomedical Informatics Research, School of Medicine, Stanford University, Stanford, California, USA
- Juan M Banda
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
- Saurabh Gombar
  - Department of Pathology, School of Medicine, Stanford University, Stanford, California, USA
- Nigam H Shah
  - Center for Biomedical Informatics Research, School of Medicine, Stanford University, Stanford, California, USA
7
Forsch N, Govil S, Perry JC, Hegde S, Young AA, Omens JH, McCulloch AD. Computational analysis of cardiac structure and function in congenital heart disease: Translating discoveries to clinical strategies. J Comput Sci 2021; 52:101211. [PMID: 34691293 PMCID: PMC8528218 DOI: 10.1016/j.jocs.2020.101211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Increased availability and access to medical image data has enabled more quantitative approaches to clinical diagnosis, prognosis, and treatment planning for congenital heart disease. Here we present an overview of long-term clinical management of tetralogy of Fallot (TOF) and its intersection with novel computational and data science approaches to discovering biomarkers of functional and prognostic importance. Efforts in translational medicine that seek to address the clinical challenges associated with cardiovascular diseases using personalized and precision-based approaches are then discussed. The considerations and challenges of translational cardiovascular medicine are reviewed, and examples of digital platforms with collaborative, cloud-based, and scalable design are provided.
Affiliation(s)
- Nickolas Forsch
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Sachin Govil
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- James C Perry
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Sanjeet Hegde
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Alistair A Young
  - Department of Biomedical Engineering, King’s College London, London, UK
  - Department of Anatomy and Medical Imaging, University of Auckland, Auckland, NZ
- Jeffrey H Omens
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Andrew D McCulloch
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
8
Kuo TT, Gabriel RA, Cidambi KR, Ohno-Machado L. EXpectation Propagation LOgistic REgRession on permissioned blockCHAIN (ExplorerChain): decentralized online healthcare/genomics predictive model learning. J Am Med Inform Assoc 2020; 27:747-756. [PMID: 32364235 PMCID: PMC7309256 DOI: 10.1093/jamia/ocaa023] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 08/23/2019] [Revised: 02/11/2020] [Accepted: 02/24/2020] [Indexed: 11/19/2022]
Abstract
Objective: Predicting patient outcomes using healthcare/genomics data is an increasingly popular/important area. However, some diseases are rare and require data from multiple institutions to construct generalizable models. To address institutional data protection policies, many distributed methods keep the data locally but rely on a central server for coordination, which introduces risks such as a single point of failure. We focus on providing an alternative based on a decentralized approach. We introduce the idea of using blockchain technology for this purpose, with a brief description of its own potential advantages/disadvantages. Materials and Methods: We explain how our proposed EXpectation Propagation LOgistic REgRession on Permissioned blockCHAIN (ExplorerChain) can achieve the same results when compared to a distributed model that uses a central server on 3 healthcare/genomic datasets, and what trade-offs need to be considered when using centralized/decentralized methods. We explain how the use of blockchain technology can help decrease some of the problems encountered in decentralized methods. Results: We showed that the discrimination power of ExplorerChain can be statistically similar to its counterpart central server-based algorithm. While ExplorerChain inherited some benefits of blockchain, it had a slightly increased running time. Discussion: ExplorerChain has the same prerequisites as a distributed model with a centralized server for coordination. In a manner similar to secure multi-party computation strategies, it assumes that participating institutions are honest, but “curious.” Conclusion: When evaluated on relatively small datasets, results suggest that ExplorerChain, which combines artificial intelligence and blockchain technologies, performs as well as a central server-based method, and may avoid some risks at the cost of efficiency.
Affiliation(s)
- Tsung-Ting Kuo
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
- Rodney A Gabriel
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA; Department of Anesthesiology, University of California San Diego, San Diego, California, USA
- Krishna R Cidambi
  - Department of Orthopaedic Surgery, University of California at San Diego, San Diego, California, USA
- Lucila Ohno-Machado
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA; Division of Health Services Research & Development, VA San Diego Healthcare System, San Diego, California, USA
9
Kuo TT, Kim J, Gabriel RA. Privacy-preserving model learning on a blockchain network-of-networks. J Am Med Inform Assoc 2020; 27:343-354. [PMID: 31943009 PMCID: PMC7025358 DOI: 10.1093/jamia/ocz214] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/22/2019] [Revised: 11/04/2019] [Accepted: 12/02/2019] [Indexed: 01/07/2023]
Abstract
Objective: To facilitate clinical/genomic/biomedical research, constructing generalizable predictive models using cross-institutional methods while protecting privacy is imperative. However, state-of-the-art methods assume a “flattened” topology, while real-world research networks may consist of a “network-of-networks,” which can raise practical issues including training on small data for rare diseases/conditions, prioritizing locally trained models, and maintaining models for each level of the hierarchy. In this study, we focus on developing a hierarchical approach to inherit the benefits of the privacy-preserving methods, retain the advantages of adopting blockchain, and address practical concerns on a research network-of-networks. Materials and Methods: We propose a framework to combine level-wise model learning, blockchain-based model dissemination, and a novel hierarchical consensus algorithm for model ensemble. We developed an example implementation, HierarchicalChain (hierarchical privacy-preserving modeling on blockchain), evaluated it on 3 healthcare/genomic datasets, and compared its predictive correctness, learning iteration, and execution time with a state-of-the-art method designed for flattened network topology. Results: HierarchicalChain improves the predictive correctness for small training datasets and provides correctness results comparable to those of the competing method, with a higher number of learning iterations and similar per-iteration execution time. It inherits the benefits of privacy-preserving learning and the advantages of blockchain technology, and immutably records models for each level. Discussion: HierarchicalChain is independent of the core privacy-preserving learning method, as well as of the underlying blockchain platform. Further studies are warranted for various types of network topology, complex data, and privacy concerns. Conclusion: We demonstrated the potential of utilizing the information from the hierarchical network-of-networks topology to improve prediction.
Affiliation(s)
- Tsung-Ting Kuo
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
- Jihoon Kim
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
- Rodney A Gabriel
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA; Department of Anesthesiology, University of California San Diego, San Diego, California, USA
10
Amirmahani F, Ebrahimi N, Molaei F, Faghihkhorasani F, Jamshidi Goharrizi K, Mirtaghi SM, Borjian‐Boroujeni M, Hamblin MR. Approaches for the integration of big data in translational medicine: single‐cell and computational methods. Ann N Y Acad Sci 2021; 1493:3-28. [DOI: 10.1111/nyas.14544] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/23/2020] [Revised: 10/31/2020] [Accepted: 11/12/2020] [Indexed: 12/11/2022]
Affiliation(s)
- Farzane Amirmahani
  - Genetics Division, Department of Cell and Molecular Biology and Microbiology, Faculty of Science and Technology, University of Isfahan, Isfahan, Iran
- Nasim Ebrahimi
  - Genetics Division, Department of Cell and Molecular Biology and Microbiology, Faculty of Science and Technology, University of Isfahan, Isfahan, Iran
- Fatemeh Molaei
  - Department of Anesthesiology, Faculty of Paramedical, Jahrom University of Medical Sciences, Jahrom, Iran
- Michael R. Hamblin
  - Laser Research Centre, Faculty of Health Science, University of Johannesburg, South Africa
11
Al-Ebbini L, Khabour OF, Alzoubi KH, Alkaraki AK. Biomedical Data Sharing Among Researchers: A Study from Jordan. J Multidiscip Healthc 2020; 13:1669-1676. [PMID: 33262602 PMCID: PMC7695599 DOI: 10.2147/jmdh.s284294] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/28/2020] [Accepted: 10/22/2020] [Indexed: 12/02/2022]
Abstract
Background: Data sharing is an encouraged practice to support research in all fields. For that purpose, it is important to examine perceptions and concerns of researchers about biomedical data sharing, which was investigated in the current study. Methods: In this cross-sectional study, a survey was distributed among biomedical researchers in Jordan, as an example of a developing country. The study survey consisted of questions about demographics and about respondents’ attitudes toward sharing of biomedical data. Results: Among study participants, 46.9% (n=82) were positive regarding making their research data available to the public, whereas 53.1% refused the idea. The reasons for refusing to publicly share their data included “lack of regulations” (33.5%), “access to research data should be limited to the research team” (29.5%), “no place to deposit the data” (6.5%), and “lack of funding for data deposition” (6.0%). Agreement with the idea of making data available was associated with academic rank (P=0.003). Moreover, gender (P=0.043) and number of publications (P=0.005) were associated with a time frame for data sharing (ie, agreeing to share data before vs after publication). Conclusion: About half of the respondents reported a positive attitude toward biomedical data sharing. Proper regulations and facilitation of data deposition can enhance data sharing in Jordan.
Affiliation(s)
- Lina Al-Ebbini
  - Department of Biomedical Systems and Informatics Engineering, Hijjawi for Engineering Technology, Yarmouk University, Irbid 21163, Jordan
- Omar F Khabour
  - Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, Jordan University of Science and Technology, Irbid 22110, Jordan
- Karem H Alzoubi
  - Department of Clinical Pharmacy, Jordan University of Science and Technology, Irbid 22110, Jordan
- Almuthanna K Alkaraki
  - Department of Biological Sciences, Faculty of Science, Yarmouk University, Irbid 21163, Jordan
12
Geleijnse G, Chiang RCJ, Sieswerda M, Schuurman M, Lee KC, van Soest J, Dekker A, Lee WC, Verbeek XAAM. Prognostic factors analysis for oral cavity cancer survival in the Netherlands and Taiwan using a privacy-preserving federated infrastructure. Sci Rep 2020; 10:20526. [PMID: 33239719 PMCID: PMC7688977 DOI: 10.1038/s41598-020-77476-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/19/2020] [Accepted: 11/09/2020] [Indexed: 11/24/2022]
Abstract
The difference in incidence of oral cavity cancer (OCC) between Taiwan and the Netherlands is striking. Different risk factors and treatment expertise may result in survival differences between the two countries. However due to regulatory restrictions, patient-level analyses of combined data from the Netherlands and Taiwan are infeasible. We implemented a software infrastructure for federated analyses on data from multiple organisations. We included 41,633 patients with single-tumour OCC between 2004 and 2016, undergoing surgery, from the Taiwan Cancer Registry and Netherlands Cancer Registry. Federated Cox Proportional Hazard was used to analyse associations between patient and tumour characteristics, country, treatment and hospital volume with survival. Five factors showed differential effects on survival of OCC patients in the Netherlands and Taiwan: age at diagnosis, stage, grade, treatment and hospital volume. The risk of death for OCC patients younger than 60 years, with advanced stage, higher grade or receiving adjuvant therapy after surgery was lower in the Netherlands than in Taiwan; but patients older than 70 years, with early stage, lower grade and receiving surgery alone in the Netherlands were at higher risk of death than those in Taiwan. The mortality risk of OCC in Taiwanese patients treated in hospitals with higher hospital volume (≥ 50 surgeries per year) was lower than in Dutch patients. We conducted analyses without exchanging patient-level information, overcoming barriers for sharing privacy sensitive information. The outcomes of patients treated in the Netherlands and Taiwan were slightly different after controlling for other prognostic factors.
Affiliation(s)
- Gijs Geleijnse
- Netherlands Comprehensive Cancer Organisation (IKNL), Godebaldkwartier 419, 3511 DT, Utrecht, The Netherlands
- RuRu Chun-Ju Chiang
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University and Taiwan Cancer Registry, Taipei, Taiwan
- Melle Sieswerda
- Netherlands Comprehensive Cancer Organisation (IKNL), Godebaldkwartier 419, 3511 DT, Utrecht, The Netherlands
- Melinda Schuurman
- Netherlands Comprehensive Cancer Organisation (IKNL), Godebaldkwartier 419, 3511 DT, Utrecht, The Netherlands
- K C Lee
- Biomedical Technology and Device Research Laboratories, Industrial Technology Research Institute (ITRI), Hsinchu, Taiwan
- Johan van Soest
- Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Developmental Biology, Maastricht University Medical Centre, Maastricht, The Netherlands
- Andre Dekker
- Department of Radiation Oncology (MAASTRO), GROW School for Oncology and Developmental Biology, Maastricht University Medical Centre, Maastricht, The Netherlands
- Wen-Chung Lee
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University and Taiwan Cancer Registry, Taipei, Taiwan
- Xander A A M Verbeek
- Netherlands Comprehensive Cancer Organisation (IKNL), Godebaldkwartier 419, 3511 DT, Utrecht, The Netherlands
13
Kuo TT. The anatomy of a distributed predictive modeling framework: online learning, blockchain network, and consensus algorithm. JAMIA Open 2020; 3:201-208. [PMID: 32734160 PMCID: PMC7382618 DOI: 10.1093/jamiaopen/ooaa017] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/30/2020] [Revised: 04/21/2020] [Accepted: 04/29/2020] [Indexed: 11/23/2022] Open
Abstract
Objective Cross-institutional distributed healthcare/genomic predictive modeling is an emerging technology that fulfills both the need of building a more generalizable model and of protecting patient data by only exchanging the models but not the patient data. In this article, the implementation details are presented for one specific blockchain-based approach, ExplorerChain, from a software development perspective. The healthcare/genomic use cases of myocardial infarction, cancer biomarker, and length of hospitalization after surgery are also described. Materials and Methods ExplorerChain’s 3 main technical components, including online machine learning, metadata of transaction, and the Proof-of-Information-Timed (PoINT) algorithm, are introduced in this study. Specifically, the 3 algorithms (ie, core, new network, and new site/data) are described in detail. Results ExplorerChain was implemented and its design details were illustrated, especially the development configurations in a practical setting. Also, the system architecture and programming languages are introduced. The code was also released in an open source repository available at https://github.com/tsungtingkuo/explorerchain. Discussion The design considerations of the semi-trust assumption, data format normalization, and non-determinism were discussed. The limitations of the implementation include fixed-number participating sites, limited join-or-leave capability during initialization, advanced privacy technology yet to be included, and further investigation in ethical, legal, and social implications. Conclusion This study can serve as a reference for the researchers who would like to implement and even deploy blockchain technology. Furthermore, the off-the-shelf software can also serve as a cornerstone to accelerate the development and investigation of future healthcare/genomic blockchain studies.
Affiliation(s)
- Tsung-Ting Kuo
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
14
Conboy C. Consent and Privacy in the Era of Precision Medicine and Biobanking Genomic Data. American Journal of Law & Medicine 2020; 46:167-187. [PMID: 32659188 DOI: 10.1177/0098858820933493] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/03/2023]
Abstract
"Big Data represents a challenge that points to the need for collective and political approaches to self-protection rather than solely individual, atomistic approaches."- Anita Allen, "Protecting One's Own Privacy in a Big Data Economy".
15
Kuo TT, Gabriel RA, Ohno-Machado L. Fair compute loads enabled by blockchain: sharing models by alternating client and server roles. J Am Med Inform Assoc 2020; 26:392-403. [PMID: 30892656 PMCID: PMC7787356 DOI: 10.1093/jamia/ocy180] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 10/16/2018] [Revised: 10/16/2018] [Accepted: 12/02/2018] [Indexed: 11/28/2022] Open
Abstract
Objective Decentralized privacy-preserving predictive modeling enables multiple institutions to learn a more generalizable model on healthcare or genomic data by sharing the partially trained models instead of patient-level data, while avoiding risks such as single point of control. State-of-the-art blockchain-based methods remove the “server” role but can be less accurate than models that rely on a server. Therefore, we aim at developing a general model sharing framework to preserve predictive correctness, mitigate the risks of a centralized architecture, and compute the models in a fair way. Materials and Methods We propose a framework that includes both server and “client” roles to preserve correctness. We adopt a blockchain network to obtain the benefits of decentralization, by alternating the roles for each site to ensure computational fairness. Also, we developed GloreChain (Grid Binary LOgistic REgression on Permissioned BlockChain) as a concrete example, and compared it to a centralized algorithm on 3 healthcare or genomic datasets to evaluate predictive correctness, number of learning iterations and execution time. Results GloreChain performs exactly the same as the centralized method in terms of correctness and number of iterations. It inherits the advantages of blockchain, at the cost of increased time to reach a consensus model. Discussion Our framework is general or flexible and can also address intrinsic challenges of blockchain networks. Further investigations will focus on higher-dimensional datasets, additional use cases, privacy-preserving quality concerns, and ethical, legal, and social implications. Conclusions Our framework provides a promising potential for institutions to learn a predictive model based on healthcare or genomic data in a privacy-preserving and decentralized way.
Affiliation(s)
- Tsung-Ting Kuo
- UCSD Health Department of Biomedical Informatics, University of California, San Diego, La Jolla, California, USA
- Rodney A Gabriel
- UCSD Health Department of Biomedical Informatics, University of California, San Diego, La Jolla, California, USA; Department of Anesthesiology, University of California, San Diego, San Diego, California, USA
- Lucila Ohno-Machado
- UCSD Health Department of Biomedical Informatics, University of California, San Diego, La Jolla, California, USA; Division of Health Services Research & Development, VA San Diego Healthcare System, La Jolla, California, USA
16
Chehab K, Kalboussi A, Hadj Kacem A. Study of Healthcare Professionals’ Interaction in the Patient Records Based on Annotations. Lecture Notes in Computer Science 2020. [PMCID: PMC7313271 DOI: 10.1007/978-3-030-51517-1_28] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
The annotation practice is an almost daily activity; it is used by healthcare professionals (PHC) to analyze, collaborate, share knowledge and communicate, between them, information present in the healthcare record of patients. These annotations are created in a healthcare cycle that consists of: diagnosis, treatment, advice, follow-up and observation.
Due to an exponential increase in the number of medical annotation systems used by different categories of health professionals, we face a lack of organization of these systems according to formal criteria. As a result, we have a fragmented picture of these annotation tools, which makes it difficult for a PHC to choose an annotation system in a well-defined context (biology, radiology…) according to their needs and the functionalities these tools offer.
In this article we present a classification of thirty annotation tools developed by industry and academia based on 5 generic criteria. We conclude this survey paper with a model proposition.
17
Esmaeilzadeh P, Mirzaei T. The Potential of Blockchain Technology for Health Information Exchange: Experimental Study From Patients' Perspectives. J Med Internet Res 2019; 21:e14184. [PMID: 31223119 PMCID: PMC6610459 DOI: 10.2196/14184] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 03/28/2019] [Revised: 05/12/2019] [Accepted: 05/20/2019] [Indexed: 01/22/2023] Open
Abstract
Background Nowadays, a number of mechanisms and tools are being used by health care organizations and physicians to electronically exchange the personal health information of patients. The main objectives of different methods of health information exchange (HIE) are to reduce health care costs, minimize medical errors, and improve the coordination of interorganizational information exchange across health care entities. The main challenges associated with the common HIE systems are privacy concerns, security risks, low visibility of system transparency, and lack of patient control. Blockchain technology is likely to disrupt the current information exchange models utilized in the health care industry. Objective Little is known about patients’ perceptions and attitudes toward the implementation of blockchain-enabled HIE networks, and it is still not clear if patients (as one of the main HIE stakeholders) are likely to opt in to the applications of this technology in HIE initiatives. Thus, this study aimed at exploring the core value of blockchain technology in the health care industry from health care consumers’ views. Methods To recognize the potential applications of blockchain technology in health care practices, we designed 16 information exchange scenarios for controlled Web-based experiments. Overall, 2013 respondents participated in 16 Web-based experiments. Each experiment described an information exchange condition characterized by 4 exchange mechanisms (ie, direct, lookup, patient-centered, and blockchain), 2 types of health information (ie, sensitive vs nonsensitive), and 2 types of privacy policy (weak vs strong). Results The findings show that there are significant differences in patients’ perceptions of various exchange mechanisms with regard to patient privacy concern, trust in competency and integrity, opt-in intention, and willingness to share information. 
Interestingly, participants hold a favorable attitude toward the implementation of blockchain-based exchange mechanisms for privacy protection, coordination, and information exchange purposes. This study proposed the potentials and limitations of a blockchain-based attempt in the HIE context. Conclusions The results of this research should be of interest to both academics and practitioners. The findings propose potential limitations of a blockchain-based HIE that should be addressed by health care organizations to exchange personal health information in a secure and private manner. This study can contribute to the research in the blockchain area and enrich the literature on the use of blockchain in HIE efforts. Practitioners can also identify how to leverage the benefit of blockchain to promote HIE initiatives nationwide.
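The 4 × 2 × 2 design described in the Methods of this abstract can be enumerated to confirm that it yields 16 scenarios. The following is only an illustrative sketch: the factor levels are taken from the abstract, while the variable names are assumptions and do not come from the study.

```python
from itertools import product

# Factor levels as named in the abstract; the identifiers are illustrative.
exchange_mechanisms = ["direct", "lookup", "patient-centered", "blockchain"]
information_types = ["sensitive", "nonsensitive"]
privacy_policies = ["weak", "strong"]

# Full factorial design: one scenario per combination of factor levels.
scenarios = list(product(exchange_mechanisms, information_types, privacy_policies))

print(len(scenarios))  # 4 * 2 * 2 combinations
for mechanism, info_type, policy in scenarios[:3]:
    print(mechanism, info_type, policy)
```

Each tuple corresponds to one experimental condition, which is why the number of Web-based experiments equals the product of the numbers of levels.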
Affiliation(s)
- Pouyan Esmaeilzadeh
- Department of Information Systems and Business Analytics, College of Business, Florida International University, Modesto A Maidique Campus, Miami, FL, United States
- Tala Mirzaei
- Department of Information Systems and Business Analytics, College of Business, Florida International University, Modesto A Maidique Campus, Miami, FL, United States
18
Gu W, Yildirimman R, Van der Stuyft E, Verbeeck D, Herzinger S, Satagopam V, Barbosa-Silva A, Schneider R, Lange B, Lehrach H, Guo Y, Henderson D, Rowe A. Data and knowledge management in translational research: implementation of the eTRIKS platform for the IMI OncoTrack consortium. BMC Bioinformatics 2019; 20:164. [PMID: 30935364 PMCID: PMC6444691 DOI: 10.1186/s12859-019-2748-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/28/2018] [Accepted: 03/18/2019] [Indexed: 01/04/2023] Open
Abstract
Background For large international research consortia, such as those funded by the European Union’s Horizon 2020 programme or the Innovative Medicines Initiative, good data coordination practices and tools are essential for the successful collection, organization and analysis of the resulting data. Research consortia are attempting ever more ambitious science to better understand disease, by leveraging technologies such as whole genome sequencing, proteomics, patient-derived biological models and computer-based systems biology simulations. Results The IMI eTRIKS consortium is charged with the task of developing an integrated knowledge management platform capable of supporting the complexity of the data generated by such research programmes. In this paper, using the example of the OncoTrack consortium, we describe a typical use case in translational medicine. The tranSMART knowledge management platform was implemented to support data from observational clinical cohorts, drug response data from cell culture models and drug response data from mouse xenograft tumour models. The high dimensional (omics) data from the molecular analyses of the corresponding biological materials were linked to these collections, so that users could browse and analyse these to derive candidate biomarkers. Conclusions In all these steps, data mapping, linking and preparation are handled automatically by the tranSMART integration platform. Therefore, researchers without specialist data handling skills can focus directly on the scientific questions, without spending undue effort on processing the data and data integration, which are otherwise a burden and the most time-consuming part of translational research data analysis. Electronic supplementary material The online version of this article (10.1186/s12859-019-2748-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wei Gu
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Sascha Herzinger
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Venkata Satagopam
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Adriano Barbosa-Silva
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Reinhard Schneider
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Bodo Lange
- Alacris Theranostics GmbH, Berlin, Germany
- Hans Lehrach
- Alacris Theranostics GmbH, Berlin, Germany; Max Planck Institute for Molecular Genetics, Berlin, Germany; Dahlem Centre for Genome Research and Medical Systems Biology, Berlin, Germany
- Yike Guo
- Data Science Institute, Imperial College London, London, UK
- Anthony Rowe
- Janssen Research and Development Ltd, High Wycombe, UK
19
Yoshida K, Gruber S, Fireman BH, Toh S. Comparison of privacy-protecting analytic and data-sharing methods: A simulation study. Pharmacoepidemiol Drug Saf 2018; 27:1034-1041. [PMID: 30022561 PMCID: PMC6135666 DOI: 10.1002/pds.4615] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 11/26/2017] [Revised: 04/09/2018] [Accepted: 06/11/2018] [Indexed: 11/06/2022]
Abstract
PURPOSE Privacy-protecting analytic and data-sharing methods that minimize the disclosure risk of sensitive information are increasingly important due to the growing interest in utilizing data across multiple sources. We conducted a simulation study to examine how avoiding sharing individual-level data in a distributed data network can affect analytic results. METHODS The base scenario had four sites of varying sizes with 5% outcome incidence, 50% treatment prevalence, and seven confounders. We varied treatment prevalence, outcome incidence, treatment effect, site size, number of sites, and covariate distribution. Confounding adjustment was conducted using propensity score or disease risk score. We compared analyses of three types of aggregate-level data requested from sites: risk-set, summary-table, or effect-estimate data (meta-analysis) with benchmark results of analysis of pooled individual-level data. We assessed bias and precision of hazard ratio estimates as well as the accuracy of standard error estimates. RESULTS All the aggregate-level data-sharing approaches, regardless of confounding adjustment methods, successfully approximated pooled individual-level data analysis in most simulation scenarios. Meta-analysis showed minor bias when using inverse probability of treatment weights (IPTW) in infrequent exposure (5%), rare outcome (0.01%), and small site (5,000 patients) settings. SE estimates became less accurate for IPTW risk-set approach with less frequent exposure and for propensity score-matching meta-analysis approach with rare outcomes. CONCLUSIONS Overall, we found that we can avoid sharing individual-level data and obtain valid results in many settings, although care must be taken with meta-analysis approach in infrequent exposure and rare outcome scenarios, particularly when confounding adjustment is performed with IPTW.
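Of the three aggregate-level approaches named in this abstract, the effect-estimate (meta-analysis) approach is the simplest to sketch: each site shares only a log hazard ratio and its standard error. The abstract does not say which pooling estimator was used, so the fixed-effect inverse-variance estimator below is an assumption, and the site-level numbers are hypothetical rather than taken from the study.

```python
import math

def pool_fixed_effect(log_hrs, std_errs):
    """Inverse-variance fixed-effect pooling of site-level log hazard ratios."""
    weights = [1.0 / se ** 2 for se in std_errs]
    total = sum(weights)
    pooled_log_hr = sum(w * b for w, b in zip(weights, log_hrs)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_log_hr, pooled_se

# Hypothetical estimates from four sites (not from the study).
site_log_hrs = [math.log(0.80), math.log(0.90), math.log(0.85), math.log(0.75)]
site_ses = [0.10, 0.20, 0.15, 0.25]

log_hr, se = pool_fixed_effect(site_log_hrs, site_ses)
lower = math.exp(log_hr - 1.96 * se)
upper = math.exp(log_hr + 1.96 * se)
print(f"pooled HR = {math.exp(log_hr):.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Because only two numbers leave each site, no individual-level data are exchanged; the trade-off reported in the abstract is that this approach can show bias in infrequent exposure and rare outcome settings.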
Affiliation(s)
- Kazuki Yoshida
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Susan Gruber
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, USA
- Bruce H Fireman
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
- Sengwee Toh
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, USA
20
Vaidya J, Shafiq B, Asani M, Adam N, Jiang X, Ohno-Machado L. A Scalable Privacy-preserving Data Generation Methodology for Exploratory Analysis. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2018; 2017:1695-1704. [PMID: 29854240 PMCID: PMC5977652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Big data coupled with precision medicine has the potential to significantly improve our understanding and treatment of complex disorders, such as cancer, diabetes, depression, etc. However, the essential problem is that data are stuck in silos, and it is difficult to precisely identify which data would be relevant and useful for any particular type of analysis. While the process to acquire and access biomedical data requires significant effort, in many cases the data may not provide much insight to the problem at hand. Therefore, there is a need to be able to measure the utility/relevance of additional datasets for a particular biomedical research task without direct access to the data. Towards this, in this paper, we develop a privacy-preserving approach to create synthetic data that can provide a first-order approximation of utility. We evaluate the proposed approach with several biomedical datasets in the context of regression and classification tasks and discuss how it can be incorporated into existing data management systems such as REDCap.
Affiliation(s)
- Basit Shafiq
- Lahore University of Management Sciences, Lahore, Punjab, Pakistan
- Muazzam Asani
- Lahore University of Management Sciences, Lahore, Punjab, Pakistan
- Xiaoqian Jiang
- University of California at San Diego, La Jolla, CA, USA
21
Dankar FK, Ptitsyn A, Dankar SK. The development of large-scale de-identified biomedical databases in the age of genomics-principles and challenges. Hum Genomics 2018; 12:19. [PMID: 29636096 PMCID: PMC5894154 DOI: 10.1186/s40246-018-0147-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 12/15/2017] [Accepted: 03/15/2018] [Indexed: 12/24/2022] Open
Abstract
Contemporary biomedical databases include a wide range of information types from various observational and instrumental sources. Among the most important features that unite biomedical databases across the field are high volume of information and high potential to cause damage through data corruption, loss of performance, and loss of patient privacy. Thus, issues of data governance and privacy protection are essential for the construction of data depositories for biomedical research and healthcare. In this paper, we discuss various challenges of data governance in the context of population genome projects. The various challenges along with best practices and current research efforts are discussed through the steps of data collection, storage, sharing, analysis, and knowledge dissemination.
Affiliation(s)
- Andrey Ptitsyn
- Gloucester Marine Genomics Institute, Gloucester, MA, USA
- Samar K Dankar
- Faculty of Sciences, University of Balamand, Souk El Ghareb, Lebanon
22
Abstract
The volume of genomics and health data is growing rapidly, driven by sequencing for both research and clinical use. However, under current practices, the data is fragmented into many distinct datasets, and researchers must go through a separate application process for each dataset. This is time-consuming both for the researchers and the data stewards, and it reduces the velocity of research and new discoveries that could improve human health. We propose to simplify this process, by introducing a standard Library Card that identifies and authenticates researchers across all participating datasets. Each researcher would only need to apply once to establish their bona fides as a qualified researcher, and could then use the Library Card to access a wide range of datasets that use a compatible data access policy and authentication protocol.
23
Kuo TT, Kim HE, Ohno-Machado L. Blockchain distributed ledger technologies for biomedical and health care applications. J Am Med Inform Assoc 2018; 24:1211-1220. [PMID: 29016974 PMCID: PMC6080687 DOI: 10.1093/jamia/ocx068] [Citation(s) in RCA: 260] [Impact Index Per Article: 43.3] [Received: 02/22/2017] [Accepted: 06/30/2017] [Indexed: 11/16/2022] Open
Abstract
Objectives To introduce blockchain technologies, including their benefits, pitfalls, and the latest applications, to the biomedical and health care domains. Target Audience Biomedical and health care informatics researchers who would like to learn about blockchain technologies and their applications in the biomedical/health care domains. Scope The covered topics include: (1) introduction to the famous Bitcoin crypto-currency and the underlying blockchain technology; (2) features of blockchain; (3) review of alternative blockchain technologies; (4) emerging nonfinancial distributed ledger technologies and applications; (5) benefits of blockchain for biomedical/health care applications when compared to traditional distributed databases; (6) overview of the latest biomedical/health care applications of blockchain technologies; and (7) discussion of the potential challenges and proposed solutions of adopting blockchain technologies in biomedical/health care domains.
Affiliation(s)
- Tsung-Ting Kuo
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Hyeon-Eui Kim
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Lucila Ohno-Machado
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA; Division of Health Services Research and Development, Veterans Administration San Diego Healthcare System, La Jolla, CA, USA
24
Wei W, Ji Z, He Y, Zhang K, Ha Y, Li Q, Ohno-Machado L. Finding relevant biomedical datasets: the UC San Diego solution for the bioCADDIE Retrieval Challenge. Database: The Journal of Biological Databases and Curation 2018; 2018:4939515. [PMID: 29688374 PMCID: PMC5861401 DOI: 10.1093/database/bay017] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/16/2017] [Accepted: 01/30/2018] [Indexed: 01/28/2023]
Abstract
The number and diversity of biomedical datasets grew rapidly in the last decade. A large number of datasets are stored in various repositories, with different formats. Existing dataset retrieval systems lack the capability of cross-repository search. As a result, users spend time searching datasets in known repositories, and they typically do not find new repositories. The biomedical and healthcare data discovery index ecosystem (bioCADDIE) team organized a challenge to solicit new indexing and searching strategies for retrieving biomedical datasets across repositories. We describe the work of one team that built a retrieval pipeline and examined its performance. The pipeline used online resources to supplement dataset metadata, automatically generated queries from users’ free-text questions, produced high-quality retrieval results and achieved the highest inferred Normalized Discounted Cumulative Gain among competitors. The results showed that it is a promising solution for cross-database, cross-domain and cross-repository biomedical dataset retrieval. Database URL: https://github.com/w2wei/dataset_retrieval_pipeline
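The ranking metric mentioned in this abstract builds on Normalized Discounted Cumulative Gain. The sketch below computes plain NDCG with linear gain; it is not the inferred variant used in the challenge, which estimates the measure from incomplete relevance judgments. The relevance grades are hypothetical.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of graded relevances given in ranked order."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical judgments for five retrieved datasets, in ranked order
# (2 = relevant, 1 = partially relevant, 0 = not relevant).
ranked = [2, 0, 1, 2, 0]
print(round(ndcg(ranked), 4))
```

A score of 1.0 means the retrieved datasets are already in the best possible order; placing relevant datasets lower in the ranking reduces the score through the logarithmic discount.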
Affiliation(s)
- Wei Wei
- University of California, San Diego, 9500 Gilman Drive, MC 0728, La Jolla, CA 92093-0728, USA
- Zhanglong Ji
- University of California, San Diego, 9500 Gilman Drive, MC 0728, La Jolla, CA 92093-0728, USA
- Yupeng He
- University of California, San Diego, 9500 Gilman Drive, MC 0728, La Jolla, CA 92093-0728, USA
- Kai Zhang
- University of California, San Diego, 9500 Gilman Drive, MC 0728, La Jolla, CA 92093-0728, USA
- Yuanchi Ha
- University of California, San Diego, 9500 Gilman Drive, MC 0728, La Jolla, CA 92093-0728, USA
- Qi Li
- Department of Computer Science, Northern Kentucky University, Nunn Drive Highland Heights, KY 41099, USA
- Lucila Ohno-Machado
- University of California, San Diego, 9500 Gilman Drive, MC 0728, La Jolla, CA 92093-0728, USA
25
Wang Y, Wang L, Rastegar-Mojarad M, Moon S, Shen F, Afzal N, Liu S, Zeng Y, Mehrabi S, Sohn S, Liu H. Clinical information extraction applications: A literature review. J Biomed Inform 2018; 77:34-49. [PMID: 29162496 PMCID: PMC5771858 DOI: 10.1016/j.jbi.2017.11.011] [Citation(s) in RCA: 316] [Impact Index Per Article: 52.7] [Received: 06/30/2017] [Revised: 11/01/2017] [Accepted: 11/17/2017] [Indexed: 12/24/2022]
Abstract
BACKGROUND With the rapid adoption of electronic health records (EHRs), it is desirable to harvest information and knowledge from EHRs to support automated systems at the point of care and to enable secondary use of EHRs for clinical and translational research. One critical component used to facilitate the secondary use of EHR data is the information extraction (IE) task, which automatically extracts and encodes clinical information from text. OBJECTIVES In this literature review, we present a review of recent published research on clinical information extraction (IE) applications. METHODS A literature search was conducted for articles published from January 2009 to September 2016 based on Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and ACM Digital Library. RESULTS A total of 1917 publications were identified for title and abstract screening. Of these publications, 263 articles were selected and discussed in this review in terms of publication venues and data sources, clinical IE tools, methods, and applications in the areas of disease- and drug-related studies, and clinical workflow optimizations. CONCLUSIONS Clinical IE has been used for a wide range of applications, however, there is a considerable gap between clinical studies using EHR data and studies using clinical IE. This study enabled us to gain a more concrete understanding of the gap and to provide potential solutions to bridge this gap.
Collapse
Affiliation(s)
- Yanshan Wang
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Liwei Wang
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Majid Rastegar-Mojarad
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Sungrim Moon
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Feichen Shen
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Naveed Afzal
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Sijia Liu
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Yuqun Zeng
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Saeed Mehrabi
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Sunghwan Sohn
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Hongfang Liu
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States.
| |
|
26
|
Christoph J, Knell C, Bosserhoff A, Naschberger E, Stürzl M, Rübner M, Seuss H, Ruh M, Prokosch HU, Sedlmayr B. Usability and Suitability of the Omics-Integrating Analysis Platform tranSMART for Translational Research and Education. Appl Clin Inform 2017; 8:1173-1183. [PMID: 29270954 DOI: 10.4338/aci-2017-05-ra-0085] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Platforms like tranSMART assist researchers in analyzing clinical and corresponding omics data. Usability is an important, yet often overlooked, factor affecting adoption and meaningful use. Analyses on the specific needs of translational researchers and considerations about the application of such platforms for education are rare. OBJECTIVES The aim of this study was to test whether tranSMART can be used in education and how well medical students and professional researchers can handle it; to identify which kind of translational researchers (in terms of skills, experienced limitations, and available data) can take advantage of tranSMART; and to evaluate the usability and to generate recommendations for improvements. METHODS An online test was completed by medical students (N = 109) and researchers (N = 26). The test comprised 13 tasks in the context of four typical research scenarios based on experimental and clinical data. A web questionnaire was provided to identify both the needs and the conditions of research as well as to evaluate the system's usability based on the "System Usability Scale" (SUS). RESULTS Students and researchers were able to handle tranSMART well and coped with most scenarios: cohort identification, data exploration, hypothesis generation, and hypothesis validation were answered with a rate of correctness between 82 and 100%. Of the total, 72.2% of the teaching researchers considered tranSMART suitable for their lessons and 84.6% of the researchers considered the platform useful for their daily work; 65.4% of the researchers named the nonavailability of a platform like tranSMART as a restriction on their research. The usability was rated "acceptable" with a SUS of 70.8. CONCLUSION tranSMART is potentially suitable for education purposes and fits most of the needs of translational researchers. Improvements are needed on the presentation of analysis results and on the guidance of users through the analysis, especially to ensure the compliance of the analysis with the requirements of statistical testing.
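For readers unfamiliar with the System Usability Scale mentioned in this abstract, the conventional scoring rule can be sketched as follows. This is a minimal illustration of the standard SUS formula (ten items rated 1 to 5, odd items positively worded, even items negatively worded), not the authors' own scoring script, which the abstract does not describe. The function name `sus_score` is hypothetical.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for one respondent.

    `responses` holds the ten item ratings in questionnaire order,
    each an integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item ratings")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if not 1 <= rating <= 5:
            raise ValueError("each rating must be between 1 and 5")
        if item_number % 2 == 1:
            # Odd-numbered items are positively worded: contribute rating - 1.
            total += rating - 1
        else:
            # Even-numbered items are negatively worded: contribute 5 - rating.
            total += 5 - rating
    # The 0-40 raw sum is rescaled to 0-100.
    return total * 2.5


# A respondent who answers "neutral" (3) to every item scores 50.
print(sus_score([3] * 10))
```

A study-level figure such as the 70.8 reported above would typically be the mean of such per-respondent scores.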
Affiliation(s)
- J Christoph
- Department of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| | - C Knell
- Department of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| | - A Bosserhoff
- Institute of Biochemistry (Emil-Fischer-Center), Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| | - E Naschberger
- Division of Molecular and Experimental Surgery, Department of Surgery, Translational Research Center Erlangen, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| | - M Stürzl
- Division of Molecular and Experimental Surgery, Department of Surgery, Translational Research Center Erlangen, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| | - M Rübner
- Department of Gynecology and Obstetrics, Comprehensive Cancer Center Erlangen-EMN, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| | - H Seuss
- Department of Radiology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| | - M Ruh
- Department of Experimental Medicine 1, Nikolaus-Fiebiger-Center for Molecular Medicine, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| | - H-U Prokosch
- Department of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| | - B Sedlmayr
- Department of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| |
|
27
|
Martin-Sanchez FJ, Aguiar-Pulido V, Lopez-Campos GH, Peek N, Sacchi L. Secondary Use and Analysis of Big Data Collected for Patient Care. Yearb Med Inform 2017; 26:28-37. [PMID: 28480474 PMCID: PMC6239231 DOI: 10.15265/iy-2017-008] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 01/29/2023] Open
Abstract
Objectives: To identify common methodological challenges and review relevant initiatives related to the re-use of patient data collected in routine clinical care, as well as to analyze the economic benefits derived from the secondary use of this data. Through the use of several examples, this article aims to provide a glimpse into the different areas of application, namely clinical research, genomic research, study of environmental factors, and population and health services research. This paper describes some of the informatics methods and Big Data resources developed in this context, such as electronic phenotyping, clinical research networks, biorepositories, screening data banks, and wide association studies. Lastly, some of the potential limitations of these approaches are discussed, focusing on confounding factors and data quality. Methods: A series of literature searches in main bibliographic databases have been conducted in order to assess the extent to which existing patient data has been repurposed for research. This contribution from the IMIA working group on "Data mining and Big Data analytics" focuses on the literature published during the last two years, covering the timeframe since the working group's last survey. Results and Conclusions: Although most of the examples of secondary use of patient data lie in the arena of clinical and health services research, we have started to witness other important applications, particularly in the area of genomic research and the study of health effects of environmental factors. Further research is needed to characterize the economic impact of secondary use across the broad spectrum of translational research.
Affiliation(s)
- F. J. Martin-Sanchez
- Weill Cornell Medicine, Department of Healthcare Policy and Research, Division of Health Informatics, New York, USA
| | - V. Aguiar-Pulido
- Weill Cornell Medicine, Brain and Mind Research Institute, New York, USA
| | - G. H. Lopez-Campos
- The University of Melbourne, Health & Biomedical Informatics Centre, Melbourne, Australia
| | - N. Peek
- MRC Health e-Research Centre, Division of Informatics, Imaging and Data Science, The University of Manchester, Manchester, UK
| | - L. Sacchi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| |
|
28
|
Jagodnik KM, Koplev S, Jenkins SL, Ohno-Machado L, Paten B, Schurer SC, Dumontier M, Verborgh R, Bui A, Ping P, McKenna NJ, Madduri R, Pillai A, Ma'ayan A. Developing a framework for digital objects in the Big Data to Knowledge (BD2K) commons: Report from the Commons Framework Pilots workshop. J Biomed Inform 2017; 71:49-57. [PMID: 28501646 DOI: 10.1016/j.jbi.2017.05.006] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 11/15/2016] [Revised: 05/01/2017] [Accepted: 05/08/2017] [Indexed: 12/11/2022]
Abstract
The volume and diversity of data in biomedical research have been rapidly increasing in recent years. While such data hold significant promise for accelerating discovery, their use entails many challenges including: the need for adequate computational infrastructure, secure processes for data sharing and access, tools that allow researchers to find and integrate diverse datasets, and standardized methods of analysis. These are just some elements of a complex ecosystem that needs to be built to support the rapid accumulation of these data. The NIH Big Data to Knowledge (BD2K) initiative aims to facilitate digitally enabled biomedical research. Within the BD2K framework, the Commons initiative is intended to establish a virtual environment that will facilitate the use, interoperability, and discoverability of shared digital objects used for research. The BD2K Commons Framework Pilots Working Group (CFPWG) was established to clarify goals and work on pilot projects that address existing gaps toward realizing the vision of the BD2K Commons. This report reviews highlights from a two-day meeting involving the BD2K CFPWG to provide insights on trends and considerations in advancing Big Data science for biomedical research in the United States.
Affiliation(s)
- Kathleen M Jagodnik
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1215, New York, NY 10029, USA
| | - Simon Koplev
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1215, New York, NY 10029, USA
| | - Sherry L Jenkins
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1215, New York, NY 10029, USA
| | - Lucila Ohno-Machado
- Health System Department of Biomedical Informatics, University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92083, USA; Health Services Research, San Diego Veterans Administration Health System, San Diego, CA 92083, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St., Santa Cruz, CA 95060, USA
| | - Stephan C Schurer
- Department of Molecular and Cellular Pharmacology, University of Miami, 331461120 NW 14th Street, CRB 650 (M-857), Miami, FL 33136, USA
| | - Michel Dumontier
- Institute for Data Science, Universiteit Maastricht, Minderbroedersberg 4-6, 6211 LK Maastricht, Netherlands
| | - Ruben Verborgh
- Ghent University - iMinds Research Foundation Flanders, St. Pietersnieuwstraat 33, 9000 Gent, Belgium
| | - Alex Bui
- Department of Radiological Sciences, UCLA School of Medicine, Los Angeles, CA 90095, USA; Department of Bioengineering, UCLA Henri Samueli School of Engineering, Los Angeles, CA 90095, USA
| | - Peipei Ping
- Departments of Physiology, Medicine, and Bioinformatics, UCLA School of Medicine, Los Angeles, CA 90095, USA
| | - Neil J McKenna
- Department of Molecular and Cellular Biology, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030, USA
| | - Ravi Madduri
- Department of Mathematics and Computer Science, Argonne National Laboratory, 9700 S. Cass Avenue, Argonne, IL 60439, USA
| | - Ajay Pillai
- Division of Genome Sciences, National Human Genome Research Institute, National Institutes of Health, 31 Center Drive, MSC 2152, 9000 Rockville Pike, Bethesda, MD 20892, USA
| | - Avi Ma'ayan
- Department of Pharmacological Sciences, BD2K-LINCS Data Coordination and Integration Center, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1215, New York, NY 10029, USA.
| |
|
29
|
|
30
|
Chen X, Fann YC, McAuliffe M, Vismer D, Yang R. Checking Questionable Entry of Personally Identifiable Information Encrypted by One-Way Hash Transformation. JMIR Med Inform 2017; 5:e2. [PMID: 28213343 PMCID: PMC5336604 DOI: 10.2196/medinform.5054] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/19/2015] [Revised: 07/04/2016] [Accepted: 07/04/2016] [Indexed: 11/26/2022] Open
Abstract
Background As one of the several effective solutions for personal privacy protection, a global unique identifier (GUID) is linked with hash codes that are generated from combinations of personally identifiable information (PII) by a one-way hash algorithm. On the GUID server, no PII is permitted to be stored, and only GUID and hash codes are allowed. The quality of PII entry is critical to the GUID system. Objective The goal of our study was to explore a method of checking questionable entry of PII in this context without using or sending any portion of PII while registering a subject. Methods According to the principle of the GUID system, all possible combination patterns of PII fields were analyzed and used to generate hash codes, which were stored on the GUID server. Based on the matching rules of the GUID system, an error-checking algorithm was developed using set theory to check PII entry errors. We selected 200,000 simulated individuals with randomly planted errors to evaluate the proposed algorithm. These errors were placed in the required PII fields or optional PII fields. The performance of the proposed algorithm was also tested in the registering system of study subjects. Results Of the 127,700 error-planted subjects, 114,464 (89.64%) could still be identified as previously registered subjects, and the remaining 13,236 (10.36%, 13,236/127,700) were classified as new subjects. As expected, 100% of nonidentified subjects had errors within the required PII fields. The possibility that a subject is identified is related to the count and the type of incorrect PII field. For all identified subjects, their errors can be found by the proposed algorithm. The scope of questionable PII fields is also associated with the count and the type of the incorrect PII field. The best situation is to precisely find the exact incorrect PII fields, and the worst situation is to shrink the questionable scope only to a set of 13 PII fields. In the application, the proposed algorithm can give a hint of questionable PII entry and perform as an effective tool. Conclusions The GUID system has high error tolerance and may correctly identify and associate a subject even with few PII field errors. Correct data entry, especially required PII fields, is critical to avoiding false splits. In the context of one-way hash transformation, the questionable input of PII may be identified by applying set theory operators based on the hash codes. The count and the type of incorrect PII fields play an important role in identifying a subject and locating questionable PII fields.
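The general idea in this abstract (hash combinations of PII fields, store only the hashes, then use set operations on matching combinations to narrow down which fields look wrong) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the four field names, the use of pairs of fields, SHA-256, and the function names are hypothetical and are not the GUID system's actual fields, combination patterns, or matching rules.

```python
import hashlib
from itertools import combinations

# Hypothetical PII fields; the real GUID system uses its own field set.
FIELDS = ("first_name", "last_name", "birth_date", "birth_city")


def hash_codes(record, size=2):
    """Map each combination of `size` fields to a one-way hash code."""
    codes = {}
    for combo in combinations(FIELDS, size):
        # Field names are included so codes from different combinations never collide by value alone.
        text = "|".join(f"{name}={record[name].strip().lower()}" for name in combo)
        codes[combo] = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return codes


def questionable_fields(entry, stored_codes, size=2):
    """Return the fields that appear in no matching combination.

    `stored_codes` is the set of hash codes held by the server; no PII is needed.
    """
    entry_codes = hash_codes(entry, size)
    matched = [combo for combo, code in entry_codes.items() if code in stored_codes]
    # Fields covered by at least one matching combination are treated as trusted.
    trusted = set().union(*matched)
    return set(FIELDS) - trusted


registered = {"first_name": "Ana", "last_name": "Lopez",
              "birth_date": "1980-02-03", "birth_city": "Lima"}
server_side = set(hash_codes(registered).values())

typo_entry = dict(registered, birth_city="Lema")
print(questionable_fields(typo_entry, server_side))
```

In this toy setup a single mistyped field is isolated exactly, while several simultaneous errors leave a larger questionable set, which mirrors the abstract's remark that the scope depends on the count and type of incorrect fields.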
Affiliation(s)
- Xianlai Chen
- Institute of Information Security and Big Data, Central South University, Changsha, China
| | - Yang C Fann
- Intramural IT and Bioinformatics Program, Division of Intramural, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States
| | - Matthew McAuliffe
- Division of Computational Science, Center for Information Technology, National Institutes of Health, Bethesda, MD, United States
| | - David Vismer
- Sapient Government Services, Arlington, VA, United States
| | - Rong Yang
- 7th Ward, Xiangya Hospital, Central South University, Changsha, China
| |
|
31
|
Garvin JH, Kalsy M, Brandt C, Luther SL, Divita G, Coronado G, Redd D, Christensen C, Hill B, Kelly N, Treitler QZ. An Evolving Ecosystem for Natural Language Processing in Department of Veterans Affairs. J Med Syst 2017; 41:32. [PMID: 28050745 DOI: 10.1007/s10916-016-0681-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 10/29/2016] [Accepted: 12/22/2016] [Indexed: 11/26/2022]
Abstract
In an ideal clinical Natural Language Processing (NLP) ecosystem, researchers and developers would be able to collaborate with others, undertake validation of NLP systems, components, and related resources, and disseminate them. We captured requirements and formative evaluation data from the Veterans Affairs (VA) Clinical NLP Ecosystem stakeholders using semi-structured interviews and meeting discussions. We developed a coding rubric to code interviews. We assessed inter-coder reliability using percent agreement and the kappa statistic. We undertook 15 interviews and held two workshop discussions. The main areas of requirements related to design and functionality, resources, and information. Stakeholders also confirmed the vision of the second generation of the Ecosystem, and their recommendations included adding mechanisms to better understand terms, measuring collaboration to demonstrate value, and providing datasets/tools to navigate spelling errors with consumer language, among others. Stakeholders also recommended the capability to communicate with developers working on the next version of the VA electronic health record (VistA Evolution), to automatically monitor downloads of tools, and to automatically provide a summary of the downloads to Ecosystem contributors and funders. After three rounds of coding and discussion, we determined the percent agreement of two coders to be 97.2% and the kappa to be 0.7851. The vision of the VA Clinical NLP Ecosystem met stakeholder needs. Interviews and discussion provided key requirements that inform the design of the VA Clinical NLP Ecosystem.
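The two inter-coder reliability measures named in this abstract can be computed as in the sketch below. It assumes the kappa in question is Cohen's kappa for two coders, which the abstract does not state explicitly; the function names and the toy labels are hypothetical and are not the study's data.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Fraction of items on which the two coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each coder's marginal rate, summed over codes.
    p_expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when both coders use a single identical code")
    return (p_observed - p_expected) / (1 - p_expected)


# Toy example: four interview segments coded by two coders.
a = ["y", "y", "n", "n"]
b = ["y", "y", "n", "y"]
print(percent_agreement(a, b), cohens_kappa(a, b))
```

Because kappa discounts agreement expected by chance, it is usually lower than raw percent agreement, as with the 97.2% and 0.7851 reported above.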
Affiliation(s)
- Jennifer H Garvin
- IDEAS Center SLC VA Healthcare System, 500 Foothill Drive, Salt Lake City, UT, 84148, USA.
- GRECC SLC VA Healthcare System, 500 Foothill Drive, Salt Lake City, UT, 84148, USA.
- Division of Epidemiology, University of Utah School of Medicine, 295 Chipeta Way, Salt Lake City, UT, 84132, USA.
- Department of Biomedical Informatics, University of Utah School of Medicine, 421 Wakara Way, Ste. 140, Salt Lake City, UT, 84108, USA.
| | - Megha Kalsy
- IDEAS Center SLC VA Healthcare System, 500 Foothill Drive, Salt Lake City, UT, 84148, USA
- Department of Biomedical Informatics, University of Utah School of Medicine, 421 Wakara Way, Ste. 140, Salt Lake City, UT, 84108, USA
| | - Cynthia Brandt
- VA Connecticut Healthcare System, 950 Campbell Avenue, West Haven, CT, USA
- Yale School of Medicine, 333 Cedar St., New Haven, CT, USA
| | - Stephen L Luther
- James A Haley Veterans Hospital, 13000 Bruce B. Downs Blvd, Tampa, FL, USA
| | - Guy Divita
- IDEAS Center SLC VA Healthcare System, 500 Foothill Drive, Salt Lake City, UT, 84148, USA
- Department of Biomedical Informatics, University of Utah School of Medicine, 421 Wakara Way, Ste. 140, Salt Lake City, UT, 84108, USA
| | - Gregory Coronado
- IDEAS Center SLC VA Healthcare System, 500 Foothill Drive, Salt Lake City, UT, 84148, USA
| | - Doug Redd
- IDEAS Center SLC VA Healthcare System, 500 Foothill Drive, Salt Lake City, UT, 84148, USA
- Department of Biomedical Informatics, University of Utah School of Medicine, 421 Wakara Way, Ste. 140, Salt Lake City, UT, 84108, USA
- Department of Clinical Research and Leadership, George Washington University School of Medicine and Health Sciences, 2100 Pennsylvania Ave, NW, Washington, DC, 20037, USA
| | - Carrie Christensen
- IDEAS Center SLC VA Healthcare System, 500 Foothill Drive, Salt Lake City, UT, 84148, USA
- Department of Biomedical Informatics, University of Utah School of Medicine, 421 Wakara Way, Ste. 140, Salt Lake City, UT, 84108, USA
| | - Brent Hill
- IDEAS Center SLC VA Healthcare System, 500 Foothill Drive, Salt Lake City, UT, 84148, USA
- Department of Biomedical Informatics, University of Utah School of Medicine, 421 Wakara Way, Ste. 140, Salt Lake City, UT, 84108, USA
| | - Natalie Kelly
- IDEAS Center SLC VA Healthcare System, 500 Foothill Drive, Salt Lake City, UT, 84148, USA
| | - Qing Zeng Treitler
- IDEAS Center SLC VA Healthcare System, 500 Foothill Drive, Salt Lake City, UT, 84148, USA
- Department of Biomedical Informatics, University of Utah School of Medicine, 421 Wakara Way, Ste. 140, Salt Lake City, UT, 84108, USA
- Department of Clinical Research and Leadership, George Washington University School of Medicine and Health Sciences, 2100 Pennsylvania Ave, NW, Washington, DC, 20037, USA
| |
|
32
|
Skolariki K, Avramouli A. The Use of Translational Research Platforms in Clinical and Biomedical Data Exploration. Advances in Experimental Medicine and Biology 2017; 988:301-311. [PMID: 28971409 DOI: 10.1007/978-3-319-56246-9_25] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/23/2022]
Abstract
The rise of precision medicine, combined with the variety of biomedical data sources and their heterogeneous nature, makes the integration and exploration of the information they hold more complicated. In light of these issues, translational research platforms were developed as a promising solution. Research centers have used translational tools for the study of integrated data for hypothesis development and validation, cohort discovery, and data exploration. For this article, we reviewed the literature in order to determine the use of translational research platforms in precision medicine. These tools are used to support scientists in various domains regarding precision medicine research. We identified eight platforms: BRISK, iCOD, iDASH, tranSMART, the recently developed OncDRS, as well as caTRIP, cBio Cancer Portal and G-DOC. The last four platforms explore multidimensional data specifically for cancer research. We focused on tranSMART because it has been the most broadly used platform since its development in 2012.
|
33
|
Satagopam V, Gu W, Eifes S, Gawron P, Ostaszewski M, Gebel S, Barbosa-Silva A, Balling R, Schneider R. Integration and Visualization of Translational Medicine Data for Better Understanding of Human Diseases. Big Data 2016; 4:97-108. [PMID: 27441714 PMCID: PMC4932659 DOI: 10.1089/big.2015.0057] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Indexed: 06/06/2023]
Abstract
Translational medicine is a domain turning results of basic life science research into new tools and methods in a clinical environment, for example, as new diagnostics or therapies. Nowadays, the process of translation is supported by large amounts of heterogeneous data ranging from medical data to a whole range of -omics data. It is not only a great opportunity but also a great challenge, as translational medicine big data is difficult to integrate and analyze, and requires the involvement of biomedical experts for the data processing. We show here that visualization and interoperable workflows, combining multiple complex steps, can address at least parts of the challenge. In this article, we present an integrated workflow for exploration, analysis, and interpretation of translational medicine data in the context of human health. Three Web services (tranSMART, a Galaxy Server, and a MINERVA platform) are combined into one big data pipeline. Native visualization capabilities enable the biomedical experts to get a comprehensive overview and control over separate steps of the workflow. The capabilities of tranSMART enable a flexible filtering of multidimensional integrated data sets to create subsets suitable for downstream processing. A Galaxy Server offers visually aided construction of analytical pipelines, with the use of existing or custom components. A MINERVA platform supports the exploration of health and disease-related mechanisms in a contextualized analytical visualization system. We demonstrate the utility of our workflow by illustrating its subsequent steps using an existing data set, for which we propose a filtering scheme, an analytical pipeline, and a corresponding visualization of analytical results. The workflow is available as a sandbox environment, where readers can work with the described setup themselves. Overall, our work shows how visualization and interfacing of big data processing services facilitate exploration, analysis, and interpretation of translational medicine data.
Affiliation(s)
- Venkata Satagopam
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
| | - Wei Gu
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
| | - Serge Eifes
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
- Information Technology for Translational Medicine (ITTM) S.A., Esch-Belval, Luxembourg
| | - Piotr Gawron
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
| | - Marek Ostaszewski
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
| | - Stephan Gebel
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
| | - Adriano Barbosa-Silva
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
| | - Rudi Balling
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
| | - Reinhard Schneider
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-Belval, Luxembourg
| |
|
34
|
Rance B, Canuel V, Countouris H, Laurent-Puig P, Burgun A. Integrating Heterogeneous Biomedical Data for Cancer Research: the CARPEM infrastructure. Appl Clin Inform 2016; 7:260-74. [PMID: 27437039 PMCID: PMC4941838 DOI: 10.4338/aci-2015-09-ra-0125] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 10/02/2015] [Accepted: 02/07/2016] [Indexed: 01/19/2023] Open
Abstract
Cancer research involves numerous disciplines. The multiplicity of data sources and their heterogeneous nature render the integration and the exploration of the data more and more complex. Translational research platforms are a promising way to assist scientists in these tasks. In this article, we identify a set of scientific and technical principles needed to build a translational research platform compatible with ethical requirements, data protection and data-integration problems. We describe the solution adopted by the CARPEM cancer research program to design and deploy a platform able to integrate retrospective, prospective, and day-to-day care data. We designed a three-layer architecture composed of a data collection layer, a data integration layer and a data access layer. We leverage a set of open-source resources including i2b2 and tranSMART.
Affiliation(s)
- Bastien Rance
- University Hospital Georges Pompidou, Paris, France; INSERM UMR_S 1138, CRC, Paris, France
| | | | - Hector Countouris
- University Hospital Georges Pompidou, Paris, France; INSERM UMR_S 1138, CRC, Paris, France
| | - Pierre Laurent-Puig
- University Hospital Georges Pompidou, Paris, France; Université Paris Sorbonne Cité, Inserm UMR-S 1147, Paris, France
| | - Anita Burgun
- University Hospital Georges Pompidou, Paris, France; INSERM UMR_S 1138, CRC, Paris, France
| |
|
35
|
Doan S, Maehara CK, Chaparro JD, Lu S, Liu R, Graham A, Berry E, Hsu CN, Kanegaye JT, Lloyd DD, Ohno-Machado L, Burns JC, Tremoulet AH. Building a Natural Language Processing Tool to Identify Patients With High Clinical Suspicion for Kawasaki Disease from Emergency Department Notes. Acad Emerg Med 2016; 23:628-36. [PMID: 26826020 DOI: 10.1111/acem.12925] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 08/04/2015] [Revised: 11/29/2015] [Accepted: 12/30/2015] [Indexed: 11/26/2022]
Abstract
OBJECTIVE Delayed diagnosis of Kawasaki disease (KD) may lead to serious cardiac complications. We sought to create and test the performance of a natural language processing (NLP) tool, the KD-NLP, in the identification of emergency department (ED) patients for whom the diagnosis of KD should be considered. METHODS We developed an NLP tool that recognizes the KD diagnostic criteria based on standard clinical terms and medical word usage using 22 pediatric ED notes augmented by Unified Medical Language System vocabulary. With high suspicion for KD defined as fever and three or more KD clinical signs, KD-NLP was applied to 253 ED notes from children ultimately diagnosed with either KD or another febrile illness. We evaluated KD-NLP performance against ED notes manually reviewed by clinicians and compared the results to a simple keyword search. RESULTS KD-NLP identified high-suspicion patients with a sensitivity of 93.6% and specificity of 77.5% compared to notes manually reviewed by clinicians. The tool outperformed a simple keyword search (sensitivity = 41.0%; specificity = 76.3%). CONCLUSIONS KD-NLP showed comparable performance to clinician manual chart review for identification of pediatric ED patients with a high suspicion for KD. This tool could be incorporated into the ED electronic health record system to alert providers to consider the diagnosis of KD. KD-NLP could serve as a model for decision support for other conditions in the ED.
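The sensitivity and specificity figures in this abstract compare a tool's flags against clinician review as the reference standard. The sketch below shows how those two measures are derived from paired labels. It is an illustration only: the function name and the toy labels are hypothetical and do not reproduce the KD-NLP evaluation data.

```python
def sensitivity_specificity(predicted, reference):
    """Return (sensitivity, specificity) for paired boolean labels.

    `predicted` holds the tool's high-suspicion flags and `reference`
    holds the reference-standard labels for the same notes.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    tp = sum(p and r for p, r in zip(predicted, reference))
    fn = sum((not p) and r for p, r in zip(predicted, reference))
    tn = sum((not p) and (not r) for p, r in zip(predicted, reference))
    fp = sum(p and (not r) for p, r in zip(predicted, reference))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference label")
    # Sensitivity: share of true high-suspicion notes that the tool flagged.
    # Specificity: share of other notes that the tool correctly left unflagged.
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: 4 reference-positive notes and 5 reference-negative notes.
reference = [True, True, True, True, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True]
print(sensitivity_specificity(predicted, reference))
```

A keyword search and an NLP tool can be compared by running both sets of flags through the same function against the same reference labels.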
Affiliation(s)
- Son Doan
- Department of Biomedical Informatics; University of California; San Diego CA
| | - Cleo K. Maehara
- Department of Biomedical Informatics; University of California; San Diego CA
| | - Juan D. Chaparro
- Department of Biomedical Informatics; University of California; San Diego CA
| | - Sisi Lu
- Department of Computer Science; University of Pittsburgh; Pittsburgh PA
| | - Ruiling Liu
- The University of Texas Health Science Center at Houston; Houston TX
| | | | - Erika Berry
- Department of Pediatrics; University of California at San Diego; La Jolla CA
| | - Chun-Nan Hsu
- Department of Biomedical Informatics; University of California; San Diego CA
| | - John T. Kanegaye
- Department of Pediatrics; University of California at San Diego; La Jolla CA
- Rady Children's Hospital San Diego; San Diego CA
| | - David D. Lloyd
- Children's Healthcare of Atlanta; Atlanta GA
- Emory University School of Medicine; Atlanta GA
| | - Lucila Ohno-Machado
- Department of Biomedical Informatics; University of California; San Diego CA
| | - Jane C. Burns
- Department of Pediatrics; University of California at San Diego; La Jolla CA
- Rady Children's Hospital San Diego; San Diego CA
| | - Adriana H. Tremoulet
- Department of Pediatrics; University of California at San Diego; La Jolla CA
- Rady Children's Hospital San Diego; San Diego CA
| | | |
|
36
|
Quintana Y. Challenges to Implementation of Global Translational Collaboration Platforms. MOJ Proteomics & Bioinformatics 2016; 2:65. [PMID: 26798845 PMCID: PMC4717481 DOI: 10.15406/mojpb.2015.02.00065] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
Translational Collaboration Platforms connect clinical, genomics, and patient-reported data for the advancement of biomedical research, providing an opportunity to speed up the translation of basic science findings into clinical applications and new medicines. These platforms bring together data from both clinical and research databases and provide opportunities for multi-disciplinary research. Recent years have seen significant growth of these platforms, and some global collaborative research networks have been established using them. In this brief summary of these platforms, we examine the challenges in implementation for global research collaborations and the challenges for the sustainability of research networks.
Affiliation(s)
- Yuri Quintana
- Global Health Informatics, Beth Israel Deaconess Medical Center, USA
|
37
|
Tremoulet AH, Dutkowski J, Sato Y, Kanegaye JT, Ling XB, Burns JC. Novel data-mining approach identifies biomarkers for diagnosis of Kawasaki disease. Pediatr Res 2015; 78:547-53. [PMID: 26237629 PMCID: PMC4628575 DOI: 10.1038/pr.2015.137] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 01/05/2015] [Accepted: 04/17/2015] [Indexed: 11/30/2022]
Abstract
BACKGROUND As Kawasaki disease (KD) shares many clinical features with other more common febrile illnesses and misdiagnosis, leading to a delay in treatment, increases the risk of coronary artery damage, a diagnostic test for KD is urgently needed. We sought to develop a panel of biomarkers that could distinguish between acute KD patients and febrile controls (FC) with sufficient accuracy to be clinically useful. METHODS Plasma samples were collected from three independent cohorts of FC and acute KD patients who met the American Heart Association definition for KD and presented within the first 10 d of fever. The levels of 88 biomarkers associated with inflammation were assessed by Luminex bead technology. Unsupervised clustering followed by supervised clustering using a Random Forest model was used to find a panel of candidate biomarkers. RESULTS A panel of biomarkers commonly available in the hospital laboratory (absolute neutrophil count, erythrocyte sedimentation rate, alanine aminotransferase, γ-glutamyl transferase, concentrations of α-1-antitrypsin, C-reactive protein, and fibrinogen, and platelet count) accurately diagnosed 81-96% of KD patients in a series of three independent cohorts. CONCLUSION After prospective validation, this eight-biomarker panel may improve the recognition of KD.
Affiliation(s)
- Adriana H. Tremoulet
- Pediatrics, University of California San Diego, La Jolla, California, USA; Rady Children's Hospital San Diego, San Diego, California, USA
| | | | - Yuichiro Sato
- Pediatrics, University of California San Diego, La Jolla, California, USA; Rady Children's Hospital San Diego, San Diego, California, USA
| | - John T. Kanegaye
- Pediatrics, University of California San Diego, La Jolla, California, USA; Rady Children's Hospital San Diego, San Diego, California, USA
| | | | - Jane C. Burns
- Pediatrics, University of California San Diego, La Jolla, California, USA; Rady Children's Hospital San Diego, San Diego, California, USA
|
38
|
Lu CL, Wang S, Ji Z, Wu Y, Xiong L, Jiang X, Ohno-Machado L. WebDISCO: a web service for distributed cox model learning without patient-level data sharing. J Am Med Inform Assoc 2015; 22:1212-9. [PMID: 26159465 PMCID: PMC5009917 DOI: 10.1093/jamia/ocv083] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 11/24/2014] [Revised: 05/16/2015] [Accepted: 05/26/2015] [Indexed: 11/14/2022]
Abstract
OBJECTIVE The Cox proportional hazards model is a widely used method for analyzing survival data. To achieve sufficient statistical power in a survival analysis, it usually requires a large amount of data. Data sharing across institutions could be a potential workaround for providing this added power. METHODS AND MATERIALS The authors develop a web service for distributed Cox model learning (WebDISCO), which focuses on the proof-of-concept and algorithm development for federated survival analysis. The sensitive patient-level data can be processed locally and only the less-sensitive intermediate statistics are exchanged to build a global Cox model. Mathematical derivation shows that the proposed distributed algorithm is identical to the centralized Cox model. RESULTS The authors evaluated the proposed framework at the University of California, San Diego (UCSD), Emory, and Duke. The experimental results show that both distributed and centralized models result in near-identical model coefficients with differences in the range [Formula: see text] to [Formula: see text]. The results confirm the mathematical derivation and show that the implementation of the distributed model can achieve the same results as the centralized implementation. LIMITATION The proposed method serves as a proof of concept, in which a publicly available dataset was used to evaluate the performance. The authors do not intend to suggest that this method can resolve policy and engineering issues related to the federated use of institutional data, but it should serve as evidence of the technical feasibility of the proposed approach. CONCLUSIONS WebDISCO (Web-based Distributed Cox Regression Model; https://webdisco.ucsd-dbmi.org:8443/cox/) provides a proof-of-concept web service that implements a distributed algorithm to conduct distributed survival analysis without sharing patient-level data.
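The idea of exchanging only intermediate statistics can be illustrated with a minimal sketch. It is not the WebDISCO protocol itself, whose message format the abstract does not describe. The sketch assumes the sites agree on a shared list of distinct event times and use the Breslow convention for ties; the function names `site_statistics` and `global_gradient` are hypothetical. Each site returns per-time sums, and a coordinator adds them to obtain the gradient of the Cox partial log-likelihood, which matches the gradient computed on pooled data.

```python
import math

def site_statistics(rows, beta, event_times):
    """Compute per-site aggregates for each shared event time.

    rows: list of (covariates, time, event) tuples held locally at one site.
    Only the sums below leave the site, never the individual rows.
    """
    p = len(beta)
    stats = []
    for t in event_times:
        s0 = 0.0               # sum of exp(beta . x) over the local risk set
        s1 = [0.0] * p         # sum of x * exp(beta . x) over the local risk set
        x_events = [0.0] * p   # sum of x over local events at time t
        d = 0                  # number of local events at time t
        for x, time, event in rows:
            if time >= t:
                w = math.exp(sum(b * xi for b, xi in zip(beta, x)))
                s0 += w
                for k in range(p):
                    s1[k] += w * x[k]
            if event == 1 and time == t:
                d += 1
                for k in range(p):
                    x_events[k] += x[k]
        stats.append((s0, s1, x_events, d))
    return stats

def global_gradient(per_site_stats, n_features):
    """Add the site aggregates and return the partial log-likelihood gradient."""
    grad = [0.0] * n_features
    for per_time in zip(*per_site_stats):   # one tuple per site for this event time
        s0 = sum(s[0] for s in per_time)
        d = sum(s[3] for s in per_time)
        for k in range(n_features):
            s1_k = sum(s[1][k] for s in per_time)
            x_events_k = sum(s[2][k] for s in per_time)
            grad[k] += x_events_k - d * s1_k / s0
    return grad
```

A coordinator would call `global_gradient` once per optimization step, with each site recomputing `site_statistics` for the current `beta`. Note that in this sketch the event times themselves are shared, which is a design assumption rather than a property stated in the abstract.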
Affiliation(s)
- Chia-Lun Lu
- Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
| | - Shuang Wang
- Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
| | - Zhanglong Ji
- Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
| | - Yuan Wu
- Department of Biostatistics & Bioinformatics, Duke University, Durham, NC, 27708, USA
| | - Li Xiong
- Department of Mathematics & Computer Science, Emory University, Atlanta, GA 30322, USA; Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
| | - Xiaoqian Jiang
- Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
| | - Lucila Ohno-Machado
- Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
|
39
|
Noor AM, Holmberg L, Gillett C, Grigoriadis A. Big Data: the challenge for small research groups in the era of cancer genomics. Br J Cancer 2015; 113:1405-12. [PMID: 26492224 PMCID: PMC4815885 DOI: 10.1038/bjc.2015.341] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 04/02/2015] [Revised: 08/04/2015] [Accepted: 08/09/2015] [Indexed: 01/06/2023]
Abstract
In the past decade, cancer research has seen an increasing trend towards high-throughput techniques and translational approaches. The increasing availability of assays that utilise smaller quantities of source material and produce higher volumes of data output have resulted in the necessity for data storage solutions beyond those previously used. Multifactorial data, both large in sample size and heterogeneous in context, needs to be integrated in a standardised, cost-effective and secure manner. This requires technical solutions and administrative support not normally financially accounted for in small- to moderate-sized research groups. In this review, we highlight the Big Data challenges faced by translational research groups in the precision medicine era; an era in which the genomes of over 75 000 patients will be sequenced by the National Health Service over the next 3 years to advance healthcare. In particular, we have looked at three main themes of data management in relation to cancer research, namely (1) cancer ontology management, (2) IT infrastructures that have been developed to support data management and (3) the unique ethical challenges introduced by utilising Big Data in research.
Affiliation(s)
- Aisyah Mohd Noor
- Research Oncology, Faculty of Life Sciences and Medicine, King's College London, Guy's Hospital, London SE1 9RT, UK
| | - Lars Holmberg
- Research Oncology, Faculty of Life Sciences and Medicine, King's College London, Guy's Hospital, London SE1 9RT, UK; Department of Surgical Sciences, Uppsala University, Uppsala 751 85, Sweden
| | - Cheryl Gillett
- Research Oncology, Faculty of Life Sciences and Medicine, King's College London, Guy's Hospital, London SE1 9RT, UK; Faculty of Life Sciences and Medicine, King's Health Partners Cancer Biobank, King's College London, Research Oncology, Guy's Hospital, London SE1 9RT, UK
| | - Anita Grigoriadis
- Research Oncology, Faculty of Life Sciences and Medicine, King's College London, Guy's Hospital, London SE1 9RT, UK; Breast Cancer Now Research Unit, Research Oncology, Faculty of Life Sciences and Medicine, King's College London, Guy's Hospital, London SE1 9RT, UK
|
40
|
Wang S, Zhang Y, Dai W, Lauter K, Kim M, Tang Y, Xiong H, Jiang X. HEALER: homomorphic computation of ExAct Logistic rEgRession for secure rare disease variants analysis in GWAS. Bioinformatics 2015; 32:211-8. [PMID: 26446135 DOI: 10.1093/bioinformatics/btv563] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 05/19/2015] [Accepted: 09/22/2015] [Indexed: 01/06/2023]
Abstract
MOTIVATION Genome-wide association studies (GWAS) have been widely used in discovering the association between genotypes and phenotypes. Human genome data contain valuable but highly sensitive information. Unprotected disclosure of such information might put individuals' privacy at risk. It is important to protect human genome data. Exact logistic regression is a bias-reduction method based on a penalized likelihood to discover rare variants that are associated with disease susceptibility. We propose the HEALER framework to facilitate secure rare variants analysis with a small sample size. RESULTS We focus on algorithm design that reduces the computational and storage costs of learning a homomorphic exact logistic regression model (i.e. evaluating P-values of coefficients), where the circuit depth is proportional to the logarithm of the data size. We evaluate the algorithm performance using rare Kawasaki Disease datasets. AVAILABILITY AND IMPLEMENTATION Download HEALER at http://research.ucsd-dbmi.org/HEALER/ CONTACT: shw070@ucsd.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shuang Wang
- Department of Biomedical Informatics, University of California, San Diego, CA 92093
| | - Yuchen Zhang
- Department of Biomedical Informatics, University of California, San Diego, CA 92093, Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Wenrui Dai
- Department of Biomedical Informatics, University of California, San Diego, CA 92093, Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
| | | | - Miran Kim
- Seoul National University, Seoul, 151-742, Republic of Korea
| | - Yuzhe Tang
- Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY 13244, USA
| | - Hongkai Xiong
- Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Xiaoqian Jiang
- Department of Biomedical Informatics, University of California, San Diego, CA 92093
|
41
|
Kho AN, Cashy JP, Jackson KL, Pah AR, Goel S, Boehnke J, Humphries JE, Kominers SD, Hota BN, Sims SA, Malin BA, French DD, Walunas TL, Meltzer DO, Kaleba EO, Jones RC, Galanter WL. Design and implementation of a privacy preserving electronic health record linkage tool in Chicago. J Am Med Inform Assoc 2015; 22:1072-80. [PMID: 26104741 PMCID: PMC5009931 DOI: 10.1093/jamia/ocv038] [Citation(s) in RCA: 73] [Impact Index Per Article: 8.1] [Received: 09/11/2014] [Revised: 02/25/2015] [Accepted: 03/26/2015] [Indexed: 11/12/2022]
Abstract
OBJECTIVE To design and implement a tool that creates a secure, privacy preserving linkage of electronic health record (EHR) data across multiple sites in a large metropolitan area in the United States (Chicago, IL), for use in clinical research. METHODS The authors developed and distributed a software application that performs standardized data cleaning, preprocessing, and hashing of patient identifiers to remove all protected health information. The application creates seeded hash code combinations of patient identifiers using a Health Insurance Portability and Accountability Act compliant SHA-512 algorithm that minimizes re-identification risk. The authors subsequently linked individual records using a central honest broker with an algorithm that assigns weights to hash combinations in order to generate high specificity matches. RESULTS The software application successfully linked and de-duplicated 7 million records across 6 institutions, resulting in a cohort of 5 million unique records. Using a manually reconciled set of 11 292 patients as a gold standard, the software achieved a sensitivity of 96% and a specificity of 100%, with a majority of the missed matches accounted for by patients with both a missing social security number and last name change. Using 3 disease examples, it is demonstrated that the software can reduce duplication of patient records across sites by as much as 28%. CONCLUSIONS Software that standardizes the assignment of a unique seeded hash identifier merged through an agreed upon third-party honest broker can enable large-scale secure linkage of EHR data for epidemiologic and public health research. The software algorithm can improve future epidemiologic research by providing more comprehensive data given that patients may make use of multiple healthcare systems.
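The seeded-hash step described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: the abstract does not list the cleaning rules or which identifier combinations are hashed, so `normalize`, the field names, and `COMBINATIONS` are all hypothetical. Only `hashlib.sha512` corresponds to the algorithm named in the abstract.

```python
import hashlib
import unicodedata

def normalize(value: str) -> str:
    """Standardize an identifier: strip accents, uppercase, keep only letters and digits."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.upper() if ch.isalnum())

def seeded_hash(seed: str, *fields: str) -> str:
    """Hash a combination of cleaned identifiers together with a shared secret seed."""
    payload = seed + "|" + "|".join(normalize(f) for f in fields)
    return hashlib.sha512(payload.encode("utf-8")).hexdigest()

# Hypothetical identifier combinations; the real tool's combinations are not given here.
COMBINATIONS = [
    ("first_name", "last_name", "date_of_birth"),
    ("last_name", "date_of_birth", "ssn"),
]

def hash_record(record: dict, seed: str) -> list:
    """Return one seeded hash per identifier combination for a patient record."""
    return [seeded_hash(seed, *(record[f] for f in combo)) for combo in COMBINATIONS]
```

In a setup like the one described, each site would run `hash_record` locally with the same seed and send only the resulting hashes to the honest broker, which could then compare and weight matching combinations across sites.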
Affiliation(s)
- Abel N Kho
- Department of Medicine, and Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - John P Cashy
- Department of Medicine, and Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA; Department of Veterans Affairs, Pittsburgh PA
| | - Kathryn L Jackson
- Department of Medicine, and Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Adam R Pah
- Department of Medicine, and Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Satyender Goel
- Department of Medicine, and Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Jörn Boehnke
- Department of Economics, University of Chicago, Chicago, IL, USA
| | | | - Scott Duke Kominers
- Society of Fellows Department of Economics, Business School, Program For Evolutionary Dynamics, and Center for Research on Computation and Society, Harvard University, Cambridge, MA, USA
| | - Bala N Hota
- Department of Medicine, Rush University Medical Center, Chicago, IL, USA
| | - Shannon A Sims
- Department of Medicine, Rush University Medical Center, Chicago, IL, USA
| | - Bradley A Malin
- Department of Biomedical Informatics, School of Medicine, and Department of Electrical Engineering and Computer Science, School of Engineering, Vanderbilt University, Nashville, TN, USA
| | - Dustin D French
- Center for Healthcare Studies and Department of Ophthalmology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Theresa L Walunas
- Department of Medicine, and Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | | | - Erin O Kaleba
- Alliance of Chicago Community Health Services, Chicago, IL, USA
| | - Roderick C Jones
- Formerly of Chicago Department of Public Health, currently at Ann and Robert H. Lurie Children's Hospital, Chicago, IL, USA
| | - William L Galanter
- University of Illinois Hospital and Health Sciences System, Chicago, IL, USA
|
42
|
Dyke SOM, Cheung WA, Joly Y, Ammerpohl O, Lutsik P, Rothstein MA, Caron M, Busche S, Bourque G, Rönnblom L, Flicek P, Beck S, Hirst M, Stunnenberg H, Siebert R, Walter J, Pastinen T. Epigenome data release: a participant-centered approach to privacy protection. Genome Biol 2015; 16:142. [PMID: 26185018 PMCID: PMC4504083 DOI: 10.1186/s13059-015-0723-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 05/04/2015] [Accepted: 07/09/2015] [Indexed: 11/10/2022]
Abstract
Large-scale epigenome mapping by the NIH Roadmap Epigenomics Project, the ENCODE Consortium and the International Human Epigenome Consortium (IHEC) produces genome-wide DNA methylation data at one base-pair resolution. We examine how such data can be made open-access while balancing appropriate interpretation and genomic privacy. We propose guidelines for data release that both reduce ambiguity in the interpretation of open-access data and limit immediate access to genetic variation data that are made available through controlled access.
Affiliation(s)
- Stephanie O M Dyke
- Centre of Genomics and Policy, Department of Human Genetics, McGill University, Montreal, QC, H3A 0G1, Canada
| | - Warren A Cheung
- Department of Human Genetics, McGill University and Genome Quebec Innovation Centre, Montreal, QC, H3A 0G1, Canada
| | - Yann Joly
- Centre of Genomics and Policy, Department of Human Genetics, McGill University, Montreal, QC, H3A 0G1, Canada
| | - Ole Ammerpohl
- Institute of Human Genetics, University Hospital Schleswig-Holstein, Campus Kiel & Christian-Albrechts-University Kiel, 24105, Kiel, Germany
| | - Pavlo Lutsik
- Saarland University, 66123, Saarbrücken, Germany
| | - Mark A Rothstein
- Institute for Bioethics, Health Policy and Law, University of Louisville School of Medicine, Louisville, KY, 40202, USA
| | - Maxime Caron
- Department of Human Genetics, McGill University and Genome Quebec Innovation Centre, Montreal, QC, H3A 0G1, Canada
| | - Stephan Busche
- Department of Human Genetics, McGill University and Genome Quebec Innovation Centre, Montreal, QC, H3A 0G1, Canada
| | - Guillaume Bourque
- Department of Human Genetics, McGill University and Genome Quebec Innovation Centre, Montreal, QC, H3A 0G1, Canada
| | - Lars Rönnblom
- Department of Medical Sciences, Science for Life Laboratory, Uppsala University, SE-751 85, Uppsala, Sweden
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Stephan Beck
- Medical Genomics, UCL Cancer Institute, University College London, London, WC1E 6BT, UK
| | - Martin Hirst
- Centre for High-Throughput Biology, University of British Columbia and Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
| | - Henk Stunnenberg
- Department of Molecular Biology, RIMLS, Faculty of Science, Radboud University, 6500 HB, Nijmegen, The Netherlands
| | - Reiner Siebert
- Institute of Human Genetics, University Hospital Schleswig-Holstein, Campus Kiel & Christian-Albrechts-University Kiel, 24105, Kiel, Germany
| | - Jörn Walter
- Saarland University, 66123, Saarbrücken, Germany
| | - Tomi Pastinen
- Department of Human Genetics, McGill University and Genome Quebec Innovation Centre, Montreal, QC, H3A 0G1, Canada
|
43
|
Meeker D, Jiang X, Matheny ME, Farcas C, D'Arcy M, Pearlman L, Nookala L, Day ME, Kim KK, Kim H, Boxwala A, El-Kareh R, Kuo GM, Resnic FS, Kesselman C, Ohno-Machado L. A system to build distributed multivariate models and manage disparate data sharing policies: implementation in the scalable national network for effectiveness research. J Am Med Inform Assoc 2015; 22:1187-95. [PMID: 26142423 PMCID: PMC4639714 DOI: 10.1093/jamia/ocv017] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 05/02/2014] [Accepted: 02/18/2015] [Indexed: 11/29/2022]
Abstract
Background Centralized and federated models for sharing data in research networks currently exist. To build multivariate data analysis for centralized networks, transfer of patient-level data to a central computation resource is necessary. The authors implemented distributed multivariate models for federated networks in which patient-level data is kept at each site and data exchange policies are managed in a study-centric manner. Objective The objective was to implement infrastructure that supports the functionality of some existing research networks (e.g., cohort discovery, workflow management, and estimation of multivariate analytic models on centralized data) while adding additional important new features, such as algorithms for distributed iterative multivariate models, a graphical interface for multivariate model specification, synchronous and asynchronous response to network queries, investigator-initiated studies, and study-based control of staff, protocols, and data sharing policies. Materials and Methods Based on the requirements gathered from statisticians, administrators, and investigators from multiple institutions, the authors developed infrastructure and tools to support multisite comparative effectiveness studies using web services for multivariate statistical estimation in the SCANNER federated network. Results The authors implemented massively parallel (map-reduce) computation methods and a new policy management system to enable each study initiated by network participants to define the ways in which data may be processed, managed, queried, and shared. The authors illustrated the use of these systems among institutions with highly different policies and operating under different state laws. Discussion and Conclusion Federated research networks need not limit distributed query functionality to count queries, cohort discovery, or independently estimated analytic models. Multivariate analyses can be efficiently and securely conducted without patient-level data transport, allowing institutions with strict local data storage requirements to participate in sophisticated analyses based on federated research networks.
Affiliation(s)
- Daniella Meeker
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA; Information Sciences Institute, University of Southern California, Marina Del Rey, CA
| | - Xiaoqian Jiang
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093
| | - Michael E Matheny
- Geriatrics Research, Education, and Clinical Care Service Department of Biomedical Informatics, Division of General Internal Medicine, Department of Biostatistics
| | - Claudiu Farcas
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093
| | - Michel D'Arcy
- Information Sciences Institute, University of Southern California, Marina Del Rey, CA
| | - Laura Pearlman
- Information Sciences Institute, University of Southern California, Marina Del Rey, CA
| | | | - Michele E Day
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093
| | - Katherine K Kim
- Department of Pathology and Laboratory Medicine and Department of Internal Medicine, University of California Davis, Sacramento, CA
| | - Hyeoneui Kim
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093
| | - Aziz Boxwala
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093
| | - Robert El-Kareh
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093
| | - Grace M Kuo
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego
| | | | - Carl Kesselman
- Information Sciences Institute, University of Southern California, Marina Del Rey, CA
| | - Lucila Ohno-Machado
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093
|
44
|
Belle A, Thiagarajan R, Soroushmehr SMR, Navidi F, Beard DA, Najarian K. Big Data Analytics in Healthcare. BIOMED RESEARCH INTERNATIONAL 2015; 2015:370194. [PMID: 26229957 PMCID: PMC4503556 DOI: 10.1155/2015/370194] [Citation(s) in RCA: 261] [Impact Index Per Article: 29.0] [Received: 01/05/2015] [Revised: 05/26/2015] [Accepted: 06/16/2015] [Indexed: 02/06/2023]
Abstract
The rapidly expanding field of big data analytics has started to play a pivotal role in the evolution of healthcare practices and research. It has provided tools to accumulate, manage, analyze, and assimilate large volumes of disparate, structured, and unstructured data produced by current healthcare systems. Big data analytics has been recently applied towards aiding the process of care delivery and disease exploration. However, the adoption rate and research development in this space is still hindered by some fundamental problems inherent within the big data paradigm. In this paper, we discuss some of these major challenges with a focus on three upcoming and promising areas of medical research: image, signal, and genomics based analytics. Recent research which targets utilization of large volumes of medical data while combining multimodal data from disparate sources is discussed. Potential areas of research within this field which have the ability to provide meaningful impact on healthcare delivery are also examined.
Affiliation(s)
- Ashwin Belle
- Emergency Medicine Department, University of Michigan, Ann Arbor, MI 48109, USA
- University of Michigan Center for Integrative Research in Critical Care (MCIRCC), Ann Arbor, MI 48109, USA
| | - Raghuram Thiagarajan
- Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, MI 48109, USA
| | - S. M. Reza Soroushmehr
- Emergency Medicine Department, University of Michigan, Ann Arbor, MI 48109, USA
- University of Michigan Center for Integrative Research in Critical Care (MCIRCC), Ann Arbor, MI 48109, USA
| | - Fatemeh Navidi
- Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109, USA
| | - Daniel A. Beard
- University of Michigan Center for Integrative Research in Critical Care (MCIRCC), Ann Arbor, MI 48109, USA
- Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, MI 48109, USA
| | - Kayvan Najarian
- Emergency Medicine Department, University of Michigan, Ann Arbor, MI 48109, USA
- University of Michigan Center for Integrative Research in Critical Care (MCIRCC), Ann Arbor, MI 48109, USA
|
45
|
Toga AW, Dinov ID. Sharing big biomedical data. JOURNAL OF BIG DATA 2015; 2:7. [PMID: 26929900 PMCID: PMC4768816 DOI: 10.1186/s40537-015-0016-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 01/28/2015] [Accepted: 04/28/2015] [Indexed: 06/05/2023]
Abstract
BACKGROUND The promise of Big Biomedical Data may be offset by the enormous challenges in handling, analyzing, and sharing it. In this paper, we provide a framework for developing practical and reasonable data sharing policies that incorporate the sociological, financial, technical and scientific requirements of a sustainable Big Data dependent scientific community. FINDINGS Many biomedical and healthcare studies may be significantly impacted by using large, heterogeneous and incongruent datasets; however there are significant technical, social, regulatory, and institutional barriers that need to be overcome to ensure the power of Big Data overcomes these detrimental factors. CONCLUSIONS Pragmatic policies that demand extensive sharing of data, promotion of data fusion, provenance, interoperability and balance security and protection of personal information are critical for the long term impact of translational Big Data analytics.
Affiliation(s)
- Arthur W Toga
- Laboratory of Neuro Imaging, Institute of Neuroimaging and Informatics, Keck School of Medicine of USC, University of Southern California, 2001 North Soto Street-Room 102, Los Angeles, CA 90033 USA
| | - Ivo D Dinov
- Statistics Online Computational Resource, University of Michigan, UMSN, 400 North Ingalls, Room 4341, Ann Arbor, 48109-5482 MI USA
|
46
|
Gallego B, Walter SR, Day RO, Dunn AG, Sivaraman V, Shah N, Longhurst CA, Coiera E. Bringing cohort studies to the bedside: framework for a ‘green button’ to support clinical decision-making. J Comp Eff Res 2015; 4:191-197. [DOI: 10.2217/cer.15.12] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Indexed: 11/21/2022]
Abstract
When providing care, clinicians are expected to take note of clinical practice guidelines, which offer recommendations based on the available evidence. However, guidelines may not apply to individual patients with comorbidities, as they are typically excluded from clinical trials. Guidelines also tend not to provide relevant evidence on risks, secondary effects and long-term outcomes. Querying the electronic health records of similar patients may for many provide an alternate source of evidence to inform decision-making. It is important to develop methods to support these personalized observational studies at the point-of-care, to understand when these methods may provide valid results, and to validate and integrate these findings with those from clinical trials.
Affiliation(s)
- Blanca Gallego
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, NSW 2109, Australia
| | - Scott R Walter
- Centre for Health Systems & Safety Research, Australian Institute of Health Innovation, Macquarie University, Australia
| | - Richard O Day
- St Vincent's Clinical School, University of New South Wales, St Vincent's Hospital, Sydney, Australia
| | - Adam G Dunn
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, NSW 2109, Australia
| | - Vijay Sivaraman
- Electrical Engineering & Telecommunications, University of New South Wales, Sydney, Australia
| | - Nigam Shah
- Biomedical Informatics Research, Stanford School of Medicine, CA 94305-5479, USA
| | | | - Enrico Coiera
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, NSW 2109, Australia
|
47
|
Medina García R, Torres Serrano E, Segrelles Quilis JD, Blanquer Espert I, Martí Bonmatí L, Almenar Cubells D. A systematic approach for using DICOM structured reports in clinical processes: focus on breast cancer. J Digit Imaging 2015; 28:132-45. [PMID: 25200428 PMCID: PMC4359202 DOI: 10.1007/s10278-014-9728-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/24/2022]
Abstract
This paper describes a methodology for redesigning the clinical processes to manage diagnosis, follow-up, and response to treatment episodes of breast cancer. This methodology includes three fundamental elements: (1) identification of similar and contrasting cases that may be of clinical relevance based upon a target study, (2) codification of reports with standard medical terminologies, and (3) linking and indexing the structured reports obtained with different techniques in a common system. The combination of these elements should lead to improvements in the clinical management of breast cancer patients. The motivation for this work is the adaptation of the clinical processes for breast cancer created by the Valencian Community health authorities to the new techniques available for data processing. To achieve this adaptation, it was necessary to design nine Digital Imaging and Communications in Medicine (DICOM) structured report templates: six diagnosis templates and three summary templates that combine reports from clinical episodes. A prototype system is also described that links the lesion to the reports. Preliminary tests of the prototype have shown that the interoperability among the report templates allows correlating parameters from different reports. Further work is in progress to improve the methodology in order that it can be applied to clinical practice.
Affiliation(s)
- Erik Torres Serrano
- Institute for Molecular Imaging Technologies (I3M), Universitat Politècnica de València (UPVLC), Camino de Vera S/N, 46022 Valencia, Spain
- Luis Martí Bonmatí
- Medical Imaging Unit, University and Polytechnic Hospital La Fe, Valencia, Spain
|
48
|
An electronic medical record system with treatment recommendations based on patient similarity. J Med Syst 2015; 39:55. [PMID: 25762458 DOI: 10.1007/s10916-015-0237-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 11/08/2014] [Accepted: 03/02/2015] [Indexed: 10/23/2022]
Abstract
As the core of health information technology (HIT), electronic medical record (EMR) systems have been changing to meet health care demands. To construct a new-generation EMR system framework with the capability of self-learning and real-time feedback, thus adding intelligence to the EMR system itself, this paper proposed a novel EMR system framework by constructing a direct pathway between the EMR workflow and EMR data. A prototype of this framework was implemented based on patient similarity learning. Patient diagnoses, demographic data, vital signs and structured lab test results were considered for similarity calculations. Real hospitalization data from 12,818 patients were substituted, and Precision @ Position measurements were used to validate self-learning performance. Our EMR system changed the way in which orders are placed by establishing recommendation order menu and shortcut applications. Two learning modes (EASY MODE and COMPLEX MODE) were provided, and the precision values @ position 5 of both modes were 0.7458 and 0.8792, respectively. The precision performance of COMPLEX MODE was better than that of EASY MODE (tested using a paired Wilcoxon-Mann-Whitney test, p < 0.001). Applying the proposed framework, the EMR data value was directly demonstrated in the clinical workflow, and intelligence was added to the EMR system, which could improve system usability, reliability and the physician's work efficiency. This self-learning mechanism is based on dynamic learning models and is not limited to a specific disease or clinical scenario, thus decreasing maintenance costs in real world applications and increasing its adaptability.
|
49
|
Doan S, Conway M, Phuong TM, Ohno-Machado L. Natural language processing in biomedicine: a unified system architecture overview. Methods Mol Biol 2015; 1168:275-94. [PMID: 24870142 DOI: 10.1007/978-1-4939-0847-9_16] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Indexed: 11/26/2022]
Abstract
In contemporary electronic medical records much of the clinically important data (signs and symptoms, symptom severity, disease status, etc.) are not provided in structured data fields but rather are encoded in clinician-generated narrative text. Natural language processing (NLP) provides a means of unlocking this important data source for applications in clinical decision support, quality assurance, and public health. This chapter provides an overview of representative NLP systems in biomedicine based on a unified architectural view. A general architecture in an NLP system consists of two main components: background knowledge that includes biomedical knowledge resources and a framework that integrates NLP tools to process text. Systems differ in both components, which we review briefly. Additionally, the challenge facing current research efforts in biomedical NLP includes the paucity of large, publicly available annotated corpora, although initiatives that facilitate data sharing, system evaluation, and collaborative work between researchers in clinical NLP are starting to emerge.
Affiliation(s)
- Son Doan
- Division of Biomedical Informatics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, USA
|
50
|
Jiang X, Zhao Y, Wang X, Malin B, Wang S, Ohno-Machado L, Tang H. A community assessment of privacy preserving techniques for human genomes. BMC Med Inform Decis Mak 2014; 14 Suppl 1:S1. [PMID: 25521230 PMCID: PMC4290799 DOI: 10.1186/1472-6947-14-s1-s1] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Indexed: 11/15/2022] Open
Abstract
To answer the need for the rigorous protection of biomedical data, we organized the Critical Assessment of Data Privacy and Protection initiative as a community effort to evaluate privacy-preserving dissemination techniques for biomedical data. We focused on the challenge of sharing aggregate human genomic data (e.g., allele frequencies) in a way that preserves the privacy of the data donors, without undermining the utility of genome-wide association studies (GWAS) or impeding their dissemination. Specifically, we designed two problems for disseminating the raw data and the analysis outcome, respectively, based on publicly available data from HapMap and from the Personal Genome Project. A total of six teams participated in the challenges. The final results were presented at a workshop of the iDASH (integrating Data for Analysis, 'anonymization,' and SHaring) National Center for Biomedical Computing. We report the results of the challenge and our findings about the current genome privacy protection techniques.
|