1
McMurry AJ, Gottlieb DI, Miller TA, Jones JR, Atreja A, Crago J, Desai PM, Dixon BE, Garber M, Ignatov V, Kirchner LA, Payne PRO, Saldanha AJ, Shankar PRV, Solad YV, Sprouse EA, Terry M, Wilcox AB, Mandl KD. Cumulus: a federated electronic health record-based learning system powered by Fast Healthcare Interoperability Resources and artificial intelligence. J Am Med Inform Assoc 2024; 31:1638-1647. [PMID: 38860521 PMCID: PMC11258401 DOI: 10.1093/jamia/ocae130] [Received: 02/07/2024] [Revised: 05/07/2024] [Accepted: 05/23/2024] [Indexed: 06/12/2024]
Abstract
OBJECTIVE To address challenges in large-scale electronic health record (EHR) data exchange, we sought to develop, deploy, and test an open source, cloud-hosted app "listener" that accesses standardized data across the SMART/HL7 Bulk FHIR Access application programming interface (API). METHODS We advance a model for scalable, federated data sharing and learning. Cumulus software is designed to address key technology and policy desiderata including local utility, control, and administrative simplicity as well as privacy preservation during robust data sharing, and artificial intelligence (AI) for processing unstructured text. RESULTS Cumulus relies on containerized, cloud-hosted software, installed within a healthcare organization's security envelope. Cumulus accesses EHR data via the Bulk FHIR interface and streamlines automated processing and sharing. The modular design enables use of the latest AI and natural language processing tools and supports provider autonomy and administrative simplicity. In an initial test, Cumulus was deployed across 5 healthcare systems, each partnered with public health. Cumulus outputs patient counts, which were aggregated into a table stratified by variables of interest to enable population health studies. All code is available open source. A policy stipulating that only aggregate data leave the institution greatly facilitated data sharing agreements. DISCUSSION AND CONCLUSION Cumulus addresses barriers to data sharing based on (1) federally required support for standard APIs, (2) increasing use of cloud computing, and (3) advances in AI. There is potential for scalability to support learning across myriad network configurations and use cases.
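The abstract describes sites that return only aggregate patient counts, which are then combined into a stratified table. The Python sketch below illustrates that idea under stated assumptions: the record fields (`patient_id`, `age_group`, `condition`) and the functions `local_counts` and `combine` are hypothetical and are not part of the Cumulus code base; they only show why row-level records never need to leave a site.

```python
from collections import Counter
from typing import Iterable, Mapping


def local_counts(records: Iterable[Mapping[str, str]], variables: list[str]) -> Counter:
    """Runs inside one site: count distinct patients per stratum.

    Only the returned Counter (aggregate counts) would leave the institution;
    the row-level records stay local.
    """
    seen = set()
    counts: Counter = Counter()
    for rec in records:
        stratum = tuple(rec[v] for v in variables)
        key = (rec["patient_id"], stratum)
        if key not in seen:  # count each patient once per stratum
            seen.add(key)
            counts[stratum] += 1
    return counts


def combine(site_tables: Iterable[Counter]) -> Counter:
    """Runs at the coordinating center: sum the per-site count tables."""
    total: Counter = Counter()
    for table in site_tables:
        total.update(table)
    return total


# Toy, made-up records for two hypothetical sites.
site_a = [
    {"patient_id": "a1", "age_group": "0-4", "condition": "flu"},
    {"patient_id": "a1", "age_group": "0-4", "condition": "flu"},  # repeat encounter
    {"patient_id": "a2", "age_group": "5-9", "condition": "flu"},
]
site_b = [{"patient_id": "b1", "age_group": "0-4", "condition": "flu"}]
strata = ["age_group", "condition"]
table = combine([local_counts(site_a, strata), local_counts(site_b, strata)])
print(table[("0-4", "flu")])  # 2
```

Because only counts are shared, a patient seen at both sites would be counted once per site in this sketch.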
Affiliation(s)
- Andrew J McMurry
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02215, United States
- Department of Pediatrics, Harvard Medical School, Boston, MA 02115, United States
- Daniel I Gottlieb
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02215, United States
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Timothy A Miller
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02215, United States
- Department of Pediatrics, Harvard Medical School, Boston, MA 02115, United States
- James R Jones
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02215, United States
- Ashish Atreja
- Innovation Technology, UC Davis Health, Rancho Cordova, CA 95670, United States
- Jennifer Crago
- Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, IN 46202, United States
- Pankaja M Desai
- Department of Internal Medicine, Rush University Medical Center, Chicago, IL 60612, United States
- Brian E Dixon
- Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, IN 46202, United States
- Department of Health Policy and Management, Fairbanks School of Public Health, Indiana University, Indianapolis, IN 46202, United States
- Matthew Garber
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02215, United States
- Vladimir Ignatov
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02215, United States
- Philip R O Payne
- Institute for Informatics, Data Science, and Biostatistics, Washington University School of Medicine in St Louis, St Louis, MO 63110, United States
- Department of Medicine, Washington University School of Medicine in St Louis, St Louis, MO 63110, United States
- Anil J Saldanha
- Department of Health Innovation, Rush University Medical Center, Chicago, IL 60612, United States
- Prabhu R V Shankar
- Innovation Technology, UC Davis Health, Rancho Cordova, CA 95670, United States
- Department of Public Health Sciences, UC Davis Health, Davis, CA 95817, United States
- Yauheni V Solad
- Innovation Technology, UC Davis Health, Rancho Cordova, CA 95670, United States
- Michael Terry
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02215, United States
- Adam B Wilcox
- Institute for Informatics, Data Science, and Biostatistics, Washington University School of Medicine in St Louis, St Louis, MO 63110, United States
- Department of Medicine, Washington University School of Medicine in St Louis, St Louis, MO 63110, United States
- Kenneth D Mandl
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02215, United States
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
2
Kong SW, Lee IH, Collen LV, Manrai AK, Snapper SB, Mandl KD. Discordance between a deep learning model and clinical-grade variant pathogenicity classification in a rare disease cohort. medRxiv 2024:2024.05.22.24307756. [PMID: 38826236 PMCID: PMC11142383 DOI: 10.1101/2024.05.22.24307756] [Indexed: 06/04/2024]
Abstract
Genetic testing has become an essential component in the diagnosis and management of a wide range of clinical conditions, from cancer to developmental disorders, especially in rare Mendelian diseases. Efforts to identify rare phenotype-associated variants have predominantly focused on protein-truncating variants, while the interpretation of missense variants presents a considerable challenge. Deep learning algorithms excel in various biomedical tasks [1,2], yet accurately distinguishing between pathogenic and benign genetic variants remains an elusive goal [3-5]. Specifically, even the most sophisticated models have difficulty accurately assessing the pathogenicity of missense variants of uncertain significance (VUS). Our investigation of AlphaMissense (AM) [5], the latest iteration of deep learning methods for predicting the potential functional impact of missense variants and assessing gene essentiality, reveals important limitations in its ability to identify pathogenic missense variants within a rare disease cohort. Indeed, AM struggles to accurately assess the pathogenicity of variants in intrinsically disordered regions (IDRs), leading to unreliable gene-level essentiality scores for certain genes containing IDRs. This limitation highlights the challenges AM faces in the context of clinical genetics [6].
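To make "discordance" concrete, here is a sketch that bins AM scores into classes and cross-tabulates them against clinical labels. The cutoffs used (below 0.34 likely benign, above 0.564 likely pathogenic, ambiguous otherwise) are assumed to be the thresholds published with AlphaMissense and should be checked against its documentation; the label encoding, the helper names and the toy scores are hypothetical and are not data from the study.

```python
from collections import Counter

# Assumed AlphaMissense class cutoffs (verify against the AM release notes).
BENIGN_MAX = 0.34
PATHOGENIC_MIN = 0.564


def am_class(score: float) -> str:
    """Map an AM pathogenicity score to a three-way class."""
    if score < BENIGN_MAX:
        return "likely_benign"
    if score > PATHOGENIC_MIN:
        return "likely_pathogenic"
    return "ambiguous"


def discordance(variants):
    """variants: iterable of (am_score, clinical_label) pairs, where
    clinical_label is "pathogenic" or "benign" (hypothetical encoding).

    Returns the (clinical, AM) cross-tabulation and the fraction of
    clinically pathogenic variants that AM does not call likely pathogenic.
    """
    table = Counter()
    missed = total_pathogenic = 0
    for score, clinical in variants:
        predicted = am_class(score)
        table[(clinical, predicted)] += 1
        if clinical == "pathogenic":
            total_pathogenic += 1
            if predicted != "likely_pathogenic":
                missed += 1
    rate = missed / total_pathogenic if total_pathogenic else 0.0
    return table, rate


# Made-up scores, for illustration only.
toy = [(0.91, "pathogenic"), (0.20, "pathogenic"), (0.45, "pathogenic"), (0.10, "benign")]
table, miss_rate = discordance(toy)
print(f"{miss_rate:.3f}")  # 0.667
```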
Affiliation(s)
- Sek Won Kong
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02215
- Department of Pediatrics, Harvard Medical School, Boston, MA 02115
- In-Hee Lee
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02215
- Lauren V. Collen
- Department of Pediatrics, Harvard Medical School, Boston, MA 02115
- Division of Gastroenterology, Hepatology, and Nutrition, Boston Children’s Hospital, Boston, MA 02215
- Arjun K. Manrai
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115
- Scott B. Snapper
- Department of Pediatrics, Harvard Medical School, Boston, MA 02115
- Division of Gastroenterology, Hepatology, and Nutrition, Boston Children’s Hospital, Boston, MA 02215
- Kenneth D. Mandl
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02215
- Department of Pediatrics, Harvard Medical School, Boston, MA 02115
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115
3
McMurry AJ, Gottlieb DI, Miller TA, Jones JR, Atreja A, Crago J, Desai PM, Dixon BE, Garber M, Ignatov V, Kirchner LA, Payne PRO, Saldanha AJ, Shankar PRV, Solad YV, Sprouse EA, Terry M, Wilcox AB, Mandl KD. Cumulus: A federated EHR-based learning system powered by FHIR and AI. medRxiv 2024:2024.02.02.24301940. [PMID: 38370642 PMCID: PMC10871375 DOI: 10.1101/2024.02.02.24301940] [Indexed: 02/20/2024]
Abstract
Objective To address challenges in large-scale electronic health record (EHR) data exchange, we sought to develop, deploy, and test an open source, cloud-hosted app 'listener' that accesses standardized data across the SMART/HL7 Bulk FHIR Access application programming interface (API). Methods We advance a model for scalable, federated data sharing and learning. Cumulus software is designed to address key technology and policy desiderata including local utility, control, and administrative simplicity as well as privacy preservation during robust data sharing, and AI for processing unstructured text. Results Cumulus relies on containerized, cloud-hosted software, installed within a healthcare organization's security envelope. Cumulus accesses EHR data via the Bulk FHIR interface and streamlines automated processing and sharing. The modular design enables use of the latest AI and natural language processing tools and supports provider autonomy and administrative simplicity. In an initial test, Cumulus was deployed across five healthcare systems, each partnered with public health. Cumulus outputs patient counts, which were aggregated into a table stratified by variables of interest to enable population health studies. All code is available open source. A policy stipulating that only aggregate data leave the institution greatly facilitated data sharing agreements. Discussion and Conclusion Cumulus addresses barriers to data sharing based on (1) federally required support for standard APIs, (2) increasing use of cloud computing, and (3) advances in AI. There is potential for scalability to support learning across myriad network configurations and use cases.
Affiliation(s)
- Andrew J. McMurry
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- Department of Pediatrics, Harvard Medical School, Boston, MA
- Daniel I. Gottlieb
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
- Timothy A. Miller
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- Department of Pediatrics, Harvard Medical School, Boston, MA
- James R. Jones
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- Ashish Atreja
- Department of Health Information Technology, UC Davis Health, Rancho Cordova, CA
- Jennifer Crago
- Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, IN
- Pankaja M. Desai
- Department of Internal Medicine, Rush University Medical Center, Chicago, IL
- Brian E. Dixon
- Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, IN
- Department of Health Policy and Management, Fairbanks School of Public Health, Indiana University, Indianapolis, IN
- Matthew Garber
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- Vladimir Ignatov
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- Philip R. O. Payne
- Department of Medicine, Washington University in St. Louis, St. Louis, MO
- Anil J. Saldanha
- Department of Health Innovation, Rush University Medical Center, Chicago, IL
- Prabhu R. V. Shankar
- Department of Health Information Technology, UC Davis Health, Rancho Cordova, CA
- Department of Public Health Sciences, UC Davis Health, Davis, CA
- Yauheni V. Solad
- Department of Health Information Technology, UC Davis Health, Rancho Cordova, CA
- Michael Terry
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- Adam B. Wilcox
- Department of Medicine, Washington University in St. Louis, St. Louis, MO
- Kenneth D. Mandl
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
4
Casaletto J, Bernier A, McDougall R, Cline MS. Federated Analysis for Privacy-Preserving Data Sharing: A Technical and Legal Primer. Annu Rev Genomics Hum Genet 2023; 24:347-368. [PMID: 37253596 PMCID: PMC10846631 DOI: 10.1146/annurev-genom-110122-084756] [Indexed: 06/01/2023]
Abstract
Continued advances in precision medicine rely on the widespread sharing of data that relate human genetic variation to disease. However, data sharing is severely limited by legal, regulatory, and ethical restrictions that safeguard patient privacy. Federated analysis addresses this problem by transferring the code to the data, providing the technical and legal capability to analyze the data within their secure home environment rather than transferring the data to another institution for analysis. This allows researchers to gain new insights from data that cannot be moved, while respecting patient privacy and the data stewards' legal obligations. Because federated analysis is a technical solution to the legal challenges inherent in data sharing, the technology and policy implications must be evaluated together. Here, we summarize the technical approaches to federated analysis and provide a legal analysis of their policy implications.
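"Transferring the code to the data" can be illustrated with a small simulation: each data holder executes an analysis function locally and returns only summary statistics, which a coordinator pools. This is a sketch under assumptions: the `DataSite` class and helper names are hypothetical, and pooling a mean and variance with the parallel update of Chan et al. is just one example of a federated analysis, not an approach taken from the primer.

```python
from typing import Callable, Sequence

# (n, mean, M2), where M2 is the sum of squared deviations from the mean.
Summary = tuple[int, float, float]


def site_summary(values: Sequence[float]) -> Summary:
    """The analysis code shipped to each data holder; it returns aggregates only."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return n, mean, m2


class DataSite:
    """Stand-in for a data holder whose records never leave the object."""

    def __init__(self, values: Sequence[float]):
        self._values = list(values)

    def run(self, analysis: Callable[[Sequence[float]], Summary]) -> Summary:
        # The code executes next to the data; only its return value is shared.
        return analysis(self._values)


def federated_mean_var(sites: Sequence[DataSite]) -> tuple[float, float]:
    """Pool per-site summaries with the parallel (Chan et al.) update."""
    n, mean, m2 = 0, 0.0, 0.0
    for site in sites:
        nb, mean_b, m2_b = site.run(site_summary)
        delta = mean_b - mean
        total = n + nb
        mean += delta * nb / total
        m2 += m2_b + delta * delta * n * nb / total
        n = total
    return mean, m2 / (n - 1)  # pooled mean and sample variance


pooled_mean, pooled_var = federated_mean_var([DataSite([1, 2, 3]), DataSite([4, 5])])
print(pooled_mean, pooled_var)  # 3.0 2.5
```

The pooled result equals the mean and sample variance of all five values, even though only three numbers left each site.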
Affiliation(s)
- James Casaletto
- Genomics Institute, University of California, Santa Cruz, California, USA
- Alexander Bernier
- Centre of Genomics and Policy, Faculty of Medicine and Health Sciences, McGill University, Montreal, Quebec, Canada
- Robyn McDougall
- Centre of Genomics and Policy, Faculty of Medicine and Health Sciences, McGill University, Montreal, Quebec, Canada
- Melissa S Cline
- Genomics Institute, University of California, Santa Cruz, California, USA
5
Cheng TL, Glauser TA, Reed A. The evolving model of pediatric research. Pediatr Res 2023:10.1038/s41390-023-02677-0. [PMID: 37400540 DOI: 10.1038/s41390-023-02677-0] [Received: 04/17/2023] [Accepted: 05/12/2023] [Indexed: 07/05/2023]
Affiliation(s)
- Tina L Cheng
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Tracy A Glauser
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Ann Reed
- Department of Pediatrics, Duke University, Durham, NC, USA
6
Chen RJ, Wang JJ, Williamson DFK, Chen TY, Lipkova J, Lu MY, Sahai S, Mahmood F. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat Biomed Eng 2023; 7:719-742. [PMID: 37380750 PMCID: PMC10632090 DOI: 10.1038/s41551-023-01056-8] [Received: 10/01/2021] [Accepted: 04/13/2023] [Indexed: 06/30/2023]
Abstract
In healthcare, the development and deployment of insufficiently fair systems of artificial intelligence (AI) can undermine the delivery of equitable care. Assessments of AI models stratified across subpopulations have revealed inequalities in how patients are diagnosed, treated and billed. In this Perspective, we outline fairness in machine learning through the lens of healthcare, and discuss how algorithmic biases (in data acquisition, genetic variation and intra-observer labelling variability, in particular) arise in clinical workflows and the resulting healthcare disparities. We also review emerging technology for mitigating biases via disentanglement, federated learning and model explainability, and their role in the development of AI-based software as a medical device.
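A stratified assessment of the kind mentioned above amounts to computing a performance metric separately for each subpopulation and comparing the results. A minimal sketch, assuming binary labels and using the gap in true-positive rate (the "equal opportunity" criterion) as one of several possible fairness measures; the function names and toy data are hypothetical, and this is not a method taken from the Perspective.

```python
from collections import defaultdict


def tpr_by_group(y_true, y_pred, groups):
    """True-positive rate (sensitivity) computed separately for each subgroup."""
    positives = defaultdict(int)
    detected = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            detected[group] += int(pred == 1)
    return {g: detected[g] / positives[g] for g in positives}


def equal_opportunity_gap(rates):
    """Largest difference in true-positive rate between any two subgroups."""
    return max(rates.values()) - min(rates.values())


# Toy labels and predictions for two hypothetical subgroups, A and B.
y_true = [1, 1, 1, 1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B"]
rates = tpr_by_group(y_true, y_pred, groups)
print(rates, equal_opportunity_gap(rates))  # {'A': 0.75, 'B': 0.5} 0.25
```

An aggregate sensitivity over all eight cases would hide that positives in group B are detected less often than those in group A.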
Affiliation(s)
- Richard J Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Judy J Wang
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Boston University School of Medicine, Boston, MA, USA
- Drew F K Williamson
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Tiffany Y Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Jana Lipkova
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Ming Y Lu
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Sharifa Sahai
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Faisal Mahmood
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA
7
Lin Y, Wang H, Li W, Shen J. Federated learning with hyper-network: a case study on whole slide image analysis. Sci Rep 2023; 13:1724. [PMID: 36720907 PMCID: PMC9889400 DOI: 10.1038/s41598-023-28974-6] [Received: 08/22/2022] [Accepted: 01/27/2023] [Indexed: 02/01/2023]
Abstract
Federated learning (FL) is a new kind of artificial intelligence (AI) aimed at data privacy preservation that builds on decentralizing the training data for the deep learning model. This new technique of data security and privacy sheds light on many critical domains with highly sensitive data, including medical image analysis. Developing a strong, scalable, and precise deep learning model has proven to depend on a variety of high-quality data from different centers. However, data holders may not be willing to share their data because of privacy restrictions. In this paper, we approach this challenge with a federated learning paradigm. Specifically, we present a case study on the whole slide image classification problem. At each local client center, a multiple-instance learning classifier is developed to conduct whole slide image classification. We introduce a privacy-preserving federated learning framework based on a hyper-network to update the global model. The hyper-network is deployed at the global center and produces the weights of the local network conditioned on its input. In this way, the hyper-network can simultaneously learn a family of local client networks. Instead of exchanging raw data with the local clients, only model parameters injected with noise are transferred between the local clients and the global model. Using a large set of whole slide images with only slide-level labels, we evaluated our approach on two different whole slide image classification problems. The results demonstrate that our proposed federated learning model based on a hyper-network can effectively leverage multi-center data to develop a more accurate model for classifying whole slide images. Its improvements over the isolated local centers and the commonly used federated averaging baseline are significant. Code will be available.
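The hyper-network idea can be sketched in a few lines. The toy below is an assumption-laden illustration, not the paper's method: it uses a linear hyper-network `H` that maps a fixed, hypothetical per-client embedding to the weights of a linear local model, full-batch gradient descent on each client, Gaussian noise added to the returned parameter update, and the server step popularized by pFedHN (move `H` along `outer(delta, e)`). The paper's multiple-instance classifier and neural hyper-network are not reproduced, and every name here is hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class LinearHyperNetwork:
    """Server-side hyper-network: theta_i = H @ e_i for client embedding e_i."""

    def __init__(self, d, k):
        self.H = [[0.0] * k for _ in range(d)]  # d local weights, k-dim embeddings

    def weights_for(self, e):
        return [dot(row, e) for row in self.H]

    def update(self, e, delta, lr):
        # Chain rule for a linear map: move H along outer(delta, e).
        for r, dr in enumerate(delta):
            for c, ec in enumerate(e):
                self.H[r][c] += lr * dr * ec


def mse(theta, data):
    return sum((dot(theta, x) - y) ** 2 for x, y in data) / len(data)


def local_train(theta, data, lr=0.1, steps=5):
    """Client side: a few full-batch gradient steps on a linear model."""
    theta = list(theta)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x, y in data:
            err = dot(theta, x) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / len(data)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def make_client(true_w, n, rng):
    xs = [[rng.uniform(-1, 1) for _ in true_w] for _ in range(n)]
    return [(x, dot(true_w, x)) for x in xs]


rng = random.Random(0)
clients = [make_client([1.0, 2.0], 50, rng), make_client([-1.0, 0.5], 50, rng)]
embeddings = [[1.0, 0.2], [0.2, 1.0]]  # fixed client embeddings (an assumption)
hn = LinearHyperNetwork(d=2, k=2)


def avg_loss():
    return sum(mse(hn.weights_for(e), c) for e, c in zip(embeddings, clients)) / len(clients)


initial_loss = avg_loss()
for rnd in range(200):
    i = rnd % len(clients)
    theta = hn.weights_for(embeddings[i])
    tuned = local_train(theta, clients[i])
    # Only a noised parameter update leaves the client, never its raw data.
    delta = [new - old + rng.gauss(0.0, 0.01) for old, new in zip(theta, tuned)]
    hn.update(embeddings[i], delta, lr=0.5)
final_loss = avg_loss()
print(initial_loss > final_loss)  # True
```

Because both clients' weights are generated from one shared matrix `H`, an update from one client also shifts the other client's weights; that coupling is what lets a hyper-network learn a family of related local models.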
Affiliation(s)
- Yanfei Lin
- China Telecom Research Institute, Guangzhou, 510000 China
- Haiyi Wang
- China Telecom Research Institute, Guangzhou, 510000 China
- Weichen Li
- China Telecom Research Institute, Guangzhou, 510000 China
- Jun Shen
- China Telecom Research Institute, Guangzhou, 510000 China
8
Improving child health through Big Data and data science. Pediatr Res 2023; 93:342-349. [PMID: 35974162 PMCID: PMC9380977 DOI: 10.1038/s41390-022-02264-9] [Received: 03/07/2022] [Revised: 06/10/2022] [Accepted: 06/28/2022] [Indexed: 12/04/2022]
Abstract
Child health is defined by a complex, dynamic network of genetic, cultural, nutritional, infectious, and environmental determinants at distinct, developmentally determined epochs from preconception to adolescence. This network shapes the future of children, susceptibilities to adult diseases, and individual child health outcomes. Evolution selects characteristics during fetal life, infancy, childhood, and adolescence that adapt to predictable and unpredictable exposures/stresses by creating alternative developmental phenotype trajectories. While child health has improved in the United States and globally over the past 30 years, continued improvement requires access to data that fully represent the complexity of these interactions and to new analytic methods. Big Data and innovative data science methods provide tools to integrate multiple data dimensions for description of best clinical, predictive, and preventive practices, for reducing racial disparities in child health outcomes, for inclusion of patient and family input in medical assessments, and for defining individual disease risk, mechanisms, and therapies. However, leveraging these resources will require new strategies that intentionally address institutional, ethical, regulatory, cultural, technical, and systemic barriers as well as developing partnerships with children and families from diverse backgrounds that acknowledge historical sources of mistrust. We highlight existing pediatric Big Data initiatives and identify areas of future research. IMPACT: Big Data and data science can improve child health. This review highlights the importance of child-specific and life course-based Big Data and data science strategies for child health. This review provides recommendations for future pediatric-specific Big Data and data science research.
9
Khanna A, Schaffer V, Gursoy G, Gerstein M. Privacy-preserving Model Training for Disease Prediction Using Federated Learning with Differential Privacy. Annu Int Conf IEEE Eng Med Biol Soc 2022; 2022:1358-1361. [PMID: 36086138 DOI: 10.1109/embc48229.2022.9871742] [Indexed: 11/06/2022]
Abstract
Machine learning is playing an increasingly critical role in health science with its capability of inferring valuable information from high-dimensional data. More training data provides greater statistical power to generate better models that can help decision-making in healthcare. However, this often requires combining research and patient data across institutions and hospitals, which is not always possible due to privacy considerations. In this paper, we outline a simple federated learning algorithm implementing differential privacy to ensure privacy when training a machine learning model on data spread across different institutions. We tested our model by predicting breast cancer status from gene expression data. Our model achieves a similar level of accuracy and precision as a single-site non-private neural network model when we enforce privacy. This result suggests that our algorithm is an effective method of implementing differential privacy with federated learning, and clinical data scientists can use our general framework to produce differentially private models on federated datasets. Our framework is available at https://github.com/gersteinlab/idash20FL.
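A common way to combine federated learning with differential privacy is DP-FedAvg-style aggregation (McMahan et al., 2018): clip each client's model update, add Gaussian noise to the sum, then average. The sketch below assumes that recipe; the paper describes its own "simple" algorithm, which may differ, and the function names and parameter values are hypothetical. Translating `noise_multiplier` into an (ε, δ) privacy guarantee requires a privacy accountant, which is not shown.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale an update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def dp_fedavg_round(global_w, client_updates, max_norm, noise_multiplier, rng):
    """One server round: clip each client's update, sum, add Gaussian noise, average.

    Clipping bounds any single client's contribution to the sum by max_norm;
    the noise standard deviation (noise_multiplier * max_norm) is calibrated
    to that bound.
    """
    clipped = [clip_update(u, max_norm) for u in client_updates]
    summed = [sum(coords) for coords in zip(*clipped)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [w + s / len(client_updates) for w, s in zip(global_w, noisy)]


rng = random.Random(42)
updates = [[3.0, 4.0], [0.3, 0.4]]  # made-up client updates; the first is clipped to norm 1
no_noise = dp_fedavg_round([0.0, 0.0], updates, max_norm=1.0, noise_multiplier=0.0, rng=rng)
with_noise = dp_fedavg_round([0.0, 0.0], updates, max_norm=1.0, noise_multiplier=1.1, rng=rng)
print(no_noise)  # approximately [0.45, 0.6]
```

Setting `noise_multiplier=0` isolates the effect of clipping; the second call shows the same round with noise added.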
Affiliation(s)
- Amol Khanna
- Johns Hopkins University, Department of Biomedical Engineering, Department of Applied Mathematics and Statistics, Baltimore, MD, USA, 21218
- Vincent Schaffer
- Yale University, Department of Computer Science, New Haven, CT, USA, 06520
- Gamze Gursoy
- Columbia University, Department of Biomedical Informatics, New York, NY, USA, 10032
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, Department of Molecular Biophysics and Biochemistry, Department of Computer Science, Department of Statistics and Data Science, New Haven, CT, USA, 06250
10
Lippa N, Bier L, Revah-Politi A, May H, Kushary S, Vena N, Giordano JL, Rasouly HM, Cocchi E, Sands TT, Wapner RJ, Anyane-Yeboa K, Gharavi AG, Goldstein DB. Diagnostic sequencing to support genetically stratified medicine in a tertiary care setting. Genet Med 2022; 24:862-869. [PMID: 35078725 DOI: 10.1016/j.gim.2021.12.010] [Received: 09/23/2021] [Revised: 12/14/2021] [Accepted: 12/16/2021] [Indexed: 11/27/2022]
Abstract
PURPOSE The goal of stratified medicine is to identify subgroups of patients with similar disease mechanisms and specific responses to treatments. To prepare for stratified clinical trials, genome-wide genetic analysis should occur across clinical areas to identify undiagnosed genetic diseases and new genetic causes of disease. METHODS To advance genetically stratified medicine, we have developed and implemented broad exome sequencing infrastructure and research protocols at Columbia University Irving Medical Center/NewYork-Presbyterian Hospital. RESULTS We enrolled 4889 adult and pediatric probands and identified a primary result in 572 probands. The cohort was phenotypically and demographically heterogeneous because enrollment occurred across multiple specialty clinics (eg, epilepsy, nephrology, fetal anomaly). New gene-disease associations and phenotypic expansions were discovered across clinical specialties. CONCLUSION Our study processes have enabled the enrollment and exome sequencing/analysis of a phenotypically and demographically diverse cohort of patients within 1 tertiary care medical center. Because all genomic data are stored centrally with permission for longitudinal access to the electronic medical record, subjects can be recontacted with updated genetic diagnoses or for participation in future genotype-based clinical trials. This infrastructure has allowed for the promotion of genetically stratified clinical trial readiness within the Columbia University Irving Medical Center/NewYork-Presbyterian Hospital health care system.
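For orientation, the counts above imply the following proportion of enrolled probands with a primary result (a derived figure computed from the abstract, not one the authors report):

```latex
\frac{572}{4889} \approx 0.117 \quad (\approx 11.7\%)
```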
Affiliation(s)
- Natalie Lippa
- Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY
- Louise Bier
- Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY
- Anya Revah-Politi
- Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY; Precision Genomics Laboratory, Columbia University Irving Medical Center, New York, NY
- Halie May
- Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY
- Sulagna Kushary
- Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY
- Natalie Vena
- Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY; Division of Nephrology, Department of Medicine, Columbia University Irving Medical Center, New York, NY
- Jessica L Giordano
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY
- Hila Milo Rasouly
- Division of Nephrology, Department of Medicine, Columbia University Irving Medical Center, New York, NY
- Enrico Cocchi
- Division of Nephrology, Department of Medicine, Columbia University Irving Medical Center, New York, NY
- Tristan T Sands
- Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY; Division of Child Neurology, Department of Neurology, Columbia University Irving Medical Center, New York, NY
- Ronald J Wapner
- Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY; Division of Maternal Fetal Medicine, Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY
- Kwame Anyane-Yeboa
- Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY; Division of Clinical Genetics, Department of Pediatrics, Columbia University Irving Medical Center, New York, NY
- Ali G Gharavi
- Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY; Division of Nephrology, Department of Medicine, Columbia University Irving Medical Center, New York, NY
- David B Goldstein
- Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY
11
Functional genomics data: privacy risk assessment and technological mitigation. Nat Rev Genet 2022; 23:245-258. [PMID: 34759381 DOI: 10.1038/s41576-021-00428-7] [Accepted: 10/18/2021] [Indexed: 12/15/2022]
Abstract
The generation of functional genomics data by next-generation sequencing has increased greatly in the past decade. Broad sharing of these data is essential for research advancement but poses notable privacy challenges, some of which are analogous to those that occur when sharing genetic variant data. However, there are also unique privacy challenges that arise from cryptic information leakage during the processing and summarization of functional genomics data from raw reads to derived quantities, such as gene expression values. Here, we review these challenges and present potential solutions for mitigating privacy risks while allowing broad data dissemination and analysis.

12
Lu MY, Chen RJ, Kong D, Lipkova J, Singh R, Williamson DFK, Chen TY, Mahmood F. Federated learning for computational pathology on gigapixel whole slide images. Med Image Anal 2022; 76:102298. [PMID: 34911013 PMCID: PMC9340569 DOI: 10.1016/j.media.2021.102298] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Received: 09/18/2020] [Revised: 10/07/2021] [Accepted: 11/02/2021] [Indexed: 02/07/2023]
Abstract
Deep learning-based computational pathology algorithms have demonstrated a profound ability to excel in a wide array of tasks that range from characterization of well-known morphological phenotypes to predicting non-human-identifiable features from histology, such as molecular alterations. However, the development of robust, adaptable and accurate deep learning-based models often relies on the collection and time-costly curation of large, high-quality annotated training data that should ideally come from diverse sources and patient populations to cater for the heterogeneity that exists in such datasets. Multi-centric and collaborative integration of medical data across multiple institutions can naturally help overcome this challenge and boost model performance, but is limited by privacy concerns, among other difficulties that may arise in the complex data-sharing process, as models scale towards using hundreds of thousands of gigapixel whole slide images. In this paper, we introduce privacy-preserving federated learning for gigapixel whole slide images in computational pathology using weakly-supervised attention multiple instance learning and differential privacy. We evaluated our approach on two different diagnostic problems using thousands of histology whole slide images with only slide-level labels. Additionally, we present a weakly-supervised learning framework for survival prediction and patient stratification from whole slide images and demonstrate its effectiveness in a federated setting. Our results show that using federated learning, we can effectively develop accurate weakly-supervised deep learning models from distributed data silos without direct data sharing and its associated complexities, while also preserving differential privacy using randomized noise generation. We also make available an easy-to-use federated learning for computational pathology software package: http://github.com/mahmoodlab/HistoFL.
Affiliation(s)
- Ming Y Lu
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Richard J Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Dehan Kong
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States
- Jana Lipkova
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Rajendra Singh
- Department of Pathology, Northwell Health, NY, United States
- Drew F K Williamson
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Tiffany Y Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Faisal Mahmood
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, United States; Data Science Department, Dana-Farber/Harvard Cancer Center, Boston, MA, United States; Harvard Data Science Initiative, Harvard University, Cambridge, MA, United States.

13
Abstract
Research and development are facilitated by sharing knowledge bases, and the innovation process benefits from collaborative efforts that involve the collective utilization of data. Until now, most companies and organizations have produced and collected various types of data, and stored them in data silos that still have to be integrated with one another in order to enable knowledge creation. For this to happen, both public and private actors must adopt a flexible approach to achieve the necessary transition to break data silos and create collaborative data sharing between data producers and users. In this paper, we investigate several factors influencing cooperative data usage and explore the challenges posed by the participation in cross-organizational data ecosystems by performing an interview study among stakeholders from private and public organizations in the context of the project IDE@S, which aims at fostering the cooperation in data science in the Austrian federal state of Styria. We highlight technological and organizational requirements of data infrastructure, expertise, and practises towards collaborative data usage.

14
Pritchard D, Goodman C, Nadauld LD. Clinical Utility of Genomic Testing in Cancer Care. JCO Precis Oncol 2022; 6:e2100349. [PMID: 35085005 PMCID: PMC8830511 DOI: 10.1200/po.21.00349] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 11/21/2022]
Abstract
In the era of personalized medicine, physicians rely on their understanding of clinical utility to assess the value of rapidly evolving genetic and genomic tests. Current definitions of the clinical utility of genetic testing sufficiently capture a range of benefits and risks that derive from positive and negative results of tests that assess one gene or a few genes. However, these definitions of clinical utility are inadequate to recognize the wider scope of benefits that accrue from more comprehensive genomic tests, which can develop data sets that inform clinical decision making as well as population health and scientific advancement in novel ways.
Affiliation(s)
- Lincoln D Nadauld
- Precision Health and Genomics, Intermountain Healthcare, Salt Lake City, UT

15
Infrastructuring an organizational node for a federated research and data network: A case study from a sociotechnical perspective. J Clin Transl Sci 2021; 5:e186. [PMID: 34849261 PMCID: PMC8596061 DOI: 10.1017/cts.2021.846] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2021] [Revised: 08/13/2021] [Accepted: 08/16/2021] [Indexed: 11/24/2022]
Abstract
Background: Local nodes on federated research and data networks (FR&DNs) provide enabling infrastructure for collaborative clinical and translational research. Studies in other fields note that infrastructuring, that is, work to identify and negotiate relationships among people, technologies, and organizations, is invisible, unplanned, and undervalued. This may explain the limited literature on nodes in FR&DNs in health care. Methods: A retrospective case study of one PCORnet® node explored 3 questions: (1) how were components of infrastructure assembled; (2) what specific work was required; and (3) what theoretically grounded, pragmatic questions should be considered when infrastructuring a node for sustainability. Artifacts, work efforts, and interviews generated during node development and implementation were reviewed. A sociotechnical lens was applied to the analysis. Validity was established with internal and external partners. Results: Resources, services, and expertise needed to establish the node existed within the organization, but were scattered across work units. Aligning, mediating, and institutionalizing for sustainability among network and organizational teams, governance, and priorities consumed more work efforts than deploying technical aspects of the node. A theoretically based set of questions relevant to infrastructuring a node was developed and organized within a framework of infrastructuring emphasizing enacting technology, organizing work, and institutionalizing; validity was established with internal and external partners. Conclusions: FR&DNs are expanding; we provide a sociotechnical perspective on infrastructuring a node. Future research should evaluate the applicability of the framework and questions to other node and network configurations, and more broadly the infrastructuring required to enable and support federated clinical and translational science.

16
Antoniades A, Papaioannou M, Malatras A, Papagregoriou G, Müller H, Holub P, Deltas C, Schizas CN. Integration of Biobanks in National eHealth Ecosystems Facilitating Long-Term Longitudinal Clinical-Omics Studies and Citizens' Engagement in Research Through eHealthBioR. Front Digit Health 2021; 3:628646. [PMID: 34713101 PMCID: PMC8521893 DOI: 10.3389/fdgth.2021.628646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2020] [Accepted: 05/11/2021] [Indexed: 11/13/2022]
Abstract
Biobanks have long existed to support research activities with BBMRI-ERIC formed as a European research infrastructure supporting the coordination for biobanking with 20 country members and one international organization. Although the benefits of biobanks to the research community are well-established, the direct benefit to citizens is limited to the generic benefit of promoting future research. Furthermore, the advent of General Data Protection Regulation (GDPR) legislation raised a series of challenges for scientific research, especially related to biobanking-associated activities and longitudinal research studies. Electronic health record (EHR) registries have long existed within healthcare providers. In some countries, even at the national level, these record the state of the health of citizens through time for the purposes of healthcare and data portability between different providers. The potential of EHRs in research is great and has been demonstrated in many projects that have transformed EHR data into retrospective medical history information on participating subjects directly from their physician's collected records; many key challenges, however, remain. In this paper, we present a citizen-centric framework called eHealthBioR, which would enable biobanks to link to EHR systems, thus enabling not just retrospective but also lifelong prospective longitudinal studies of participating citizens. It will also ensure strict adherence to legal and ethical requirements, enabling greater control that encourages participation. Citizens would benefit from the real and direct control of their data and samples, utilizing technology, to empower them to make informed decisions about providing consent and practicing their rights related to the use of their data, as well as by having access to knowledge and data generated from samples they provided to biobanks. 
This is expected to motivate patient engagement in future research and even lead to participatory design methodologies with citizen/patient-centric designed studies. The development of platforms based on the eHealthBioR framework would need to overcome significant challenges. However, it would shift the burden of addressing these to experts in the field while providing solutions that, in the long term, lower the monetary and time cost of longitudinal studies, coupled with the option of lifelong monitoring through EHRs.
Affiliation(s)
- Athos Antoniades
- eHealth Lab, Department of Computer Science, University of Cyprus, Nicosia, Cyprus
- Maria Papaioannou
- eHealth Lab, Department of Computer Science, University of Cyprus, Nicosia, Cyprus
- Apostolos Malatras
- biobank.cy Center of Excellence in Biobanking and Biomedical Research, University of Cyprus, Nicosia, Cyprus
- Gregory Papagregoriou
- biobank.cy Center of Excellence in Biobanking and Biomedical Research, University of Cyprus, Nicosia, Cyprus
- Heimo Müller
- Institute of Pathology, Medical University of Graz, Graz, Austria; Biobanking and Biomolecular Resources Research Infrastructure - European Research Infrastructure Consortium, Biobanks and Biomolecular Resources Research Infrastructure Consortium, Graz, Austria
- Petr Holub
- Biobanking and Biomolecular Resources Research Infrastructure - European Research Infrastructure Consortium, Biobanks and Biomolecular Resources Research Infrastructure Consortium, Graz, Austria
- Constantinos Deltas
- biobank.cy Center of Excellence in Biobanking and Biomedical Research, University of Cyprus, Nicosia, Cyprus
- Christos N Schizas
- eHealth Lab, Department of Computer Science, University of Cyprus, Nicosia, Cyprus

17
Caenazzo L, Tozzo P. Microbiome Forensic Biobanking: A Step toward Microbial Profiling for Forensic Human Identification. Healthcare (Basel) 2021; 9:1371. [PMID: 34683051 PMCID: PMC8544459 DOI: 10.3390/healthcare9101371] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/15/2021] [Revised: 10/04/2021] [Accepted: 10/11/2021] [Indexed: 11/16/2022]
Abstract
In recent years many studies have highlighted the great potential of microbial analysis in human identification for forensic purposes, with important differences in microbial community composition and function across different people and locations, showing a certain degree of uncertainty. Therefore, further studies are necessary to enable forensic scientists to evaluate the risk of microbial transfer and recovery from various items and to further critically evaluate the suitability of current human DNA recovery protocols for human microbial profiling for identification purposes. While the establishment and development of microbiome research biobanks for clinical applications is already very structured, the development of studies on the applicability of microbiome biobanks for forensic purposes is still in its infancy. The creation of large population microbiome biobanks, specifically dedicated to forensic human identification, could be worthwhile. This could also be useful to increase the practical applications of forensic microbiology for identification purposes, given that this type of evidence is currently absent from most real casework investigations and judicial proceedings in courts.
Affiliation(s)
- Pamela Tozzo
- Laboratory of Forensic Genetics, Department of Molecular Medicine, University of Padova, 35121 Padova, Italy.

18
Visweswaran S, McLay B, Cappella N, Morris M, Milnes JT, Reis SE, Silverstein JC, Becich MJ. An atomic approach to the design and implementation of a research data warehouse. J Am Med Inform Assoc 2021; 29:601-608. [PMID: 34613409 PMCID: PMC8922189 DOI: 10.1093/jamia/ocab204] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/30/2021] [Revised: 07/27/2021] [Accepted: 09/10/2021] [Indexed: 11/14/2022]
Abstract
Objective As a long-standing Clinical and Translational Science Awards (CTSA) Program hub, the University of Pittsburgh and the University of Pittsburgh Medical Center (UPMC) developed and implemented a modern research data warehouse (RDW) to efficiently provision electronic patient data for clinical and translational research. Materials and Methods We designed and implemented an RDW named Neptune to serve the specific needs of our CTSA. Neptune uses an atomic design where data are stored at a high level of granularity as represented in source systems. Neptune contains robust patient identity management tailored for research; integrates patient data from multiple sources, including electronic health records (EHRs), health plans, and research studies; and includes knowledge for mapping to standard terminologies. Results Neptune contains data for more than 5 million patients longitudinally organized as Health Insurance Portability and Accountability Act (HIPAA) Limited Data with dates and includes structured EHR data, clinical documents, health insurance claims, and research data. Neptune is used as a source for patient data for hundreds of institutional review board-approved research projects by local investigators and for national projects. Discussion The design of Neptune was heavily influenced by the large size of UPMC, the varied data sources, and the rich partnership between the University and the healthcare system. It includes several unique aspects, including the physical warehouse straddling the University and UPMC networks and management under an HIPAA Business Associates Agreement. Conclusion We describe the design and implementation of an RDW at a large academic healthcare system that uses a distinctive atomic design where data are stored at a high level of granularity.
Affiliation(s)
- Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Brian McLay
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Nickie Cappella
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Michele Morris
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- John T Milnes
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Steven E Reis
- Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Jonathan C Silverstein
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; Chief Research Information Officer, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Michael J Becich
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, Pennsylvania, USA

19
Kirienko M, Sollini M, Ninatti G, Loiacono D, Giacomello E, Gozzi N, Amigoni F, Mainardi L, Lanzi PL, Chiti A. Distributed learning: a reliable privacy-preserving strategy to change multicenter collaborations using AI. Eur J Nucl Med Mol Imaging 2021; 48:3791-3804. [PMID: 33847779 PMCID: PMC8041944 DOI: 10.1007/s00259-021-05339-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/01/2021] [Accepted: 03/24/2021] [Indexed: 12/12/2022]
Abstract
Purpose The present scoping review aims to assess the non-inferiority of distributed learning over centrally and locally trained machine learning (ML) models in medical applications. Methods We performed a literature search using the term “distributed learning” OR “federated learning” in the PubMed/MEDLINE and EMBASE databases. No start date limit was used, and the search was extended until July 21, 2020. We excluded articles outside the field of interest; guidelines or expert opinion, review articles and meta-analyses, editorials, letters or commentaries, and conference abstracts; articles not in the English language; and studies not using medical data. Selected studies were classified and analysed according to their aim(s). Results We included 26 papers aimed at predicting one or more outcomes: namely risk, diagnosis, prognosis, and treatment side effect/adverse drug reaction. Distributed learning was compared to centralized or localized training in 21/26 and 14/26 selected papers, respectively. Regardless of the aim, the type of input, the method, and the classifier, distributed learning performed close to centralized training, except in two experiments focused on diagnosis. In all but 2 cases, distributed learning outperformed locally trained models. Conclusion Distributed learning proved to be a reliable strategy for model development; indeed, it performed equally to models trained on centralized datasets. Sensitive data can be preserved since they are not shared for model development. Distributed learning constitutes a promising solution for ML-based research and practice since large, diverse datasets are crucial for success.
Affiliation(s)
- Margarita Kirienko
- Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy; Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy
- Martina Sollini
- Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy; IRCCS Humanitas Research Hospital, Rozzano, Milan, Italy.
- Gaia Ninatti
- Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy
- Noemi Gozzi
- IRCCS Humanitas Research Hospital, Rozzano, Milan, Italy
- Arturo Chiti
- Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy; IRCCS Humanitas Research Hospital, Rozzano, Milan, Italy

20
Oocyte Biobanks: Old Assumptions and New Challenges. BIOTECH 2021; 10:biotech10010004. [PMID: 35822776 PMCID: PMC9245479 DOI: 10.3390/biotech10010004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/04/2021] [Revised: 02/10/2021] [Accepted: 02/11/2021] [Indexed: 11/19/2022]
Abstract
The preservation of fertility is a clinical issue that has been emerging considerably in recent decades, as the number of patients of childbearing age who risk becoming infertile for many reasons is increasing. The cryopreservation technique of oocytes has been developed for many years and nowadays constitutes a method of safe storage with impressive efficacy and high rates of successful thawing. The storage and research use of oocytes collected for medical or non-medical reasons can be carried out by both public and private structures, through egg sharing, voluntary egg donation and so-called “social freezing” for autologous use. This paper focuses on the oocyte bank as an emerging cryopreservation facility, in which a collaboration between public and private and the creation of a network of these biobanks can be useful in enhancing both their implementation and their functions. Good oocyte biobank practice would require that they be collected, stored, and used according to appropriate bioethical and bio-law criteria, collected and stored according to procedures that guarantee the best preservation of their structural components and a high level of safety, connected with appropriate procedures to protect the rights and privacy of the parties involved and associated with the results of the bio-molecular investigations that will be carried out gradually.

21
Gutiérrez-Sacristán A, De Niz C, Kothari C, Kong SW, Mandl KD, Avillach P. GenoPheno: cataloging large-scale phenotypic and next-generation sequencing data within human datasets. Brief Bioinform 2021; 22:55-65. [PMID: 32249310 PMCID: PMC7820848 DOI: 10.1093/bib/bbaa033] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/10/2019] [Revised: 01/31/2020] [Indexed: 12/17/2022]
Abstract
Precision medicine promises to revolutionize treatment, shifting therapeutic approaches from the classical one-size-fits-all to those more tailored to the patient's individual genomic profile, lifestyle and environmental exposures. Yet, to advance precision medicine's main objective (ensuring the optimum diagnosis, treatment and prognosis for each individual), investigators need access to large-scale clinical and genomic data repositories. Despite the vast proliferation of these datasets, locating and obtaining access to many remains a challenge. We sought to provide an overview of available patient-level datasets that contain both genotypic data, obtained by next-generation sequencing, and phenotypic data, and to create a dynamic, online catalog for consultation, contribution and revision by the research community. Datasets included in this review meet six inclusion criteria: they (i) contain data from more than 500 human subjects; (ii) contain both genotypic and phenotypic data from the same subjects; (iii) include whole genome sequencing or whole exome sequencing data; (iv) include at least 100 recorded phenotypic variables per subject; (v) are accessible through a website or collaboration with investigators and (vi) make access information available in English. Using these criteria, we identified 30 datasets, reviewed them and provided results in the release version of a catalog, which is publicly available through a dynamic Web application and on GitHub. Users can review as well as contribute new datasets for inclusion (Web: https://avillachlab.shinyapps.io/genophenocatalog/; GitHub: https://github.com/hms-dbmi/GenoPheno-CatalogShiny).
Affiliation(s)
- Carlos De Niz
- Department of Biomedical Informatics, Harvard Medical School
- Cartik Kothari
- Department of Biomedical Informatics, Harvard Medical School
- Sek Won Kong
- Department of Biomedical Informatics, Harvard Medical School; Computational Health Informatics Program, Boston Children's Hospital
- Kenneth D Mandl
- Department of Biomedical Informatics, Harvard Medical School; Computational Health Informatics Program, Boston Children's Hospital
- Paul Avillach
- Department of Biomedical Informatics, Harvard Medical School; Computational Health Informatics Program, Boston Children's Hospital

22
Sollini M, Bartoli F, Marciano A, Zanca R, Slart RHJA, Erba PA. Artificial intelligence and hybrid imaging: the best match for personalized medicine in oncology. Eur J Hybrid Imaging 2020; 4:24. [PMID: 34191197 PMCID: PMC8218106 DOI: 10.1186/s41824-020-00094-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/14/2020] [Accepted: 11/26/2020] [Indexed: 12/20/2022]
Abstract
Artificial intelligence (AI) refers to a field of computer science aimed at performing tasks that typically require human intelligence. Currently, AI is recognized in the broader technology radar as one of five key technologies that stand out for their wide-ranging applications and impact on communities, companies, business, and the value chain framework alike. However, AI in medical imaging is at an early phase of development, and there are still hurdles to overcome related to reliability, user confidence, and adoption. The present narrative review aimed to provide an overview of AI-based approaches (distributed learning, statistical learning, computer-aided diagnosis and detection systems, fully automated image analysis tools, natural language processing) in oncological hybrid medical imaging with respect to clinical tasks (detection, contouring and segmentation, prediction of histology and tumor stage, prediction of mutational status and molecular therapy targets, prediction of treatment response, and outcome). In particular, AI-based approaches have been briefly described according to their purpose, and lung cancer, one of the malignancies most extensively studied by hybrid medical imaging, has been used as an illustrative scenario. Finally, we discussed clinical challenges and open issues including ethics, validation strategies, effective data-sharing methods, regulatory hurdles, educational resources, and strategies to facilitate interaction among different stakeholders. Some of the major changes in medical imaging will come from the application of AI to workflow and protocols, eventually resulting in improved patient management and quality of life. Overall, several time-consuming tasks could be automated. 
Machine learning algorithms and neural networks will permit sophisticated analysis resulting not only in major improvements in disease characterization through imaging, but also in the integration of multi-omics data (i.e., derived from pathology, genomics, proteomics, and demographics) for multi-dimensional disease characterization. Nevertheless, accelerating the transition from theory to practice requires a sustainable development plan that considers the multi-dimensional interactions among professionals, technology, industry, markets, policy, culture, and civil society, guided by a mindset that allows talent to thrive.
Affiliation(s)
- Martina Sollini
- Department of Biomedical Sciences, Humanitas University, Pieve Emanuele (Milan), Italy
- Humanitas Clinical and Research Center, Rozzano (Milan), Italy
- Francesco Bartoli
- Regional Center of Nuclear Medicine, Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, Pisa, Italy
- Andrea Marciano
- Regional Center of Nuclear Medicine, Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, Pisa, Italy
- Roberta Zanca
- Regional Center of Nuclear Medicine, Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, Pisa, Italy
- Riemer H J A Slart
- University Medical Center Groningen, Medical Imaging Center, University of Groningen, Groningen, The Netherlands
- Faculty of Science and Technology, Biomedical Photonic Imaging, University of Twente, Enschede, The Netherlands
- Paola A Erba
- Regional Center of Nuclear Medicine, Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, Pisa, Italy.
- University Medical Center Groningen, Medical Imaging Center, University of Groningen, Groningen, The Netherlands.

23
Brat GA, Weber GM, Gehlenborg N, Avillach P, Palmer NP, Chiovato L, Cimino J, Waitman LR, Omenn GS, Malovini A, Moore JH, Beaulieu-Jones BK, Tibollo V, Murphy SN, Yi SL, Keller MS, Bellazzi R, Hanauer DA, Serret-Larmande A, Gutierrez-Sacristan A, Holmes JJ, Bell DS, Mandl KD, Follett RW, Klann JG, Murad DA, Scudeller L, Bucalo M, Kirchoff K, Craig J, Obeid J, Jouhet V, Griffier R, Cossin S, Moal B, Patel LP, Bellasi A, Prokosch HU, Kraska D, Sliz P, Tan ALM, Ngiam KY, Zambelli A, Mowery DL, Schiver E, Devkota B, Bradford RL, Daniar M, Daniel C, Benoit V, Bey R, Paris N, Serre P, Orlova N, Dubiel J, Hilka M, Jannot AS, Breant S, Leblanc J, Griffon N, Burgun A, Bernaux M, Sandrin A, Salamanca E, Cormont S, Ganslandt T, Gradinger T, Champ J, Boeker M, Martel P, Esteve L, Gramfort A, Grisel O, Leprovost D, Moreau T, Varoquaux G, Vie JJ, Wassermann D, Mensch A, Caucheteux C, Haverkamp C, Lemaitre G, Bosari S, Krantz ID, South A, Cai T, Kohane IS. International electronic health record-derived COVID-19 clinical course profiles: the 4CE consortium. NPJ Digit Med 2020. [PMID: 32864472 DOI: 10.1101/2020.04.13.20059691v5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
Abstract
We leveraged the largely untapped resource of electronic health record data to address critical clinical and epidemiological questions about Coronavirus Disease 2019 (COVID-19). To do this, we formed an international consortium (4CE) of 96 hospitals across five countries (www.covidclinical.net). Contributors utilized the Informatics for Integrating Biology and the Bedside (i2b2) or Observational Medical Outcomes Partnership (OMOP) platforms to map to a common data model. The group focused on temporal changes in key laboratory test values. Harmonized data were analyzed locally and converted to a shared aggregate form for rapid analysis and visualization of regional differences and global commonalities. Data covered 27,584 COVID-19 cases with 187,802 laboratory tests. Case counts and laboratory trajectories were concordant with existing literature. Laboratory tests at the time of diagnosis showed hospital-level differences equivalent to country-level variation across the consortium partners. Despite the limitations of decentralized data generation, we established a framework to capture the trajectory of COVID-19 disease in patients and their response to interventions.
Affiliation(s)
- Gabriel A Brat
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Griffin M Weber
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Nils Gehlenborg
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Paul Avillach
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Nathan P Palmer
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Luca Chiovato
- IRCCS ICS Maugeri, Pavia, Italy
- Department of Internal Medicine and Medical Therapy, University of Pavia, Pavia, Italy
- Lemuel R Waitman
- Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, KS USA
- Gilbert S Omenn
- Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI USA
- Jason H Moore
- Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA USA
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA USA
- Shawn N Murphy
- Department of Neurology, Massachusetts General Hospital, Boston, MA USA
- Sehi L'Yi
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Mark S Keller
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Riccardo Bellazzi
- IRCCS ICS Maugeri, Pavia, Italy
- Department of Electrical Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- David A Hanauer
- Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI USA
- John J Holmes
- Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA USA
- Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI USA
- Douglas S Bell
- Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA USA
- Kenneth D Mandl
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA USA
- Robert W Follett
- Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA USA
- Jeffrey G Klann
- Department of Medicine, Massachusetts General Hospital, Boston, MA USA
- Douglas A Murad
- Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA USA
- Luigia Scudeller
- Scientific Direction, IRCCS Ca' Granda Ospedale Maggiore Policlinico di Milano, Milano, Italy
- Mauro Bucalo
- BIOMERIS (BIOMedical Research Informatics Solutions), Pavia, Italy
- Katie Kirchoff
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC USA
- Jean Craig
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC USA
- Jihad Obeid
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC USA
- Lav P Patel
- Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, KS USA
- Antonio Bellasi
- UOC Ricerca, Innovazione e Brand Reputation, ASST Papa Giovanni XXIII, Bergamo, Italy
- Hans U Prokosch
- Department of Medical Informatics, University of Erlangen-Nürnberg, Erlangen, Germany
- Detlef Kraska
- Center for Medical Information and Communication Technology, University Hospital Erlangen, Erlangen, Germany
- Piotr Sliz
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA USA
- Amelia L M Tan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Kee Yuan Ngiam
- National University Health Systems, Singapore, Singapore
- Alberto Zambelli
- Department of Oncology, ASST Papa Giovanni XXIII, Bergamo, Italy
- Danielle L Mowery
- Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA USA
- Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI USA
- Emily Schiver
- Penn Medicine, Data Analytics Center, Philadelphia, PA USA
- Batsal Devkota
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA USA
- Robert L Bradford
- North Carolina Translational and Clinical Sciences (NC TraCS) Institute, UNC Chapel Hill, Chapel Hill, NC USA
- Mohamad Daniar
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA USA
- Christel Daniel
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Vincent Benoit
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Romain Bey
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Nicolas Paris
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Patricia Serre
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Nina Orlova
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Julien Dubiel
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Martin Hilka
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Anne Sophie Jannot
- Department of Biomedical Informatics, HEGP, APHP Greater Paris University Hospital, Paris, France
- Stephane Breant
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Judith Leblanc
- Clinical Research Unit, Saint Antoine Hospital, APHP Greater Paris University Hospital, Paris, France
- Nicolas Griffon
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Anita Burgun
- Department of Biomedical Informatics, HEGP, APHP Greater Paris University Hospital, Paris, France
- Melodie Bernaux
- Strategy and Transformation Department, APHP Greater Paris University Hospital, Paris, France
- Arnaud Sandrin
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Elisa Salamanca
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Sylvie Cormont
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Thomas Ganslandt
- Heinrich-Lanz-Center for Digital Health, University Medicine Mannheim, Heidelberg University, Mannheim, Germany
- Tobias Gradinger
- Heinrich-Lanz-Center for Digital Health, University Medicine Mannheim, Heidelberg University, Mannheim, Germany
- Julien Champ
- INRIA Sophia-Antipolis-ZENITH Team, LIRMM, Montpellier, France
- Martin Boeker
- Institute of Medical Biometry and Statistics, Medical Center, University of Freiburg, Freiburg im Breisgau, Germany
- Patricia Martel
- Clinical Research Unit, Paris Saclay, APHP Greater Paris University Hospital, Paris, France
- Loic Esteve
- SED/SIERRA, Inria Centre de Paris, Paris, France
- Christian Haverkamp
- Institute of Digitalization in Medicine, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg im Breisgau, Germany
- Silvano Bosari
- IRCCS Ca' Granda Ospedale Maggiore Policlinico di Milano, Milano, Italy
- Ian D Krantz
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA USA
- Andrew South
- Brenner Children's Hospital, Wake Forest School of Medicine, Winston-Salem, NC USA
- Tianxi Cai
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
24
Brat GA, Weber GM, Gehlenborg N, Avillach P, Palmer NP, Chiovato L, Cimino J, Waitman LR, Omenn GS, Malovini A, Moore JH, Beaulieu-Jones BK, Tibollo V, Murphy SN, Yi SL, Keller MS, Bellazzi R, Hanauer DA, Serret-Larmande A, Gutierrez-Sacristan A, Holmes JJ, Bell DS, Mandl KD, Follett RW, Klann JG, Murad DA, Scudeller L, Bucalo M, Kirchoff K, Craig J, Obeid J, Jouhet V, Griffier R, Cossin S, Moal B, Patel LP, Bellasi A, Prokosch HU, Kraska D, Sliz P, Tan ALM, Ngiam KY, Zambelli A, Mowery DL, Schiver E, Devkota B, Bradford RL, Daniar M, Daniel C, Benoit V, Bey R, Paris N, Serre P, Orlova N, Dubiel J, Hilka M, Jannot AS, Breant S, Leblanc J, Griffon N, Burgun A, Bernaux M, Sandrin A, Salamanca E, Cormont S, Ganslandt T, Gradinger T, Champ J, Boeker M, Martel P, Esteve L, Gramfort A, Grisel O, Leprovost D, Moreau T, Varoquaux G, Vie JJ, Wassermann D, Mensch A, Caucheteux C, Haverkamp C, Lemaitre G, Bosari S, Krantz ID, South A, Cai T, Kohane IS. International electronic health record-derived COVID-19 clinical course profiles: the 4CE consortium. NPJ Digit Med 2020; 3:109. [PMID: 32864472 PMCID: PMC7438496 DOI: 10.1038/s41746-020-00308-0] [Citation(s) in RCA: 102] [Impact Index Per Article: 25.5] [Received: 04/15/2020] [Accepted: 06/16/2020] [Indexed: 12/18/2022]
Abstract
We leveraged the largely untapped resource of electronic health record data to address critical clinical and epidemiological questions about Coronavirus Disease 2019 (COVID-19). To do this, we formed an international consortium (4CE) of 96 hospitals across five countries (www.covidclinical.net). Contributors utilized the Informatics for Integrating Biology and the Bedside (i2b2) or Observational Medical Outcomes Partnership (OMOP) platforms to map to a common data model. The group focused on temporal changes in key laboratory test values. Harmonized data were analyzed locally and converted to a shared aggregate form for rapid analysis and visualization of regional differences and global commonalities. Data covered 27,584 COVID-19 cases with 187,802 laboratory tests. Case counts and laboratory trajectories were concordant with existing literature. Laboratory tests at the time of diagnosis showed hospital-level differences equivalent to country-level variation across the consortium partners. Despite the limitations of decentralized data generation, we established a framework to capture the trajectory of COVID-19 disease in patients and their response to interventions.
Affiliation(s)
- Gabriel A. Brat
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Griffin M. Weber
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Nils Gehlenborg
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Paul Avillach
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Nathan P. Palmer
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Luca Chiovato
- IRCCS ICS Maugeri, Pavia, Italy
- Department of Internal Medicine and Medical Therapy, University of Pavia, Pavia, Italy
- Lemuel R. Waitman
- Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, KS USA
- Gilbert S. Omenn
- Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI USA
- Jason H. Moore
- Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA USA
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA USA
- Shawn N. Murphy
- Department of Neurology, Massachusetts General Hospital, Boston, MA USA
- Sehi L’Yi
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Mark S. Keller
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Riccardo Bellazzi
- IRCCS ICS Maugeri, Pavia, Italy
- Department of Electrical Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- David A. Hanauer
- Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI USA
- John J. Holmes
- Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA USA
- Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI USA
- Douglas S. Bell
- Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA USA
- Kenneth D. Mandl
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA USA
- Robert W. Follett
- Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA USA
- Jeffrey G. Klann
- Department of Medicine, Massachusetts General Hospital, Boston, MA USA
- Douglas A. Murad
- Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA USA
- Luigia Scudeller
- Scientific Direction, IRCCS Ca’ Granda Ospedale Maggiore Policlinico di Milano, Milano, Italy
- Mauro Bucalo
- BIOMERIS (BIOMedical Research Informatics Solutions), Pavia, Italy
- Katie Kirchoff
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC USA
- Jean Craig
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC USA
- Jihad Obeid
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC USA
- Lav P. Patel
- Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City, KS USA
- Antonio Bellasi
- UOC Ricerca, Innovazione e Brand Reputation, ASST Papa Giovanni XXIII, Bergamo, Italy
- Hans U. Prokosch
- Department of Medical Informatics, University of Erlangen-Nürnberg, Erlangen, Germany
- Detlef Kraska
- Center for Medical Information and Communication Technology, University Hospital Erlangen, Erlangen, Germany
- Piotr Sliz
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA USA
- Amelia L. M. Tan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Kee Yuan Ngiam
- National University Health Systems, Singapore, Singapore
- Alberto Zambelli
- Department of Oncology, ASST Papa Giovanni XXIII, Bergamo, Italy
- Danielle L. Mowery
- Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA USA
- Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI USA
- Emily Schiver
- Penn Medicine, Data Analytics Center, Philadelphia, PA USA
- Batsal Devkota
- Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA USA
- Robert L. Bradford
- North Carolina Translational and Clinical Sciences (NC TraCS) Institute, UNC Chapel Hill, Chapel Hill, NC USA
- Mohamad Daniar
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA USA
- Christel Daniel
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Vincent Benoit
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Romain Bey
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Nicolas Paris
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Patricia Serre
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Nina Orlova
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Julien Dubiel
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Martin Hilka
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Anne Sophie Jannot
- Department of Biomedical Informatics, HEGP, APHP Greater Paris University Hospital, Paris, France
- Stephane Breant
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Judith Leblanc
- Clinical Research Unit, Saint Antoine Hospital, APHP Greater Paris University Hospital, Paris, France
- Nicolas Griffon
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Anita Burgun
- Department of Biomedical Informatics, HEGP, APHP Greater Paris University Hospital, Paris, France
- Melodie Bernaux
- Strategy and Transformation Department, APHP Greater Paris University Hospital, Paris, France
- Arnaud Sandrin
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Elisa Salamanca
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Sylvie Cormont
- WIND Department APHP Greater Paris University Hospital, Paris, France
- Thomas Ganslandt
- Heinrich-Lanz-Center for Digital Health, University Medicine Mannheim, Heidelberg University, Mannheim, Germany
- Tobias Gradinger
- Heinrich-Lanz-Center for Digital Health, University Medicine Mannheim, Heidelberg University, Mannheim, Germany
- Julien Champ
- INRIA Sophia-Antipolis-ZENITH Team, LIRMM, Montpellier, France
- Martin Boeker
- Institute of Medical Biometry and Statistics, Medical Center, University of Freiburg, Freiburg im Breisgau, Germany
- Patricia Martel
- Clinical Research Unit, Paris Saclay, APHP Greater Paris University Hospital, Paris, France
- Loic Esteve
- SED/SIERRA, Inria Centre de Paris, Paris, France
- Christian Haverkamp
- Institute of Digitalization in Medicine, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg im Breisgau, Germany
- Silvano Bosari
- IRCCS Ca’ Granda Ospedale Maggiore Policlinico di Milano, Milano, Italy
- Ian D. Krantz
- Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA USA
- Andrew South
- Brenner Children’s Hospital, Wake Forest School of Medicine, Winston-Salem, NC USA
- Tianxi Cai
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
- Isaac S. Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA
25
Rockowitz S, LeCompte N, Carmack M, Quitadamo A, Wang L, Park M, Knight D, Sexton E, Smith L, Sheidley B, Field M, Holm IA, Brownstein CA, Agrawal PB, Kornetsky S, Poduri A, Snapper SB, Beggs AH, Yu TW, Williams DA, Sliz P. Children's rare disease cohorts: an integrative research and clinical genomics initiative. NPJ Genom Med 2020; 5:29. [PMID: 32655885 PMCID: PMC7338382 DOI: 10.1038/s41525-020-0137-0] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 11/23/2019] [Accepted: 06/03/2020] [Indexed: 12/16/2022]
Abstract
While genomic data is frequently collected under distinct research protocols and disparate clinical and research regimes, there is a benefit in streamlining sequencing strategies to create harmonized databases, particularly in the area of pediatric rare disease. Research hospitals seeking to implement unified genomics workflows for research and clinical practice face numerous challenges, as they need to address the unique requirements and goals of the distinct environments and many stakeholders, including clinicians, researchers and sequencing providers. Here, we present outcomes of the first phase of the Children’s Rare Disease Cohorts initiative (CRDC) that was completed at Boston Children’s Hospital (BCH). We have developed a broadly sharable database of 2441 exomes from 15 pediatric rare disease cohorts, with major contributions from early onset epilepsy and early onset inflammatory bowel disease. All sequencing data is integrated and combined with phenotypic and research data in a genomics learning system (GLS). Phenotypes were both manually annotated and pulled automatically from patient medical records. Deployment of a genomically-ordered relational database allowed us to provide a modular and robust platform for centralized storage and analysis of research and clinical data, currently totaling 8516 exomes and 112 genomes. The GLS integrates analytical systems, including machine learning algorithms for automated variant classification and prioritization, as well as phenotype extraction via natural language processing (NLP) of clinical notes. This GLS is extensible to additional analytic systems and growing research and clinical collections of genomic and other types of data.
Affiliation(s)
- Shira Rockowitz
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02115 USA
- The Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA 02115 USA
- Harvard Medical School, Boston, MA 02115 USA
- Nicholas LeCompte
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02115 USA
- The Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA 02115 USA
- Harvard Medical School, Boston, MA 02115 USA
- Mary Carmack
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02115 USA
- The Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA 02115 USA
- Harvard Medical School, Boston, MA 02115 USA
- Andrew Quitadamo
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02115 USA
- The Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA 02115 USA
- Harvard Medical School, Boston, MA 02115 USA
- Lily Wang
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02115 USA
- The Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA 02115 USA
- Harvard Medical School, Boston, MA 02115 USA
- Meredith Park
- Department of Neurology, F.M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA 02115 USA
- Division of Epilepsy and Clinical Neurophysiology and Epilepsy Genetics Program, Boston Children's Hospital, Boston, MA 02115 USA
- Devon Knight
- Department of Neurology, F.M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA 02115 USA
- Division of Epilepsy and Clinical Neurophysiology and Epilepsy Genetics Program, Boston Children's Hospital, Boston, MA 02115 USA
- Emma Sexton
- Department of Neurology, F.M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA 02115 USA
- Division of Epilepsy and Clinical Neurophysiology and Epilepsy Genetics Program, Boston Children's Hospital, Boston, MA 02115 USA
- Lacey Smith
- Department of Neurology, F.M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA 02115 USA
- Division of Epilepsy and Clinical Neurophysiology and Epilepsy Genetics Program, Boston Children's Hospital, Boston, MA 02115 USA
- Beth Sheidley
- Department of Neurology, F.M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA 02115 USA
- Division of Epilepsy and Clinical Neurophysiology and Epilepsy Genetics Program, Boston Children's Hospital, Boston, MA 02115 USA
- Michael Field
- Division of Gastroenterology, Hepatology and Nutrition, Boston Children's Hospital, Boston, MA 02115 USA
- Ingrid A Holm
- The Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA 02115 USA
- Harvard Medical School, Boston, MA 02115 USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA 02115 USA
- Catherine A Brownstein
- The Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA 02115 USA
- Harvard Medical School, Boston, MA 02115 USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA 02115 USA
- Pankaj B Agrawal
- The Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA 02115 USA
- Harvard Medical School, Boston, MA 02115 USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA 02115 USA
- Division of Newborn Medicine, Boston Children's Hospital, Boston, MA 02115 USA
- Susan Kornetsky
- Research Administration, Boston Children's Hospital, Boston, MA 02115 USA
- Annapurna Poduri
- Harvard Medical School, Boston, MA 02115 USA
- Department of Neurology, F.M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA 02115 USA
- Division of Epilepsy and Clinical Neurophysiology and Epilepsy Genetics Program, Boston Children's Hospital, Boston, MA 02115 USA
- Scott B Snapper
- Harvard Medical School, Boston, MA 02115 USA
- Division of Gastroenterology, Hepatology and Nutrition, Boston Children's Hospital, Boston, MA 02115 USA
- Alan H Beggs
- The Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA 02115 USA
- Harvard Medical School, Boston, MA 02115 USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA 02115 USA
- Timothy W Yu
- The Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA 02115 USA
- Harvard Medical School, Boston, MA 02115 USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA 02115 USA
- David A Williams
- Harvard Medical School, Boston, MA 02115 USA
- Division of Hematology/Oncology, Dana-Farber/Boston Children's Cancer and Blood Disorders Center, Boston, MA 02115 USA
- Piotr Sliz
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02115 USA
- The Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA 02115 USA
- Harvard Medical School, Boston, MA 02115 USA
26
Miller TA, Avillach P, Mandl KD. Experiences implementing scalable, containerized, cloud-based NLP for extracting biobank participant phenotypes at scale. JAMIA Open 2020; 3:185-189. [PMID: 32734158 PMCID: PMC7382623 DOI: 10.1093/jamiaopen/ooaa016] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/25/2020] [Revised: 04/03/2020] [Accepted: 04/14/2020] [Indexed: 11/30/2022]
Abstract
OBJECTIVE To develop scalable natural language processing (NLP) infrastructure for processing the free text in electronic health records (EHRs). MATERIALS AND METHODS We extend the open-source Apache cTAKES NLP software with several standard technologies for scalability. We remove processing bottlenecks by monitoring component queue size. We process EHR free text for patients in the PrecisionLink Biobank at Boston Children's Hospital. The extracted concepts are made searchable via a web-based portal. RESULTS We processed over 1.2 million notes for over 8000 patients, extracting 154 million concepts. Our largest tested configuration processes over 1 million notes per day. DISCUSSION The unique information represented by extracted NLP concepts has great potential to provide a more complete picture of patient status. CONCLUSION NLP large EHR document collections can be done efficiently, in service of high throughput phenotyping.
Affiliation(s)
- Timothy A Miller
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Paul Avillach
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Kenneth D Mandl
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
27
Osman I, Cotzia P, Moran U, Donnelly D, Arguelles-Grande C, Mendoza S, Moreira A. The urgency of utilizing COVID-19 biospecimens for research in the heart of the global pandemic. J Transl Med 2020; 18:219. [PMID: 32487093 PMCID: PMC7266426 DOI: 10.1186/s12967-020-02388-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/25/2020] [Accepted: 05/22/2020] [Indexed: 02/04/2023]
Abstract
The outbreak of the novel coronavirus disease 2019 (COVID-19) and consequent social distancing practices have disrupted essential clinical research functions worldwide. Ironically, this coincides with an immediate need for research to comprehend the biology of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the pathology of COVID-19. As the global crisis has already led to over 15,000 deaths out of 175,000 confirmed cases in New York City and Nassau County, NY alone, it is increasingly urgent to collect patient biospecimens linked to active clinical follow up. However, building a COVID-19 biorepository amidst the active pandemic is a complex and delicate task. To help facilitate rapid, robust, and regulated research on this novel virus, we report on the successful model implemented by New York University Langone Health (NYULH) within days of outbreak in the most challenging hot spot of infection globally. Using an amended institutional biobanking protocol, these efforts led to accrual of 11,120 patients presenting for SARS-CoV-2 testing, 4267 (38.4%) of whom tested positive for COVID-19. The recently reported genomic characterization of SARS-CoV-2 in the New York City Region, which is a crucial development in tracing sources of infection and asymptomatic spread of the novel virus, is the first outcome of this effort. While this growing resource actively supports studies of the New York outbreak in real time, a worldwide effort is necessary to build a collective arsenal of research tools to deal with the global crisis now, and to exploit the virus's biology for translational innovation that outlasts humanity's current dilemma.
Affiliation(s)
- Iman Osman
- The New York University Langone Health (NYULH) Center of Biospecimen Research and Development, Office of Science and Research, NYU Grossman School of Medicine, 522 First Avenue, SML405, New York, NY, 10016, USA
- Paolo Cotzia
- The New York University Langone Health (NYULH) Center of Biospecimen Research and Development, Office of Science and Research, NYU Grossman School of Medicine, 522 First Avenue, SML405, New York, NY, 10016, USA
- Una Moran
- The New York University Langone Health (NYULH) Center of Biospecimen Research and Development, Office of Science and Research, NYU Grossman School of Medicine, 522 First Avenue, SML405, New York, NY, 10016, USA
- Douglas Donnelly
- The New York University Langone Health (NYULH) Center of Biospecimen Research and Development, Office of Science and Research, NYU Grossman School of Medicine, 522 First Avenue, SML405, New York, NY, 10016, USA
- Carolina Arguelles-Grande
- The New York University Langone Health (NYULH) Center of Biospecimen Research and Development, Office of Science and Research, NYU Grossman School of Medicine, 522 First Avenue, SML405, New York, NY, 10016, USA
- Sandra Mendoza
- The New York University Langone Health (NYULH) Center of Biospecimen Research and Development, Office of Science and Research, NYU Grossman School of Medicine, 522 First Avenue, SML405, New York, NY, 10016, USA
- Andre Moreira
- The New York University Langone Health (NYULH) Center of Biospecimen Research and Development, Office of Science and Research, NYU Grossman School of Medicine, 522 First Avenue, SML405, New York, NY, 10016, USA
28
Milojevic M, Nikolic A, Jüni P, Head SJ. A statistical primer on subgroup analyses. Interact Cardiovasc Thorac Surg 2020; 30:839-845. [DOI: 10.1093/icvts/ivaa042] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/15/2019] [Revised: 01/23/2020] [Accepted: 01/29/2020] [Indexed: 11/12/2022]
Abstract
Resources for clinical research are limited. With increasing demand for patient-centred care, which is growing into an integral component of modern medicine, studying outcomes of patients with specific clinical characteristics is becoming increasingly important. Given the high cost of clinical trials and the time it takes to complete an investigation, it has become compulsory for investigators not only to assess treatment effects between the main randomized groups but also to try to identify clinically relevant subgroups that may particularly benefit from specific treatments. Publications of subgroup analyses have become prevalent, and, more importantly, their findings play a significant role in strategic planning and decision-making processes. Therefore, raising awareness among clinicians about the concepts and value of subgroup analysis contributes to improving patient outcomes. In this statistical primer, we give a broad introduction to the topic of subgroup analysis in scientific research. We further discuss the concept of subgroup analysis; the motivation for assessing subgroups; the types of subgroup analyses and the paradigm of hypothesis-generating research; the proper statistical methods for examining subgroup effects; and the optimal approach to interpreting results. Finally, this review provides a comprehensive users' guide for analysing and reporting subgroup studies on a point-by-point basis, using real-world examples that may help readers gain the experience needed to pursue their own subgroup analyses or interpret those of others.
Affiliation(s)
- Milan Milojevic
- Department of Cardiothoracic Surgery, Erasmus University Medical Center, Rotterdam, Netherlands
- Department of Cardiac Surgery and Cardiovascular Research, Dedinje Cardiovascular Institute, Belgrade, Serbia
- Aleksandar Nikolic
- Department of Cardiac Surgery, Acibadem Sistina Hospital, Skopje, North Macedonia
- Peter Jüni
- Applied Health Research Centre, Li Ka Shing Knowledge Institute of St. Michael’s Hospital, Department of Medicine, University of Toronto, Toronto, ON, Canada
- Stuart J Head
- Department of Cardiothoracic Surgery, Erasmus University Medical Center, Rotterdam, Netherlands