1
Goodman JE, Rhomberg LR, Cohen SM, Mundt KA, Case B, Burstyn I, Becich MJ, Gibbs G. Challenges in defining thresholds for health effects: some considerations for asbestos and silica. Front Epidemiol 2025; 5:1557023. [PMID: 40166649 PMCID: PMC11955591 DOI: 10.3389/fepid.2025.1557023] [Received: 01/08/2025] [Accepted: 02/28/2025]
Abstract
This paper summarizes several presentations in the Thresholds in Epidemiology and Risk Assessment session at the Monticello III conference. These presentations described evidence regarding thresholds for particles, including asbestos and silica, for both cancer (e.g., mesothelioma) and noncancer (e.g., silicosis) endpoints. For exposure to various types of particles and malignancy, it is clear that, although a linear non-threshold model has often been assumed, experimental and theoretical support for thresholds exists (e.g., through particle clearance, repair mechanisms, and various other aspects of the carcinogenic process). For mesothelioma and exposure to elongate mineral particles (EMPs), the epidemiological demonstration of thresholds remains controversial. However, using data from the Québec mining cohort studies, it was shown that a "practical" threshold exists for chrysotile exposure and mesothelioma. It was also noted that such evaluations need to incorporate measurement error in both diagnosis and exposure assessment into risk analyses. Researchers were also encouraged to use biobanks that collect specimens and data on mesothelioma to define mesothelioma cases and possible variants more precisely across all ages; to identify trends that may help define background rates and distinguish mesotheliomas related to EMP exposure from those that are not; and to identify other factors that support or define thresholds. New statistical approaches have been developed for identifying and quantifying exposure thresholds; one example, for respirable crystalline silica (RCS) exposure and silicosis risk, is described. Finally, applying artificial intelligence (AI) to the multiple factors influencing risk and thresholds may prove useful.
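The abstract refers to statistical approaches for identifying and quantifying exposure thresholds but does not name them. As one generic illustration, and explicitly not the method presented at the session, a hockey-stick (segmented) regression assumes the response is flat up to a threshold `tau` and rises linearly above it, y = b0 + b1 * max(0, x - tau), and picks `tau` by grid search on the residual sum of squares. The function names and the synthetic data in the usage note are hypothetical.

```python
"""Minimal sketch: estimate an exposure threshold with a hockey-stick
(segmented) regression, y = b0 + b1 * max(0, x - tau).

Assumption: this generic model stands in for the (unnamed) approaches in the
paper; it is not the RCS/silicosis analysis described there.
"""


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1, sse)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # If every transformed exposure is identical, fall back to a flat line.
    b1 = sxy / sxx if sxx > 0 else 0.0
    b0 = my - b1 * mx
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b0, b1, sse


def fit_hockey_stick(exposure, response, candidates):
    """Grid-search the threshold tau that minimizes the residual sum of squares.

    Returns (tau, b0, b1, sse) for the best candidate threshold.
    """
    best = None
    for tau in candidates:
        # Exposure in excess of the candidate threshold; zero at or below it.
        excess = [max(0.0, x - tau) for x in exposure]
        b0, b1, sse = fit_line(excess, response)
        if best is None or sse < best[3]:
            best = (tau, b0, b1, sse)
    return best
```

For example, with noise-free synthetic data `exposure = [i * 0.5 for i in range(21)]` and `response = [1.0 + 2.0 * max(0.0, x - 4.0) for x in exposure]`, searching `candidates = [i * 0.5 for i in range(19)]` recovers `tau = 4.0`, `b0 = 1.0`, and `b1 = 2.0`. Real exposure-response analyses would also need confidence intervals for `tau` and, as the abstract stresses, a treatment of exposure and diagnostic measurement error.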
Affiliation(s)
- Samuel M. Cohen
- Havlik-Wall Professor of Oncology, Department of Pathology, Microbiology, and Immunology, and the Buffett Cancer Center, University of Nebraska Medical Center, Omaha, NE, United States
- Igor Burstyn
- Drexel University, Philadelphia, PA, United States
- Michael J. Becich
- University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Graham Gibbs
- Private Consultant in Epidemiology and Occupational Health, Eastbourne, United Kingdom
2
Ren S, Beeche CA, Iyer K, Shi Z, Auster Q, Hawkins JM, Leader JK, Dhupar R, Pu J. Graphical modeling of causal factors associated with the postoperative survival of esophageal cancer subjects. Med Phys 2024; 51:1997-2006. [PMID: 37523254 PMCID: PMC10828112 DOI: 10.1002/mp.16656] [Received: 12/27/2022] [Revised: 07/07/2023] [Accepted: 07/17/2023]
Abstract
PURPOSE To clarify the causal relationships among factors contributing to the postoperative survival of patients with esophageal cancer. METHODS A cohort of 195 patients who underwent surgery for esophageal cancer between 2008 and 2021 was used in the study. All patients had preoperative chest computed tomography (CT) and positron emission tomography-CT (PET-CT) scans prior to receiving any treatment. From these images, high-throughput quantitative radiomic features, tumor features, and various body composition features were automatically extracted. Causal relationships among these image features, patient demographics, and other clinicopathological variables were analyzed and visualized using a novel score-based directed graph called "Grouped Greedy Equivalence Search" (GGES), which takes prior knowledge into account. After supplementing and screening the causal variables, intervention do-calculus adjustment (IDA) scores were calculated to determine the degree of impact of each variable on survival. Based on these IDA scores, a GGES prediction formula was generated. Ten-fold cross-validation was used to assess model performance, and the predictions were evaluated using the R-squared score (R2). RESULTS The final causal graphical model comprised two PET-based image variables, ten body composition variables, four pathological variables, four demographic variables, two tumor variables, and one radiological variable (Percentile 10). Intramuscular fat mass had the greatest impact on overall survival (months). Percentile 10 and overall TNM (T: tumor, N: nodes, M: metastasis) stage were identified as direct causes of overall survival (months). The GGES causal model outperformed GES in regression prediction (R2 = 0.251; p < 0.05) and avoided implausible causal relationships that contradict common sense.
CONCLUSION The GGES causal model can provide a reliable and straightforward representation of the intricate causal relationships among the variables that impact the postoperative survival of patients with esophageal cancer.
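The abstract evaluates its model with ten-fold cross-validation scored by R2, where R2 = 1 - SS_res / SS_tot. The sketch below shows that evaluation loop under stated assumptions: the GGES prediction formula is not reproduced, so a one-variable least-squares line stands in for the model, and out-of-fold predictions are pooled before scoring (the abstract does not say whether R2 was pooled or averaged per fold). All function names are hypothetical.

```python
"""Minimal sketch of k-fold cross-validation scored with R^2.

Assumption: a one-variable least-squares line stands in for the GGES
prediction formula, which is not reproduced here.
"""
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 reproducibly and split it into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ols(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b1 * mx, b1


def cross_validated_r2(xs, ys, k=10, seed=0):
    """Fit on k-1 folds, predict the held-out fold, then score all predictions once."""
    preds = [0.0] * len(xs)
    for test_idx in kfold_indices(len(xs), k, seed):
        test_set = set(test_idx)
        train = [i for i in range(len(xs)) if i not in test_set]
        b0, b1 = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        for i in test_idx:
            preds[i] = b0 + b1 * xs[i]
    return r2_score(ys, preds)
```

Pooling out-of-fold predictions gives one R2 over all patients. Averaging per-fold R2 values is a common alternative and can differ noticeably on small folds, so it is worth stating which convention is used when a single R2 is reported.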
Affiliation(s)
- Shangsi Ren
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Cameron A. Beeche
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Kartik Iyer
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Zhiyi Shi
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Quentin Auster
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- James M. Hawkins
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Joseph K. Leader
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Rajeev Dhupar
- Department of Cardiothoracic Surgery, Division of Thoracic and Foregut Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Surgical Services Division, Thoracic Surgery, VA Pittsburgh Healthcare System, Pittsburgh, PA 15213
- Jiantao Pu
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA 15213, USA
3
Zhang H, Jethani N, Jones S, Genes N, Major VJ, Jaffe IS, Cardillo AB, Heilenbach N, Ali NF, Bonanni LJ, Clayburn AJ, Khera Z, Sadler EC, Prasad J, Schlacter J, Liu K, Silva B, Montgomery S, Kim EJ, Lester J, Hill TM, Avoricani A, Chervonski E, Davydov J, Small W, Chakravartty E, Grover H, Dodson JA, Brody AA, Aphinyanaphongs Y, Masurkar A, Razavian N. Evaluating Large Language Models in Extracting Cognitive Exam Dates and Scores. medRxiv 2024:2023.07.10.23292373. [PMID: 38405784 PMCID: PMC10888985 DOI: 10.1101/2023.07.10.23292373]
Abstract
Importance Large language models (LLMs) are increasingly important in medical tasks, so ensuring their reliability is vital to avoid false results. Our study assesses two state-of-the-art LLMs (ChatGPT and LlaMA-2) for extracting clinical information, focusing on cognitive tests such as the MMSE and CDR. Objective To evaluate the performance of ChatGPT and LlaMA-2 in extracting MMSE and CDR scores, including their associated dates. Methods Our data consisted of 135,307 clinical notes (January 12, 2010 to May 24, 2023) mentioning MMSE, CDR, or MoCA. After applying inclusion criteria, 34,465 notes remained, of which 765 were processed by ChatGPT (GPT-4) and LlaMA-2, and 22 experts reviewed the responses. ChatGPT successfully extracted MMSE and CDR instances with dates from 742 notes. We used 20 notes for fine-tuning and training the reviewers. The remaining 722 were assigned to reviewers, 309 of which were each assigned to two reviewers simultaneously. Inter-rater agreement (Fleiss' kappa), precision, recall, true/false negative rates, and accuracy were calculated. Our study follows the TRIPOD reporting guidelines for model validation. Results For MMSE information extraction, ChatGPT (vs. LlaMA-2) achieved an accuracy of 83% (vs. 66.4%), sensitivity of 89.7% (vs. 69.9%), true-negative rate of 96% (vs. 60.0%), and precision of 82.7% (vs. 62.2%). For CDR, the results were lower overall, with an accuracy of 87.1% (vs. 74.5%), sensitivity of 84.3% (vs. 39.7%), true-negative rate of 99.8% (vs. 98.4%), and precision of 48.3% (vs. 16.1%). We qualitatively evaluated the MMSE errors of ChatGPT and LlaMA-2 on double-reviewed notes. LlaMA-2 errors included 27 cases of total hallucination, 19 cases of reporting other scores instead of MMSE, 25 missed scores, and 23 cases of reporting only the wrong date. In comparison, ChatGPT's errors included only 3 cases of total hallucination, 17 cases of reporting the wrong test instead of MMSE, and 19 cases of reporting a wrong date.
Conclusions In this diagnostic/prognostic study of ChatGPT and LlaMA-2 for extracting cognitive exam dates and scores from clinical notes, ChatGPT exhibited high accuracy and outperformed LlaMA-2. The use of LLMs could benefit dementia research and clinical care by identifying patients eligible for treatment initiation or clinical trial enrollment. Rigorous evaluation of LLMs is crucial to understanding their capabilities and limitations.
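The abstract reports accuracy, sensitivity (recall), true-negative rate, precision, and Fleiss' kappa without giving formulas. The sketch below uses the standard textbook definitions. The counts in the usage note are made up, and the study's own rules for classifying partially correct extractions (for example, a right score with a wrong date) are not modeled. Function names are hypothetical.

```python
"""Minimal sketch of the evaluation metrics named in the abstract.

Standard definitions only; no study data are reproduced here.
"""


def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall), true-negative rate, and precision."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


def fleiss_kappa(ratings):
    """Fleiss' kappa from a count table.

    ratings[i][j] is the number of raters who put item i in category j.
    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    total = n_items * n_raters
    # Observed agreement: share of agreeing rater pairs per item, averaged.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_items
    # Chance agreement from the overall category proportions.
    n_categories = len(ratings[0])
    p_j = [sum(row[j] for row in ratings) / total for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```

For instance, `confusion_metrics(tp=8, fp=2, tn=9, fn=1)` gives an accuracy of 0.85 and a precision of 0.8. For double-reviewed notes, each row of the kappa table would hold two ratings, such as `[2, 0]` when both reviewers judged an extraction correct.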
Affiliation(s)
- Abraham A. Brody
- NYU Rory Meyers College of Nursing, NYU Grossman School of Medicine
4
Seymour CW, Urbanek KL, Nakayama A, Kennedy JN, Powell R, Robinson RAS, Kapp KL, Billiar TR, Vodovotz Y, Gelhaus SL, Cooper VS, Tang L, Mayr F, Reitz KM, Horvat C, Meyer NJ, Dickson RP, Angus D, Palmer OP. A Prospective Cohort Protocol for the Remnant Investigation in Sepsis Study. Crit Care Explor 2023; 5:e0974. [PMID: 38304708 PMCID: PMC10833627 DOI: 10.1097/cce.0000000000000974]
Abstract
BACKGROUND Sepsis is a common and deadly syndrome, accounting for more than 11 million deaths annually. Developing a deeper understanding of the host and pathogen mechanisms contributing to poor outcomes in sepsis, and thereby possibly informing new therapeutic targets, typically requires sophisticated and expensive biorepositories. We propose that remnant biospecimens are an alternative for mechanistic sepsis research, although the viability and scientific value of such remnants are unknown. METHODS AND RESULTS The Remnant Biospecimen Investigation in Sepsis study is a prospective cohort study of 225 adults (age ≥ 18 yr) presenting to the emergency department with community sepsis, defined as meeting sepsis-3 criteria within 6 hours of arrival. The primary objective is to determine the scientific value of a remnant biospecimen repository in sepsis linked to clinical phenotyping in the electronic health record. We will study candidate multiomic readouts of sepsis biology, guided by a conceptual model, and determine the precision, accuracy, integrity, and comparability of proteins, small molecules, lipids, and pathogen sequencing in remnant biospecimens compared with paired biospecimens obtained according to research protocols. Paired biospecimens will include plasma from sodium-heparin, EDTA, sodium fluoride, and citrate tubes. CONCLUSIONS The study has received approval from the University of Pittsburgh Human Research Protection Office (Study 21120013). Recruitment began on October 25, 2022, with release of primary results anticipated in 2024. Results will be made available to the public, the funders, critical care societies, laboratory medicine scientists, and other researchers.
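The protocol will compare each analyte in remnant biospecimens with its value in a paired research-protocol biospecimen, but the abstract does not name the comparability statistic. One common choice for paired measurements is a Bland-Altman analysis. It is used below purely as an assumption: the mean difference (bias) plus approximate 95% limits of agreement, bias ± 1.96 × SD of the differences. The function name and example values are hypothetical.

```python
"""Minimal sketch: Bland-Altman agreement between paired measurements.

Assumption: the protocol does not specify its comparability statistic;
Bland-Altman limits of agreement are one common choice, used here only
to illustrate a remnant-vs-research comparison for a single analyte.
"""
import statistics


def bland_altman(remnant, research):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of (remnant - research); the limits are
    bias +/- 1.96 * sample SD of the differences.
    """
    if len(remnant) != len(research) or len(remnant) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(remnant, research)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

With remnant values `[10.0, 12.0, 11.0, 13.0]` against research values `[9.0, 11.0, 10.0, 12.0]`, every difference is 1.0, so the bias is 1.0 and both limits collapse to 1.0. That pattern indicates a constant offset rather than random disagreement. Whether such an offset is acceptable depends on analyte-specific tolerances that the sketch does not encode.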
Affiliation(s)
- Christopher W Seymour
- Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Department of Critical Care Medicine, The CRISMA Center, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Kelly Lynn Urbanek
- Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Department of Critical Care Medicine, The CRISMA Center, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Anna Nakayama
- Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Department of Critical Care Medicine, The CRISMA Center, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Jason N Kennedy
- Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Department of Critical Care Medicine, The CRISMA Center, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Rachel Powell
- Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Department of Critical Care Medicine, The CRISMA Center, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Kathryn L Kapp
- Department of Chemistry, Vanderbilt University, Nashville, TN
- Stacy L Gelhaus
- Department of Pharmacology and Chemical Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Vaughn S Cooper
- Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Lu Tang
- Department of Biostatistics, University of Pittsburgh School of Public Health, Pittsburgh, PA
- Flo Mayr
- Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Department of Critical Care Medicine, The CRISMA Center, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Katherine M Reitz
- Department of Critical Care Medicine, The CRISMA Center, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Department of Surgery, UPMC, Pittsburgh, PA
- Christopher Horvat
- Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Department of Critical Care Medicine, The CRISMA Center, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Nuala J Meyer
- Pulmonary, Allergy, and Critical Care Medicine Division, Center for Translational Lung Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Robert P Dickson
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan Health System, Ann Arbor, MI
- Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, MI
- Division of Pulmonary & Critical Care Medicine, Weil Institute for Critical Care Research and Innovation, Ann Arbor, MI
- Derek Angus
- Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Department of Critical Care Medicine, The CRISMA Center, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Octavia Peck Palmer
- Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Department of Critical Care Medicine, The CRISMA Center, University of Pittsburgh School of Medicine, Pittsburgh, PA
- Department of Microbiology and Molecular Genetics, University of Pittsburgh School of Medicine, Pittsburgh, PA
5
Iyer K, Ren S, Pu L, Mazur S, Zhao X, Dhupar R, Pu J. A Graph-Based Approach to Identify Factors Contributing to Postoperative Lung Cancer Recurrence among Patients with Non-Small-Cell Lung Cancer. Cancers (Basel) 2023; 15:3472. [PMID: 37444581 PMCID: PMC10340686 DOI: 10.3390/cancers15133472] [Received: 05/28/2023] [Revised: 06/29/2023] [Accepted: 06/30/2023]
Abstract
Accurately identifying the preoperative factors that affect postoperative cancer recurrence is crucial for optimizing neoadjuvant and adjuvant therapies and guiding follow-up treatment plans. We modeled the causal relationships between radiographic features derived from CT scans and the clinicopathologic factors associated with postoperative lung cancer recurrence and recurrence-free survival. A retrospective cohort of 363 non-small-cell lung cancer (NSCLC) patients who underwent lung resection with a minimum 5-year follow-up was analyzed. Body composition tissues and tumor features were quantified from preoperative whole-body CT scans (acquired as a component of PET-CT scans) and chest CT scans, respectively. A novel causal graphical model was used to visualize the causal relationships among these factors. Variables were assessed using the intervention do-calculus adjustment (IDA) score. Direct predictors of recurrence-free survival included smoking history, T-stage, height, and intramuscular fat mass. Subcutaneous fat mass, visceral fat volume, and bone mass exerted the greatest influence on the model. For recurrence, the most significant variables were visceral fat volume, subcutaneous fat volume, and bone mass. Pathologic variables contributed to the recurrence model, with bone mass, TNM stage, and weight being the most important. Body composition, particularly adipose tissue distribution, significantly and causally affected both recurrence and recurrence-free survival through interconnected relationships with other variables.
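The IDA score ranks variables by their estimated causal effect, but the abstract gives no formula. In the linear-Gaussian setting where IDA is usually applied (an assumption here), the effect of X on Y is estimated as the coefficient of X when Y is regressed on X together with a parent set of X in the learned graph. The full procedure repeats this for every parent set compatible with the estimated equivalence class and reports the resulting multiset of possible effects. The sketch performs only the single-parent-set step, on simulated data. `simulate`, `ida_effect`, and the coefficients 2.0, 1.5, and 3.0 are hypothetical.

```python
"""Minimal sketch of the adjustment step behind IDA-style causal effects.

Assumptions: linear Gaussian variables and a single, known parent set for X.
Full IDA loops over all parent sets allowed by the equivalence class.
The data are simulated, not from the NSCLC cohort.
"""
import random


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares coefficients [intercept, b_1, ...] via the normal equations."""
    rows = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def ida_effect(x, y, parents_of_x):
    """Coefficient of x when y is regressed on x plus x's parents."""
    return ols([x] + parents_of_x, y)[1]


def simulate(n=2000, seed=1):
    """Toy linear model: Z -> X, Z -> Y, X -> Y; true effect of X on Y is 1.5."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [2.0 * zi + rng.gauss(0.0, 1.0) for zi in z]
    y = [1.5 * xi + 3.0 * zi + rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]
    return z, x, y
```

With `z, x, y = simulate()`, `ida_effect(x, y, [z])` comes out close to the true 1.5. Omitting the parent, `ida_effect(x, y, [])`, gives roughly 2.7 because the confounder Z also drives Y. This gap is why the choice of parent set, and hence the learned graph, matters for IDA-based rankings.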
Affiliation(s)
- Kartik Iyer
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Shangsi Ren
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Lucy Pu
- Department of Cardiothoracic Surgery, Division of Thoracic and Foregut Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Summer Mazur
- Department of Cardiothoracic Surgery, Division of Thoracic and Foregut Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Xiaoyan Zhao
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Rajeev Dhupar
- Department of Cardiothoracic Surgery, Division of Thoracic and Foregut Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Surgical Services Division, Thoracic Surgery, VA Pittsburgh Healthcare System, Pittsburgh, PA 15213, USA
- Jiantao Pu
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA 15213, USA
6
Harris CS, Dodd M, Kober KM, Dhruva AA, Hammer M, Conley YP, Miaskowski CA. Advances in Conceptual and Methodological Issues in Symptom Cluster Research: A 20-Year Perspective. ANS Adv Nurs Sci 2022; 45:309-322. [PMID: 35502915 PMCID: PMC9616968 DOI: 10.1097/ans.0000000000000423]
Abstract
Two conceptual approaches are used to evaluate symptom clusters: "clustering" symptoms (ie, a variable-centered analytic approach) and "clustering" patients (ie, a person-centered analytic approach). However, these methods are not used consistently, and conceptual clarity is needed. Given the emergence of novel methods to evaluate symptom clusters, a review of the conceptual basis for older and newer analytic methods is warranted. Therefore, this article reviews the conceptual basis for symptom cluster research; compares and contrasts the conceptual basis for using variable-centered versus person-centered analytic approaches in symptom cluster research; reviews their strengths and weaknesses; and compares their applications in symptom cluster research.
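The distinction between "clustering" symptoms and "clustering" patients can be made concrete on a single patient-by-symptom matrix. A variable-centered analysis works on the columns, here by correlating symptoms. A person-centered analysis works on the rows, here by grouping patients. In this sketch, plain k-means (Lloyd's algorithm) stands in for the person-centered step. That is an assumption for illustration, since symptom cluster studies often use latent class or latent profile analysis instead. The data and function names are hypothetical.

```python
"""Minimal sketch contrasting variable-centered and person-centered analyses.

Rows are patients, columns are symptom severity scores (hypothetical data).
Assumption: k-means stands in for person-centered methods such as latent
class/profile analysis.
"""


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def symptom_correlations(data):
    """Variable-centered view: correlation matrix between symptom columns."""
    cols = [list(c) for c in zip(*data)]
    return [[pearson(ci, cj) for cj in cols] for ci in cols]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(row, centroids):
    """Index of the centroid closest to this patient's symptom profile."""
    return min(range(len(centroids)), key=lambda c: squared_distance(row, centroids[c]))


def kmeans(data, k, iters=20):
    """Person-centered view: Lloyd's algorithm seeded with the first k patients."""
    centroids = [list(row) for row in data[:k]]
    labels = []
    for _ in range(iters):
        labels = [nearest(row, centroids) for row in data]
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels
```

On a toy matrix such as `[[8, 7, 1, 2], [1, 2, 8, 7], [7, 8, 2, 1], [2, 1, 7, 8], [9, 8, 1, 1], [1, 1, 9, 8]]`, the variable-centered view shows that the first two symptoms rise together and move opposite to the third. The person-centered view splits the patients into two subgroups, `[0, 1, 0, 1, 0, 1]`. Both views summarize the same data, but they answer different questions.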
Affiliation(s)
- Marylin Dodd
- School of Nursing, University of California, San Francisco, CA, USA
- Kord M. Kober
- School of Nursing, University of California, San Francisco, CA, USA
- Anand A. Dhruva
- School of Medicine, University of California, San Francisco, CA, USA
- Yvette P. Conley
- School of Nursing, University of Pittsburgh, Pittsburgh, PA, USA
- Christine A. Miaskowski
- School of Nursing, University of California, San Francisco, CA, USA
- School of Medicine, University of California, San Francisco, CA, USA
7
Koplev S, Seldin M, Sukhavasi K, Ermel R, Pang S, Zeng L, Bankier S, Di Narzo A, Cheng H, Meda V, Ma A, Talukdar H, Cohain A, Amadori L, Argmann C, Houten SM, Franzén O, Mocci G, Meelu OA, Ishikawa K, Whatling C, Jain A, Jain RK, Gan LM, Giannarelli C, Roussos P, Hao K, Schunkert H, Michoel T, Ruusalepp A, Schadt EE, Kovacic JC, Lusis AJ, Björkegren JLM. A mechanistic framework for cardiometabolic and coronary artery diseases. Nat Cardiovasc Res 2022; 1:85-100. [PMID: 36276926 PMCID: PMC9583458 DOI: 10.1038/s44161-021-00009-1] [Received: 05/04/2021] [Accepted: 11/27/2021]
Abstract
Coronary atherosclerosis results from the delicate interplay of genetic and exogenous risk factors, principally taking place in metabolic organs and the arterial wall. Here we show that 224 gene-regulatory coexpression networks (GRNs) identified by integrating genetic and clinical data from patients with (n = 600) and without (n = 250) coronary artery disease (CAD) with RNA-seq data from seven disease-relevant tissues in the Stockholm-Tartu Atherosclerosis Reverse Network Engineering Task (STARNET) study largely capture this delicate interplay, explaining >54% of CAD heritability. Within 89 cross-tissue GRNs associated with clinical severity of CAD, 374 endocrine factors facilitated inter-organ interactions, primarily along an axis from adipose tissue to the liver (n = 152). This axis was independently replicated in genetically diverse mouse strains and by injection of recombinant forms of adipose endocrine factors (EPDR1, FCN2, FSTL3 and LBP) that markedly altered blood lipid and glucose levels in mice. Altogether, the STARNET database and the associated GRN browser (http://starnet.mssm.edu) provide a multiorgan framework for exploration of the molecular interplay between cardiometabolic disorders and CAD.
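The abstract builds gene-regulatory coexpression networks (GRNs) from RNA-seq data across tissues. The core coexpression idea can be sketched minimally: treat genes as nodes, join two genes when the absolute Pearson correlation of their expression profiles reaches a cutoff, and report connected components as modules. This is a deliberately simplified stand-in. It uses a hard threshold, no soft power, no regulatory direction, and no genetic integration, so it is not the pipeline used to build the STARNET GRNs. Gene names, values, and the 0.8 cutoff are hypothetical.

```python
"""Minimal sketch of a hard-threshold gene coexpression network.

Assumption: a simplified stand-in for coexpression module detection, not
the STARNET GRN pipeline. Gene names and values are hypothetical.
"""


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def coexpression_modules(expr, cutoff=0.8):
    """expr maps gene name -> expression values across the same samples.

    Returns modules (connected components) as sorted lists of gene names.
    """
    genes = sorted(expr)
    neighbors = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= cutoff:
                neighbors[g].add(h)
                neighbors[h].add(g)
    # Depth-first search collects each connected component once.
    seen, modules = set(), []
    for g in genes:
        if g in seen:
            continue
        stack, component = [g], []
        seen.add(g)
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        modules.append(sorted(component))
    return modules
```

For example, with hypothetical genes `GENE_A` `[1, 2, 3, 4, 5]`, `GENE_B` `[2, 4, 6, 8, 10]`, `GENE_C` `[5, 4, 3, 2, 1]`, and `GENE_D` `[1, 3, 1, 3, 1]`, the first three form one module (perfectly correlated or anticorrelated) and `GENE_D` stands alone. Results from hard thresholds are sensitive to the cutoff, which is one reason production methods use weighted networks.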
Affiliation(s)
- Simon Koplev
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Marcus Seldin
- Departments of Medicine, Human Genetics and Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Biological Chemistry and Center for Epigenetics and Metabolism, University of California, Irvine, CA, USA
- Katyayani Sukhavasi
- Department of Cardiac Surgery and the Heart Clinic, Tartu University Hospital and Department of Cardiology, Institute of Clinical Medicine, Tartu University, Tartu, Estonia
- Raili Ermel
- Department of Cardiac Surgery and the Heart Clinic, Tartu University Hospital and Department of Cardiology, Institute of Clinical Medicine, Tartu University, Tartu, Estonia
- Shichao Pang
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, DZHK (German Centre for Cardiovascular Research), Munich Heart Alliance, Munich, Germany
- Lingyao Zeng
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, DZHK (German Centre for Cardiovascular Research), Munich Heart Alliance, Munich, Germany
- Sean Bankier
- BHF Centre for Cardiovascular Science, Queen’s Medical Research Institute, University of Edinburgh, Edinburgh, UK
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Antonio Di Narzo
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Haoxiang Cheng
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Vamsidhar Meda
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Angela Ma
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Husain Talukdar
- Department of Medicine, Karolinska Institutet, Karolinska Universitetssjukhuset, Huddinge, Sweden
- Ariella Cohain
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Letizia Amadori
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- New York University Cardiovascular Research Center, Department of Medicine, Leon H. Charney Division of Cardiology, New York University Grossman School of Medicine, New York University Langone Health, New York, NY, USA
- Carmen Argmann
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sander M. Houten
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Oscar Franzén
- Department of Medicine, Karolinska Institutet, Karolinska Universitetssjukhuset, Huddinge, Sweden
- Giuseppe Mocci
- Department of Medicine, Karolinska Institutet, Karolinska Universitetssjukhuset, Huddinge, Sweden
- Omar A. Meelu
- Cardiovascular Research Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Kiyotake Ishikawa
- Cardiovascular Research Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Carl Whatling
- Translational Science and Experimental Medicine, Research and Early Development, Cardiovascular, Renal and Metabolism, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Anamika Jain
- Department of Cardiac Surgery and the Heart Clinic, Tartu University Hospital and Department of Cardiology, Institute of Clinical Medicine, Tartu University, Tartu, Estonia
- Rajeev Kumar Jain
- Department of Cardiac Surgery and the Heart Clinic, Tartu University Hospital and Department of Cardiology, Institute of Clinical Medicine, Tartu University, Tartu, Estonia
- Li-Ming Gan
- Early Clinical Development, Research and Early Development, Cardiovascular, Renal and Metabolism, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Chiara Giannarelli
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- New York University Cardiovascular Research Center, Department of Medicine, Leon H. Charney Division of Cardiology, New York University Grossman School of Medicine, New York University Langone Health, New York, NY, USA
- Panos Roussos
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Dementia Research, Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA
- Mental Illness Research Education and Clinical Center (MIRECC), James J. Peters VA Medical Center, Bronx, NY, USA
- Ke Hao
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sema4, Stamford, CT, USA
- Heribert Schunkert
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, DZHK (German Centre for Cardiovascular Research), Munich Heart Alliance, Munich, Germany
- Tom Michoel
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Arno Ruusalepp
- Department of Cardiac Surgery and the Heart Clinic, Tartu University Hospital and Department of Cardiology, Institute of Clinical Medicine, Tartu University, Tartu, Estonia
- Clinical Gene Networks AB, Stockholm, Sweden
- Eric E. Schadt
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sema4, Stamford, CT, USA
- Jason C. Kovacic
- Cardiovascular Research Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales, Australia
- St Vincent’s Clinical School, University of NSW, Sydney, New South Wales, Australia
- Aldon J. Lusis
- Departments of Medicine, Human Genetics and Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA, USA
- Johan L. M. Björkegren
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Karolinska Institutet, Karolinska Universitetssjukhuset, Huddinge, Sweden
- Clinical Gene Networks AB, Stockholm, Sweden
8
Panja S, Rahem S, Chu CJ, Mitrofanova A. Big Data to Knowledge: Application of Machine Learning to Predictive Modeling of Therapeutic Response in Cancer. Curr Genomics 2021; 22:244-266. [PMID: 35273457 PMCID: PMC8822229 DOI: 10.2174/1389202921999201224110101] [Received: 07/02/2020] [Revised: 09/16/2020] [Accepted: 09/30/2020]
Abstract
Background In recent years, the availability of high-throughput technologies, the establishment of large molecular patient data repositories, and advances in computing power and storage have allowed elucidation of complex mechanisms implicated in therapeutic response in cancer patients. The breadth and depth of such data, alongside experimental noise and missing values, require a sophisticated human-machine interaction that allows effective learning from complex data and accurate forecasting of future outcomes, ideally embedded in the core of machine learning design. Objective In this review, we discuss machine learning techniques used to model treatment response in cancer, including random forests, support vector machines, neural networks, and linear and logistic regression. We give an overview of their mathematical foundations and discuss their limitations and alternative approaches in light of their application to therapeutic response modeling in cancer. Conclusion We hypothesize that the growing number of patient profiles and potential temporal monitoring of patient data will make even more complex techniques, such as deep learning and causal analysis, central players in therapeutic response modeling.
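Logistic regression is one of the techniques whose mathematical foundations the review covers. Its core fits in a few lines: a sigmoid link turns a linear score into a response probability, and the weights are fitted by descending the gradient of the mean log-loss. The sketch below is illustrative only. The features and labels in the usage note are hypothetical, there is no regularization or feature scaling, and a real analysis would use a vetted library with held-out validation.

```python
"""Minimal sketch: logistic regression fitted by batch gradient descent
to predict a binary treatment response from numeric features.

Illustrative only; data are hypothetical and there is no regularization.
"""
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Probability of response for feature vector x given weights [bias, w_1, ...]."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def fit_logistic(features, labels, lr=0.1, epochs=2000):
    """Return weights [bias, w_1, ..., w_p] that reduce the mean log-loss."""
    p = len(features[0])
    w = [0.0] * (p + 1)
    n = len(labels)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for x, y in zip(features, labels):
            # Gradient of the log-loss w.r.t. the linear score is (prob - label).
            err = predict_proba(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w
```

With one hypothetical feature, `[[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]`, and labels `[0, 0, 0, 1, 0, 1, 1, 1]`, the fitted slope is positive. The predicted response probability is then above 0.5 at 2.0 and below 0.5 at -2.0. The data are deliberately not perfectly separable, because on separable data unregularized logistic weights keep growing without bound.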
Affiliation(s)
- Antonina Mitrofanova
- Address correspondence to this author at the Department of Health Informatics, Rutgers School of Health Professions, Rutgers Biomedical and Health Sciences, Newark, NJ 07107, USA
9
Zhang L, Lin L, Li J. VtNet: A neural network with variable importance assessment. Stat (Int Stat Inst) 2021. [DOI: 10.1002/sta4.325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Lixiang Zhang
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Lin Lin
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Jia Li
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
10
van Hartskamp M, Consoli S, Verhaegh W, Petkovic M, van de Stolpe A. Artificial Intelligence in Clinical Health Care Applications: Viewpoint. Interact J Med Res 2019; 8:e12100. [PMID: 30950806 PMCID: PMC6473209 DOI: 10.2196/12100] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 09/03/2018] [Revised: 01/18/2019] [Accepted: 01/31/2019] [Indexed: 12/26/2022]
Abstract
The idea of artificial intelligence (AI) has a long history. It turned out, however, that reaching human-level intelligence is more complicated than originally anticipated. Currently, we are experiencing renewed interest in AI, fueled by an enormous increase in computing power and an even larger increase in data, in combination with improved AI technologies such as deep learning. Healthcare is considered the next domain to be revolutionized by AI. While AI approaches are well suited to developing certain algorithms, biomedical applications pose specific challenges. We propose six recommendations (the 6Rs) to improve AI projects in the biomedical space, especially clinical health care, and to facilitate communication between AI scientists and medical doctors: (1) Relevant and well-defined clinical question first; (2) Right data (i.e., representative and of good quality); (3) Ratio between the number of patients and their variables should fit the AI method; (4) Relationship between data and ground truth should be as direct and causal as possible; (5) Regulatory ready, enabling validation; and (6) Right AI method.
11
Rahmadi R, Groot P, van Rijn MHC, van den Brand JAJG, Heins M, Knoop H, Heskes T, the Alzheimer’s Disease Neuroimaging Initiative, the MASTERPLAN Study Group, the OPTIMISTIC consortium. Causality on longitudinal data: Stable specification search in constrained structural equation modeling. Stat Methods Med Res 2018; 27:3814-3834. [PMID: 28657454 PMCID: PMC6249641 DOI: 10.1177/0962280217713347] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 01/15/2023]
Abstract
A typical problem in causal modeling is the instability of model structure learning, i.e., small changes in finite data can result in completely different optimal models. The present work introduces a novel causal modeling algorithm for longitudinal data that is robust for finite samples, based on recent advances in stability selection using subsampling and selection algorithms. Our approach uses exploratory search but allows incorporation of prior knowledge, e.g., the absence of a particular causal relationship between two specific variables. We represent causal relationships using structural equation models. Models are scored along two objectives: the model fit and the model complexity. Since these objectives often conflict, we apply a multi-objective evolutionary algorithm to search for Pareto optimal models. To handle the instability of small finite data samples, we repeatedly subsample the data and select those substructures (from the optimal models) that are both stable and parsimonious. These substructures can be visualized through a causal graph. On a simulated data set with a known ground truth, our more exploratory approach achieves performance at least comparable to, and often significantly better than, state-of-the-art alternative approaches. We also present the results of our method on three real-world longitudinal data sets on chronic fatigue syndrome, Alzheimer disease, and chronic kidney disease. The findings obtained with our approach are generally in line with results from more hypothesis-driven analyses in earlier studies and suggest some novel relationships that deserve further research.
Affiliation(s)
- Ridho Rahmadi
- Department of Informatics, Universitas Islam Indonesia, Sleman, Indonesia
- Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands
- Perry Groot
- Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands
- Marieke HC van Rijn
- Department of Nephrology, Radboud University Medical Center, Nijmegen, The Netherlands
- Jan AJG van den Brand
- Department of Nephrology, Radboud University Medical Center, Nijmegen, The Netherlands
- Marianne Heins
- Netherlands Institute for Health Services Research, Utrecht, The Netherlands
- Hans Knoop
- Department of Medical Psychology, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Tom Heskes
- Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands
12
A review on the application of chromatographic methods, coupled to chemometrics, for food authentication. Food Control 2018. [DOI: 10.1016/j.foodcont.2018.06.015] [Citation(s) in RCA: 94] [Impact Index Per Article: 13.4] [Indexed: 11/20/2022]
13
Discovering hidden knowledge through auditing clinical diagnostic knowledge bases. J Biomed Inform 2018; 84:75-81. [PMID: 29940263 DOI: 10.1016/j.jbi.2018.06.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/09/2018] [Revised: 06/19/2018] [Accepted: 06/21/2018] [Indexed: 11/21/2022]
Abstract
OBJECTIVE To evaluate the potential of data-mining auditing techniques to identify hidden concepts in diagnostic knowledge bases (KBs). Improving completeness enhances KB applications such as differential diagnosis and patient case simulation. MATERIALS AND METHODS The authors used unsupervised methods (Pearson's correlation, PC; Kendall's correlation, KC; and a heuristic algorithm, HA) to identify existing and discover new finding-finding interrelationships ("properties") in the INTERNIST-1/QMR KB. The authors estimated the KB maintenance efficiency gains (effort reduction) of the approaches. RESULTS The methods discovered new properties at rates with 95% CIs of [0.1%, 5.4%] (PC), [2.8%, 12.5%] (KC), and [5.6%, 18.8%] (HA). The estimated reduction in manual effort for HA-assisted determination of new properties was approximately 50-fold. CONCLUSION Data mining can provide an efficient supplement for ensuring the completeness of finding-finding interdependencies in diagnostic knowledge bases. The authors' findings should be applicable to other diagnostic systems that record finding frequencies within diseases (e.g., DXplain, ISABEL).
14
Koplev S, Lin K, Dohlman AB, Ma'ayan A. Integration of pan-cancer transcriptomics with RPPA proteomics reveals mechanisms of epithelial-mesenchymal transition. PLoS Comput Biol 2018; 14:e1005911. [PMID: 29293502 PMCID: PMC5766255 DOI: 10.1371/journal.pcbi.1005911] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 05/13/2017] [Revised: 01/12/2018] [Accepted: 12/01/2017] [Indexed: 01/06/2023]
Abstract
Integrating data from multiple regulatory layers across cancer types could elucidate additional mechanisms of oncogenesis. Using antibody-based protein profiling of 736 cancer cell lines, along with matching transcriptomic data, we show that pan-cancer bimodality in the amounts of mRNA, protein, and protein phosphorylation reveals mechanisms related to the epithelial-mesenchymal transition (EMT). Based on the bimodal expression of E-cadherin, we define an EMT signature consisting of 239 genes, many of which were not previously associated with EMT. By querying gene expression signatures collected from cancer cell lines after small-molecule perturbations, we identify enrichment for histone deacetylase (HDAC) inhibitors as inducers of EMT, and kinase inhibitors as mesenchymal-to-epithelial transition (MET) promoters. Causal modeling of protein-based signaling identifies putative drivers of EMT. In conclusion, integrative analysis of pan-cancer proteomic and transcriptomic data reveals key regulatory mechanisms of oncogenic transformation. Profiling molecular and phenotypic characteristics of large collections of cancer cell lines can be used to identify distinct and common oncogenic pathways across cancer types. So far, most large-scale data obtained from cancer cell lines have been at the genomic, transcriptomic, and phenotypic levels. Recently, high-quality data at the level of cell signaling through protein abundances and phosphorylation sites has become available. By integrating this newly generated protein data with prior transcriptomic data, and by visualizing all cancer cell lines using dimensionality reduction techniques, pan-cancer cell lines are strikingly shown to organize into a gradient of epithelial to mesenchymal types. Interestingly, many of the measured proteins and transcripts display bimodality; the expression of genes, proteins, and protein phosphorylations is either high or low, strongly suggesting that they act as molecular switches. 
Focusing on further characterizing molecular switches of epithelial-mesenchymal transitions, we identify candidate regulators and small molecules that can induce or reverse such transitions, as well as potential causal relationships between proteins. Since the mesenchymal state of tumors is known to be associated with metastasis and later-stage cancer development, a better understanding of the regulatory mechanisms of epithelial-to-mesenchymal transition can lead to improved targeted therapeutics.
Affiliation(s)
- Simon Koplev
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY, United States of America
- Katie Lin
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY, United States of America
- Anders B Dohlman
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY, United States of America
- Avi Ma'ayan
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY, United States of America
15
Rahmadi R, Groot P, Heskes T. The stablespec package for causal discovery on cross-sectional and longitudinal data in R. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.10.064] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
16
Cox LA(T). Do causal concentration–response functions exist? A critical review of associational and causal relations between fine particulate matter and mortality. Crit Rev Toxicol 2017; 47:603-631. [DOI: 10.1080/10408444.2017.1311838] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 12/26/2022]
17
Bui AAT, Van Horn JD. Envisioning the future of 'big data' biomedicine. J Biomed Inform 2017; 69:115-117. [PMID: 28366789 PMCID: PMC5613673 DOI: 10.1016/j.jbi.2017.03.017] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 02/13/2017] [Revised: 03/17/2017] [Accepted: 03/29/2017] [Indexed: 01/23/2023]
Abstract
Through the increasing availability of more efficient data collection procedures, biomedical scientists are now confronting ever larger sets of data, often finding themselves struggling to process and interpret what they have gathered, even as still more data continue to accumulate. This torrent of biomedical information necessitates creative thinking about how the data are being generated, how they might best be managed and analyzed, and, eventually, how they can be transformed into further scientific understanding for improving patient care. Recognizing this as a major challenge, the National Institutes of Health (NIH) has spearheaded the "Big Data to Knowledge" (BD2K) program, the agency's most ambitious biomedical informatics effort to date. In this commentary, we describe how the NIH has taken on "big data" science head-on, how a consortium of leading research centers is developing the means for handling large-scale data, and how such activities are being marshalled for the training of a new generation of biomedical data scientists. All in all, the NIH BD2K program seeks to position data science at the heart of 21st-century biomedical research.
Affiliation(s)
- Alex A T Bui
- BD2K Centers Coordinating Center (BD2K CCC), University of California, Los Angeles, Los Angeles, CA, USA. http://www.bd2kccc.org
- John Darrell Van Horn
- BD2K Training Coordinating Center (BD2K TCC), University of Southern California, Los Angeles, CA, USA. http://www.bigdatau.org
18
Park HA, Lee JY, On J, Lee JH, Jung H, Park SK. 2016 Year-in-Review of Clinical and Consumer Informatics: Analysis and Visualization of Keywords and Topics. Healthc Inform Res 2017; 23:77-86. [PMID: 28523205 PMCID: PMC5435588 DOI: 10.4258/hir.2017.23.2.77] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 02/20/2017] [Revised: 04/18/2017] [Accepted: 04/20/2017] [Indexed: 11/23/2022]
Abstract
Objectives The objective of this study was to review and visualize the medical informatics field over the previous 12 months according to the frequencies of keywords and topics in papers published in the top four journals in the field and in Healthcare Informatics Research (HIR), an official journal of the Korean Society of Medical Informatics. Methods A six-person team conducted an extensive review of the literature on clinical and consumer informatics. The literature was searched using keywords employed in the American Medical Informatics Association year-in-review process and organized into 14 topics used in that process. Data were analyzed using word clouds, social network analysis, and association rules. Results The literature search yielded 370 references and 1,123 unique keywords. ‘Electronic Health Record’ (EHR) (78.6%) was the most frequently appearing keyword in the articles published in the five studied journals, followed by ‘telemedicine’ (2.1%). EHR (37.6%) was also the most frequently studied topic area, followed by clinical informatics (12.0%). However, ‘telemedicine’ (17.0%) was the most frequently appearing keyword in articles published in HIR, followed by ‘telecommunications’ (4.5%). Telemedicine (47.1%) was the most frequently studied topic area, followed by EHR (14.7%). Conclusions The study findings reflect the Korean government's efforts to introduce telemedicine into the Korean healthcare system and reactions to this from the stakeholders associated with telemedicine.
Affiliation(s)
- Hyeoun-Ae Park
- College of Nursing, Seoul National University, Seoul, Korea
- Joo Yun Lee
- College of Nursing, Seoul National University, Seoul, Korea
- Jeongah On
- College of Nursing, Seoul National University, Seoul, Korea
- Ji Hyun Lee
- College of Nursing, Seoul National University, Seoul, Korea
- Hyesil Jung
- College of Nursing, Seoul National University, Seoul, Korea
- Seul Ki Park
- College of Nursing, Seoul National University, Seoul, Korea
19
Chandran UR, Medvedeva OP, Barmada MM, Blood PD, Chakka A, Luthra S, Ferreira A, Wong KF, Lee AV, Zhang Z, Budden R, Scott JR, Berndt A, Berg JM, Jacobson RS. TCGA Expedition: A Data Acquisition and Management System for TCGA Data. PLoS One 2016; 11:e0165395. [PMID: 27788220 PMCID: PMC5082933 DOI: 10.1371/journal.pone.0165395] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 03/17/2016] [Accepted: 10/11/2016] [Indexed: 11/19/2022]
Abstract
Background The Cancer Genome Atlas Project (TCGA) is a National Cancer Institute effort to profile at least 500 cases of 20 different tumor types using genomic platforms and to make these data, both raw and processed, available to all researchers. TCGA data currently exceed 1.2 petabytes in size and include whole genome sequence (WGS), whole exome sequence, methylation, RNA expression, proteomic, and clinical datasets. Publicly accessible TCGA data are released through public portals, but many challenges exist in navigating and using data obtained from these sites. We developed TCGA Expedition to support the research community focused on computational methods for cancer research. Data obtained, versioned, and archived using TCGA Expedition support command-line access at high-performance computing facilities as well as some functionality with third-party tools. For a subset of TCGA data collected at the University of Pittsburgh, we also re-associate TCGA data with de-identified data from the electronic health records. Here we describe the software as well as the architecture of our repository, methods for loading TCGA data to multiple platforms, and security and regulatory controls that conform to federal best practices. Results TCGA Expedition software consists of a set of scripts written in Bash, Python, and Java that download, extract, harmonize, version, and store all TCGA data and metadata. The software generates a versioned, participant- and sample-centered, local TCGA data directory with metadata structures that directly reference the local data files as well as the original data files. The software supports flexible searches of the data via a web portal, user-centric data tracking tools, and data provenance tools.
Using this software, we created a collaborative repository, the Pittsburgh Genome Resource Repository (PGRR) that enabled investigators at our institution to work with all TCGA data formats, and to interrogate these data with analysis pipelines, and associated tools. WGS data are especially challenging for individual investigators to use, due to issues with downloading, storage, and processing; having locally accessible WGS BAM files has proven invaluable. Conclusion Our open-source, freely available TCGA Expedition software can be used to create a local collaborative infrastructure for acquiring, managing, and analyzing TCGA data and other large public datasets.
Affiliation(s)
- Uma R. Chandran
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States of America
- University of Pittsburgh Cancer Institute, Pittsburgh, PA, United States of America
- Olga P. Medvedeva
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States of America
- M. Michael Barmada
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States of America
- Department of Human Genetics, University of Pittsburgh School of Public Health, Pittsburgh, PA, United States of America
- Center for Simulation and Modeling, University of Pittsburgh, Pittsburgh, PA, United States of America
- UPMC Corporate Services, Pittsburgh, PA, United States of America
- Philip D. Blood
- Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, PA, United States of America
- Anish Chakka
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States of America
- University of Pittsburgh Cancer Institute, Pittsburgh, PA, United States of America
- Soumya Luthra
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States of America
- University of Pittsburgh Cancer Institute, Pittsburgh, PA, United States of America
- Antonio Ferreira
- Center for Simulation and Modeling, University of Pittsburgh, Pittsburgh, PA, United States of America
- Kim F. Wong
- Center for Simulation and Modeling, University of Pittsburgh, Pittsburgh, PA, United States of America
- Adrian V. Lee
- University of Pittsburgh Cancer Institute, Pittsburgh, PA, United States of America
- Department of Pharmacology and Cell Biology, University of Pittsburgh, Pittsburgh, PA, United States of America
- Magee-Women’s Research Institute, Pittsburgh, PA, United States of America
- Zhihui Zhang
- Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, PA, United States of America
- Robert Budden
- Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, PA, United States of America
- J. Ray Scott
- Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, PA, United States of America
- Annerose Berndt
- UPMC Corporate Services, Pittsburgh, PA, United States of America
- Jeremy M. Berg
- Institute for Precision Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
- Rebecca S. Jacobson
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States of America
- University of Pittsburgh Cancer Institute, Pittsburgh, PA, United States of America
- Institute for Precision Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
20
Stern AM, Schurdak ME, Bahar I, Berg JM, Taylor DL. A Perspective on Implementing a Quantitative Systems Pharmacology Platform for Drug Discovery and the Advancement of Personalized Medicine. JOURNAL OF BIOMOLECULAR SCREENING 2016; 21:521-534. [PMID: 26962875 PMCID: PMC4917453 DOI: 10.1177/1087057116635818] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Indexed: 01/25/2023]
Abstract
Drug candidates exhibiting well-defined pharmacokinetic and pharmacodynamic profiles that are otherwise safe often fail to demonstrate proof-of-concept in phase II and III trials. Innovation in drug discovery and development has been identified as a critical need for improving the efficiency of drug discovery, especially through collaborations between academia, government agencies, and industry. To address the innovation challenge, we describe a comprehensive, unbiased, integrated, and iterative quantitative systems pharmacology (QSP)-driven drug discovery and development strategy and platform that we have implemented at the University of Pittsburgh Drug Discovery Institute. Intrinsic to QSP is its integrated use of multiscale experimental and computational methods to identify mechanisms of disease progression and to test predicted therapeutic strategies likely to achieve clinical validation for appropriate subpopulations of patients. The QSP platform can address biological heterogeneity and anticipate the evolution of resistance mechanisms, which are major challenges for drug development. The implementation of this platform is dedicated to gaining an understanding of mechanism(s) of disease progression to enable the identification of novel therapeutic strategies as well as repurposing drugs. The QSP platform will help promote the paradigm shift from reactive population-based medicine to proactive personalized medicine by focusing on the patient as the starting and the end point.
Affiliation(s)
- Andrew M. Stern
- Department of Computational and Systems Biology, Pittsburgh, PA, USA
- University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
- Mark E. Schurdak
- Department of Computational and Systems Biology, Pittsburgh, PA, USA
- University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
- The University of Pittsburgh Cancer Institute, Pittsburgh, PA, USA
- Ivet Bahar
- Department of Computational and Systems Biology, Pittsburgh, PA, USA
- University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
- The University of Pittsburgh Cancer Institute, Pittsburgh, PA, USA
- Jeremy M. Berg
- Department of Computational and Systems Biology, Pittsburgh, PA, USA
- University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
- University of Pittsburgh Institute for Personalized Medicine, Pittsburgh, PA, USA
- D. Lansing Taylor
- Department of Computational and Systems Biology, Pittsburgh, PA, USA
- University of Pittsburgh Drug Discovery Institute, Pittsburgh, PA, USA
- The University of Pittsburgh Cancer Institute, Pittsburgh, PA, USA