1
Rajwa P, Borkowetz A, Abbott T, Alberti A, Bjartell A, Brash JT, Campi R, Chilelli A, Conover M, Constantinovici N, Davies E, De Meulder B, Eid S, Gacci M, Golozar A, Hafeez H, Haque S, Hijazy A, Hulsen T, Josefsson A, Khalid S, Kolde R, Kotik D, Kurki S, Lambrecht M, Leung CH, Moreno J, Nicoletti R, Nieboer D, Oja M, Palanisamy S, Prinsen P, Reich C, Raffaele Resta G, Ribal MJ, Gómez Rivas J, Smith E, Snijder R, Steinbeisser C, Vandenberghe F, Cornford P, Evans-Axelsson S, N'Dow J, Willemse PPM. Research Protocol for an Observational Health Data Analysis on the Adverse Events of Systemic Treatment in Patients with Metastatic Hormone-sensitive Prostate Cancer: Big Data Analytics Using the PIONEER Platform. EUR UROL SUPPL 2024; 63:81-88. [PMID: 38572301 PMCID: PMC10987796 DOI: 10.1016/j.euros.2024.02.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/28/2024] [Indexed: 04/05/2024]
Abstract
Combination therapies in metastatic hormone-sensitive prostate cancer (mHSPC), which include the addition of an androgen receptor signaling inhibitor and/or docetaxel to androgen deprivation therapy, have been a game changer in the management of this disease stage. However, these therapies come with their fair share of toxicities and side effects. The goal of this observational study is to report drug-related adverse events (AEs) associated with systemic combination therapies for mHSPC. Determining the optimal treatment option requires large cohorts to estimate the tolerability and AEs of these combination therapies in "real-life" patients with mHSPC, as provided in this study. We use a network of databases that includes population-based registries, electronic health records, and insurance claims, containing the overall target population and subgroups of patients defined by specific characteristics, demographics, and comorbidities, to compute the incidence of common AEs associated with systemic therapies in the setting of mHSPC. These data sources are standardised using the Observational Medical Outcomes Partnership Common Data Model. We perform descriptive statistics and calculate the AE incidence rate separately for each treatment group, stratified by age group and index year. The time until the first event is estimated using the Kaplan-Meier method within each age group. For episodic events, the mean cumulative count of events is calculated. Our study will allow clinicians to tailor optimal therapies for mHSPC patients, and its results will serve as a basis for comparative method studies.
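The abstract estimates time to the first AE with the Kaplan-Meier method. As an illustration only, and not the study's own code (which runs on OMOP-standardised data sources), a minimal product-limit estimator over hypothetical per-patient follow-up times and event flags could look like this:

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the event-free probability S(t).

    times  -- hypothetical follow-up time per patient (e.g. days to first AE or censoring)
    events -- 1 if the first AE was observed at that time, 0 if the patient was censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)  # events per time point
    exits = Counter(times)  # events plus censorings leaving the risk set at each time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of staying event-free at t.
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        # Everyone who had an event or was censored at t leaves the risk set afterwards.
        at_risk -= exits[t]
    return curve


curve = kaplan_meier([30, 60, 60, 90, 120], [1, 1, 0, 1, 0])
```

Patients censored at a given time still count as at risk at that time and drop out only afterwards, the usual convention when an event and a censoring share a time point.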
Affiliation(s)
- Pawel Rajwa
- Department of Urology, Medical University of Silesia, Zabrze, Poland
- Department of Urology, Comprehensive Cancer Center, Medical University of Vienna, Vienna, Austria
- Angelika Borkowetz
- Department of Urology, University Hospital Carl Gustav Carus, TU Dresden, Dresden, Germany
- Thomas Abbott
- European Association of Urology, Nijmegen, The Netherlands
- Andrea Alberti
- Unit of Urological Robotic Surgery and Renal Transplantation, University of Florence, Careggi Hospital, Florence, Italy
- Anders Bjartell
- Department of Translational Medicine, Lund University, Lund, Sweden
- Riccardo Campi
- Unit of Urological Robotic Surgery and Renal Transplantation, University of Florence, Careggi Hospital, Florence, Italy
- Mauro Gacci
- Unit of Urological Robotic Surgery and Renal Transplantation, University of Florence, Careggi Hospital, Florence, Italy
- Asieh Golozar
- Odysseus Data Services, New York, NY, USA
- OHDSI Center, Northeastern University, Boston, MA, USA
- Haroon Hafeez
- Shaukat Khanum Memorial Cancer Hospital & Research Centre, Peshawar, Pakistan
- Tim Hulsen
- Department of Hospital Services & Informatics, Philips Research, Eindhoven, The Netherlands
- Andreas Josefsson
- Department of Urology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Wallenberg Center for Molecular Medicine, Umeå University, Umeå, Sweden
- Raivo Kolde
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Daniel Kotik
- Center for Advanced Systems Understanding, Görlitz, Germany
- Helmholtz-Zentrum Dresden-Rossendorf, Dresden, Germany
- Chi-Ho Leung
- S.H. Ho Urology Centre, Department of Surgery, The Chinese University of Hong Kong, Hong Kong, China
- Rossella Nicoletti
- Unit of Urological Robotic Surgery and Renal Transplantation, University of Florence, Careggi Hospital, Florence, Italy
- Daan Nieboer
- Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Marek Oja
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Peter Prinsen
- Netherlands Comprehensive Cancer Organisation (IKNL), Utrecht, The Netherlands
- Christian Reich
- Odysseus Data Services, New York, NY, USA
- OHDSI Center, Northeastern University, Boston, MA, USA
- Giulio Raffaele Resta
- Unit of Urological Robotic Surgery and Renal Transplantation, University of Florence, Careggi Hospital, Florence, Italy
- Maria J. Ribal
- Uro-Oncology Unit, Hospital Clinic, University of Barcelona, Barcelona, Spain
- Juan Gómez Rivas
- Department of Urology, Hospital Clinico San Carlos, Madrid, Spain
- Emma Smith
- Guidelines Office, European Association of Urology, Arnhem, The Netherlands
- James N'Dow
- Academic Urology Unit, University of Aberdeen, Aberdeen, UK
- Peter-Paul M. Willemse
- Department of Urology, Cancer Center, University Medical Center Utrecht, Utrecht, The Netherlands
2
Gandaglia G, Pellegrino F, Golozar A, De Meulder B, Abbott T, Achtman A, Imran Omar M, Alshammari T, Areia C, Asiimwe A, Beyer K, Bjartell A, Campi R, Cornford P, Falconer T, Feng Q, Gong M, Herrera R, Hughes N, Hulsen T, Kinnaird A, Lai LYH, Maresca G, Mottet N, Oja M, Prinsen P, Reich C, Remmers S, Roobol MJ, Sakalis V, Seager S, Smith EJ, Snijder R, Steinbeisser C, Thurin NH, Hijazy A, van Bochove K, Van den Bergh RCN, Van Hemelrijck M, Willemse PP, Williams AE, Zounemat Kermani N, Evans-Axelsson S, Briganti A, N'Dow J. Clinical Characterization of Patients Diagnosed with Prostate Cancer and Undergoing Conservative Management: A PIONEER Analysis Based on Big Data. Eur Urol 2024; 85:457-465. [PMID: 37414703 DOI: 10.1016/j.eururo.2023.06.012] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/26/2022] [Revised: 05/18/2023] [Accepted: 06/19/2023] [Indexed: 07/08/2023]
Abstract
BACKGROUND Conservative management is an option for prostate cancer (PCa) patients, either with the objective of delaying or even avoiding curative therapy, or of waiting until palliative treatment is needed. PIONEER, funded by the European Commission Innovative Medicines Initiative, aims to improve PCa care across Europe through the application of big data analytics. OBJECTIVE To describe the clinical characteristics and long-term outcomes of PCa patients on conservative management by using a large international network of real-world data. DESIGN, SETTING, AND PARTICIPANTS From an initial cohort of >100 000 000 adult individuals included in eight databases evaluated during a virtual study-a-thon hosted by PIONEER, we identified newly diagnosed PCa cases (n = 527 311). Among those, we selected patients who did not receive curative or palliative treatment within 6 mo from diagnosis (n = 123 146). OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS Patient and disease characteristics were reported. The number of patients who experienced the main study outcomes was quantified for each stratum and the overall cohort. Kaplan-Meier analyses were used to estimate the distribution of time-to-event data. RESULTS AND LIMITATIONS The most common comorbidities were hypertension (35-73%), obesity (9.2-54%), and type 2 diabetes (11-28%). The rate of PCa-related symptomatic progression ranged between 2.6% and 6.2%. Hospitalization (12-25%) and emergency department visits (10-14%) were common events during the 1st year of follow-up. The probability of being free from both palliative and curative treatments decreased during follow-up. Limitations include a lack of information on patient and disease characteristics and on treatment intent. CONCLUSIONS Our results allow us to better understand the current landscape of patients with PCa managed with conservative treatment.
PIONEER offers a unique opportunity to characterize the baseline features and outcomes of PCa patients managed conservatively using real-world data. PATIENT SUMMARY Up to 25% of men with prostate cancer (PCa) managed conservatively experienced hospitalization and emergency department visits within the 1st year after diagnosis; 6% experienced PCa-related symptoms. The probability of receiving therapies for PCa decreased with the time elapsed since diagnosis.
Affiliation(s)
- Giorgio Gandaglia
- Guidelines Office, European Association of Urology, Arnhem, The Netherlands; Department of Urology and Division of Experimental Oncology, Urological Research Institute, IRCCS San Raffaele Hospital, Milan, Italy
- Francesco Pellegrino
- Department of Urology and Division of Experimental Oncology, Urological Research Institute, IRCCS San Raffaele Hospital, Milan, Italy
- Asieh Golozar
- Odysseus Data Services, New York, NY, USA; OHDSI Center, Northeastern University, Boston, MA, USA
- Muhammad Imran Omar
- Guidelines Office, European Association of Urology, Arnhem, The Netherlands; Academic Urology Unit, University of Aberdeen, Scotland, UK
- Katharina Beyer
- Translational Oncology and Urology Research, King's College London, London, UK
- Anders Bjartell
- Department of Translational Medicine, Lund University, Lund, Sweden
- Riccardo Campi
- Guidelines Office, European Association of Urology, Arnhem, The Netherlands; Unit of Urological Robotic Surgery and Renal Transplantation, University of Florence, Careggi Hospital, Florence, Italy; Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Thomas Falconer
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Qi Feng
- Astellas Pharma, Inc., Northbrook, IL, USA
- Mengchun Gong
- Nanfang Hospital, Southern Medical University, Guangzhou, China; DHC Technologies, Beijing, China
- Tim Hulsen
- Philips Research, Department of Hospital Services & Informatics, Eindhoven, The Netherlands
- Nicolas Mottet
- Guidelines Office, European Association of Urology, Arnhem, The Netherlands
- Marek Oja
- Institute of Computer Science, University of Tartu, Tartu, Estonia; STACC, Tartu, Estonia
- Peter Prinsen
- Netherlands Comprehensive Cancer Organization, Eindhoven, The Netherlands
- Sebastiaan Remmers
- Erasmus University Medical Centre, Cancer Institute, Rotterdam, The Netherlands
- Monique J Roobol
- Erasmus University Medical Centre, Cancer Institute, Rotterdam, The Netherlands
- Vasileios Sakalis
- Department of Urology, General Hospital of Thessaloniki Agios Pavlos, Thessaloniki, Greece
- Emma J Smith
- Guidelines Office, European Association of Urology, Arnhem, The Netherlands
- Nicolas H Thurin
- INSERM CIC-P 1401, Bordeaux PharmacoEpi, Université de Bordeaux, Bordeaux, France
- Peter-Paul Willemse
- Guidelines Office, European Association of Urology, Arnhem, The Netherlands; Department of Urology, Cancer Center, University Medical Center Utrecht, Utrecht, The Netherlands
- Andrew E Williams
- The Institute for Clinical Research and Health Policy Studies at Tufts Medical Center, Boston, MA, USA
- Alberto Briganti
- Guidelines Office, European Association of Urology, Arnhem, The Netherlands; Department of Urology and Division of Experimental Oncology, Urological Research Institute, IRCCS San Raffaele Hospital, Milan, Italy
- James N'Dow
- Guidelines Office, European Association of Urology, Arnhem, The Netherlands; Academic Urology Unit, University of Aberdeen, Scotland, UK
3
Xia Y, Wang C, Li X, Gao M, Hogg HDJ, Tunthanathip T, Hulsen T, Tian X, Zhao Q. Development and validation of a novel stemness-related prognostic model for neuroblastoma using integrated machine learning and bioinformatics analyses. Transl Pediatr 2024; 13:91-109. [PMID: 38323183 PMCID: PMC10839279 DOI: 10.21037/tp-23-582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 01/05/2024] [Indexed: 02/08/2024]
Abstract
Background Neuroblastoma (NB) is a common solid tumor in children, with a dismal prognosis in high-risk cases. Despite advancements in NB treatment, the clinical need for precise prognostic models remains critical, particularly to address the heterogeneity of cancer stemness, which plays a pivotal role in tumor aggressiveness and patient outcomes. Using machine learning (ML) techniques, we aimed to explore the cancer stemness features in NB and identify stemness-related hub genes for future investigation and potential targeted therapy. Methods The public dataset GSE49710 was employed as the training set to acquire gene expression data and NB sample information, including age, stage, MYCN amplification status, and survival. The messenger RNA (mRNA) expression-based stemness index (mRNAsi) was calculated, and patients were grouped according to their mRNAsi value. Stemness-related hub genes were identified from the differentially expressed genes (DEGs) to construct a gene signature. This was followed by evaluating the relationship between cancer stemness and the NB immune microenvironment, and by developing a predictive nomogram. We assessed prognostic outcomes, including overall survival (OS) and event-free survival, employing machine learning methods to measure predictive accuracy through concordance indices and validation in an independent cohort, E-MTAB-8248. Results Based on mRNAsi, we categorized NB patients into two groups to explore the association between varying levels of stemness and clinical outcomes. High mRNAsi was linked to advanced International Neuroblastoma Staging System (INSS) stage, amplified MYCN, and older age. Patients with high mRNAsi had a significantly poorer prognosis than those with low mRNAsi. According to the multivariate Cox analysis, mRNAsi was an independent prognostic risk factor in NB patients.
After least absolute shrinkage and selection operator (LASSO) regression analysis, four key genes (ERCC6L, DUXAP10, NCAN, DIRAS3) most related to mRNAsi scores were identified and a risk model was built. Our model demonstrated significant prognostic capacity, with hazard ratios (HRs) ranging from 18.96 to 41.20, P values below 0.0001, and an area under the receiver operating characteristic curve (AUC) of 0.918 in the training set, suggesting high predictive accuracy, which was further confirmed by external validation. Individuals with a low four-gene signature score had a favorable outcome and better immune responses. Finally, a nomogram for clinical practice was constructed by integrating the four-gene signature and INSS stage. Conclusions Our findings confirm the influence of cancer stem cell (CSC) features on NB prognosis. The newly developed NB stemness-related four-gene prognostic signature could facilitate prognostic prediction, and the identified hub genes may serve as promising targets for individualized treatment.
Affiliation(s)
- Yuren Xia
- National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute & Hospital, Tianjin, China
- Department of General Surgery, Tianjin Cancer Hospital Airport Hospital, Tianjin, China
- Chaoyu Wang
- National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute & Hospital, Tianjin, China
- Xin Li
- National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute & Hospital, Tianjin, China
- Department of Pathology, Tianjin Cancer Hospital Airport Hospital, Tianjin, China
- Mingyou Gao
- National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute & Hospital, Tianjin, China
- Henry David Jeffry Hogg
- Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle Upon Tyne, UK
- Thara Tunthanathip
- Division of Neurosurgery, Department of Surgery, Faculty of Medicine, Prince of Songkla University, Hat Yai, Songkhla, Thailand
- Tim Hulsen
- Data Science & AI Engineering, Philips, Eindhoven, The Netherlands
- Xiangdong Tian
- National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute & Hospital, Tianjin, China
- Qiang Zhao
- National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute & Hospital, Tianjin, China
4
Hulsen T, Friedecký D, Renz H, Melis E, Vermeersch P, Fernandez-Calle P. From big data to better patient outcomes. Clin Chem Lab Med 2023; 61:580-586. [PMID: 36539928 DOI: 10.1515/cclm-2022-1096] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/30/2022] [Accepted: 12/12/2022] [Indexed: 12/24/2022]
Abstract
Among medical specialties, laboratory medicine is the largest producer of structured data and must play a crucial role in the efficient and safe implementation of big data and artificial intelligence in healthcare. The era of personalized therapies and precision medicine has now arrived, with huge data sets used not only for experimental and research approaches, but also in the "real world". Analysis of real-world data requires the development of legal, procedural and technical infrastructure. The integration of all clinical data sets for any given patient is necessary to develop a patient-centered treatment approach. Data-driven research comes with its own challenges and solutions. The Findability, Accessibility, Interoperability, and Reusability (FAIR) Guiding Principles provide guidelines to make data findable, accessible, interoperable and reusable to the research community. Federated learning, standards and ontologies are useful to improve the robustness of artificial intelligence algorithms working on big data and to increase trust in these algorithms. When dealing with big data, the univariate statistical approach gives way to multivariate statistical methods, significantly expanding the potential of big data. Combining multiple omics yields previously unsuspected information and provides understanding of scientific questions, an approach which is also called the systems biology approach. Big data and artificial intelligence also offer opportunities for laboratories and the In Vitro Diagnostic industry to optimize the productivity of the laboratory, the quality of laboratory results and ultimately patient outcomes, through tools such as predictive maintenance and "moving average" based on the aggregate of patient results.
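"Moving average" here refers to patient-based quality control, in which a laboratory monitors a running mean of consecutive patient results for analytical drift. A minimal sketch, assuming a fixed window, a hypothetical target value and a symmetric alert limit (practical schemes also truncate outliers and tune these parameters per analyte, which this sketch does not do):

```python
from collections import deque


def moving_average_alerts(results, window, target, limit):
    """Flag positions where the mean of the last `window` patient results
    lies more than `limit` away from a hypothetical `target` value.

    Returns a list of (index, mean) pairs, one per flagged position.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque(maxlen=window)  # the most recent `window` results
    total = 0.0
    alerts = []
    for i, x in enumerate(results):
        if len(buf) == window:
            total -= buf[0]  # this oldest value is evicted by the append below
        buf.append(x)
        total += x
        if len(buf) == window:
            mean = total / window
            if abs(mean - target) > limit:
                alerts.append((i, mean))
    return alerts


# Hypothetical sodium results (mmol/L) with an upward shift halfway through.
alerts = moving_average_alerts([140, 141, 139, 140, 150, 152, 151, 153],
                               window=4, target=140, limit=3)
```

A running sum keeps each update O(1), which matters when the monitor runs over every result a high-volume analyser produces.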
Affiliation(s)
- Tim Hulsen
- Department of Hospital Services & Informatics, Philips Research, Eindhoven, The Netherlands
- David Friedecký
- Department of Clinical Biochemistry, Laboratory for Inherited Metabolic Disorders, University Hospital Olomouc and Faculty of Medicine and Dentistry, Palacký University in Olomouc, Olomouc, Czech Republic
- Harald Renz
- Institute of Laboratory Medicine, member of the German Center for Lung Research (DZL), and the Universities of Giessen and Marburg Lung Center (UGMLC), Philipps University Marburg, Marburg, Germany
- Department of Clinical Immunology and Allergy, Laboratory of Immunopathology, I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia
- Els Melis
- Ortho Clinical Diagnostics, Zaventem, Belgium
- Pieter Vermeersch
- Clinical Department of Laboratory Medicine, University Hospitals Leuven, Leuven, Belgium
- Department of Cardiovascular Sciences, KU Leuven, Leuven, Belgium
- European Federation of Clinical Chemistry and Laboratory Medicine (EFLM), Milan, Italy
- Pilar Fernandez-Calle
- European Federation of Clinical Chemistry and Laboratory Medicine (EFLM), Milan, Italy
- Department of Laboratory Medicine, Hospital Universitario La Paz, Madrid, Spain
5
Abstract
Artificial intelligence (AI) refers to the simulation of human intelligence in machines, using machine learning (ML), deep learning (DL) and neural networks (NNs). AI enables machines to learn from experience and perform human-like tasks. The field of AI research has been developing fast over the past five to ten years, due to the rise of 'big data' and increasing computing power. In the medical area, AI can be used to improve diagnosis, prognosis, treatment, surgery and drug discovery, among other applications. Therefore, both academia and industry are investing heavily in AI. This review investigates the biomedical literature (in the PubMed and Embase databases) by looking at bibliographical data, observing trends over time and occurrences of keywords. Some observations are made: AI has been growing exponentially over the past few years; it is used mostly for diagnosis; COVID-19 is already among the top three diseases studied using AI; China, the United States, South Korea, the United Kingdom and Canada publish the most articles in AI research; Stanford University is the world's leading university in AI research; and convolutional NNs are by far the most popular DL algorithms at this moment. These trends could be studied in more detail by including more literature databases or patent databases. More advanced analyses could be used to predict in which direction AI will develop over the coming years. The expectation is that AI will keep on growing, in spite of stricter privacy laws, more need for standardization, bias in the data, and the need for building trust.
6
Abstract
Data science is an interdisciplinary field that applies numerous techniques, such as machine learning (ML), neural networks (NN) and artificial intelligence (AI), to create value, based on extracting knowledge and insights from available 'big' data [...].
Affiliation(s)
- Tim Hulsen
- Department of Hospital Services & Informatics, Philips Research, 5656AE Eindhoven, The Netherlands
7
Gandaglia G, Omar M, Maresca G, Golozar A, Remmers S, Roobol M, Steinbeisser C, Hulsen T, Van Bochove K, Katharina B, Van Hemelrijck M, Willemse PP, Oja M, Tamm S, Reisberg S, Gomez Rivas J, Van Den Bergh R, Kinnaird A, Asiimwe A, Bjartell A, Smith E, N'Dow J. Clinical characterization and outcomes of prostate cancer patients undergoing immediate vs. conservative management: A PIONEER study. Eur Urol 2022. [DOI: 10.1016/s0302-2838(22)01127-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
8
Hulsen T, Moinat M, Van Bochove K, Gorbachev A, Kaduk D, Argyriou G, Cossin S, Herrera R, Golozar A, Prinsen P, Beyer K, Van Hemelrijck M, Oja M, Axelsson S, Steinbeisser C, De Meulder B. The PIONEER watchful waiting for prostate cancer apps - a first practical application of using big data for prostate cancer. Eur Urol 2022. [DOI: 10.1016/s0302-2838(22)01131-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
9
Hulsen T, Petkovic M, Varga OE, Jamuar SS. Editorial: AI in Healthcare: From Data to Intelligence. Front Artif Intell 2022; 5:909391. [PMID: 35592647 PMCID: PMC9111011 DOI: 10.3389/frai.2022.909391] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/31/2022] [Accepted: 04/05/2022] [Indexed: 02/05/2023]
Affiliation(s)
- Tim Hulsen
- Department of Hospital Services and Informatics, Philips Research, Eindhoven, Netherlands
- *Correspondence: Tim Hulsen
- Milan Petkovic
- Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, Netherlands
- AI and Data Science Center, Philips Medical Systems, Eindhoven, Netherlands
- Orsolya Edit Varga
- Department of Public Health and Epidemiology, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- Saumya Shekhar Jamuar
- Duke-NUS Medical School, Singapore, Singapore
- KK Women's and Children's Hospital, Singapore, Singapore
10
Santaolalla A, Hulsen T, Davis J, Ahmed HU, Moore CM, Punwani S, Attard G, McCartan N, Emberton M, Coolen A, Van Hemelrijck M. The ReIMAGINE Multimodal Warehouse: Using Artificial Intelligence for Accurate Risk Stratification of Prostate Cancer. Front Artif Intell 2021; 4:769582. [PMID: 34870187 PMCID: PMC8637844 DOI: 10.3389/frai.2021.769582] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/02/2021] [Accepted: 10/12/2021] [Indexed: 02/05/2023]
Abstract
Introduction. Prostate cancer (PCa) is the most frequent cancer diagnosis in men worldwide. Our ability to identify those men whose cancer will decrease their lifespan and/or quality of life remains poor. The ReIMAGINE Consortium has been established to improve PCa diagnosis. Materials and methods. MRI will likely become the future cornerstone of the risk-stratification process for men at risk of early prostate cancer. We will, for the first time, be able to combine the underlying molecular changes in PCa with state-of-the-art imaging. ReIMAGINE Screening invites men for MRI and PSA evaluation. ReIMAGINE Risk enrols men at risk of prostate cancer based on MRI and includes biomarker testing. Results. Baseline clinical information, genomics, blood, urine, fresh prostate tissue samples, digital pathology and radiomics data will be analysed. Data will be de-identified, stored with correlated mpMRI disease endotypes and linked with long-term follow-up outcomes in an instance of the Philips Clinical Data Lake, consisting of cloud-based software. The ReIMAGINE platform includes application programming interfaces and a user interface that allows users to browse data, select cohorts, manage users and access rights, query data, and more. Connections to analytics tools such as Python allow pipelines of statistical and stratification methods to run profiling regression analyses. Discussion. The ReIMAGINE Multimodal Warehouse constitutes a unique data source for PCa research, to improve risk stratification for PCa and inform clinical practice. The de-identified dataset characterized by clinical, imaging, genomics and digital pathology PCa patient phenotypes will be a valuable resource for the scientific and medical community.
Affiliation(s)
- Aida Santaolalla
- King’s College London, School of Cancer and Pharmaceutical Sciences, Translational Oncology and Urology Research (TOUR), London, United Kingdom
- Tim Hulsen
- Philips Research, Department of Hospital Services and Informatics, Eindhoven, Netherlands
- Jenson Davis
- Philips, Data Science Services, Best, Netherlands
- Hashim U. Ahmed
- Imperial College London, Faculty of Medicine, Imperial Prostate, Department of Surgery and Cancer, London, United Kingdom
- Caroline M. Moore
- Division of Surgical and Interventional Science, University College London, London, United Kingdom
- Shonit Punwani
- Centre for Medical Imaging, University College London, London, United Kingdom
- Gert Attard
- Cancer Institute, University College London, London, United Kingdom
- Neil McCartan
- Division of Surgical and Interventional Science, University College London, London, United Kingdom
- Mark Emberton
- Division of Surgical and Interventional Science, University College London, London, United Kingdom
- Anthony Coolen
- King’s College London, School of Cancer and Pharmaceutical Sciences, Translational Oncology and Urology Research (TOUR), London, United Kingdom
- Department of Biophysics, Donders Institute, Radboud University Nijmegen, Nijmegen, Netherlands
- Mieke Van Hemelrijck
- King’s College London, School of Cancer and Pharmaceutical Sciences, Translational Oncology and Urology Research (TOUR), London, United Kingdom
11
Crump RT, Remmers S, Van Hemelrijck M, Helleman J, Nieboer D, Roobol MJ, Venderbos LDF, Trock B, Ehdaie B, Carroll P, Filson C, Logothetis C, Morgan T, Klotz L, Pickles T, Hyndman E, Moore C, Gnanapragasam V, Van Hemelrijck M, Dasgupta P, Bangma C, Roobol M, Villers A, Robert G, Semjonow A, Rannikko A, Valdagni R, Perry A, Hugosson J, Rubio-Briones J, Bjartell A, Hefermehl L, Shiong LL, Frydenberg M, Sugimoto M, Chung BH, van der Kwast T, Hulsen T, de Jonge C, van Hooft P, Kattan M, Xinge J, Muir K, Lophatananon A, Fahey M, Steyerberg E, Nieboer D, Zhang L, Steyerberg E, Nieboer D, Beckmann K, Denton B, Hayen A, Boutros P, Guo W, Benfante N, Cowan J, Patil D, Park L, Ferrante S, Mamedov A, LaPointe V, Crump T, Stavrinides V, Kimberly-Duffell J, Santaolalla A, Nieboer D, Olivier J, France B, Rancati T, Ahlgren H, Mascarós J, Löfgren A, Lehmann K, Lin CH, Cusick T, Hirama H, Lee KS, Jenster G, Auvinen A, Bjartell A, Haider M, van Bochove K, Buzza M, Kouspou M, Paich K, Bangma C, Roobol M, Helleman J. Using the Movember Foundation's GAP3 cohort to measure the effect of active surveillance on patient-reported urinary and sexual function-a retrospective study in low-risk prostate cancer patients. Transl Androl Urol 2021; 10:2719-2727. [PMID: 34295757 PMCID: PMC8261406 DOI: 10.21037/tau-20-1255] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/10/2020] [Accepted: 04/29/2021] [Indexed: 02/05/2023]
Abstract
BACKGROUND Active surveillance (AS) for low-risk prostate cancer (PCa) is intended to overcome potential side-effects of definitive treatment. Frequent prostate biopsies during AS may, however, impact erectile (EF) and urinary function (UF). The objective of this study was to test the influence of prostate biopsies on patient-reported EF and UF using multicenter data from the largest AS database to date. METHODS In this retrospective study, data analyses were performed using the Movember GAP3 database (v3.2), containing data from 21,169 AS participants from 27 AS-cohorts worldwide. Participants were included in the study if they had at least one follow-up prostate biopsy and completed at least one patient-reported outcome measure (PROM) related to EF [Sexual Health Inventory for Men (SHIM)/five-item International Index of Erectile Function (IIEF-5)] or UF [International Prostate Symptom Score (IPSS)] during follow-up. The longitudinal effect of the number of biopsies on either SHIM/IIEF-5 or IPSS was analyzed using linear mixed models to adjust for clustering at the patient level. Analyses were stratified by center; covariates included age and Gleason Grade group at diagnosis, and time on AS. RESULTS A total of 696 participants completed the SHIM/IIEF-5 3,175 times, with a median follow-up of 36 months [interquartile range (IQR) 20-55 months]. A total of 845 participants completed the IPSS 4,061 times, with a median follow-up of 35 months (IQR 19-56 months). The intraclass correlation (ICC) was 0.74 for the SHIM/IIEF-5 and 0.68 for the IPSS, indicating substantial differences between participants' PROMs. Limited heterogeneity between cohorts in the estimated effect of the number of biopsies on either PROM was observed. A significant association was observed between the number of biopsies and the SHIM/IIEF-5 score, but not with the IPSS score. Every biopsy was associated with a decrease in the SHIM/IIEF-5 score of, on average, 0.67 (95% CI, 0.47-0.88) points.
CONCLUSIONS Repeated prostate biopsy as part of an AS protocol for men with low-risk PCa does not have a significant association with self-reported UF but does impact self-reported sexual function. Further research is, however, needed to understand whether the effect on sexual function implies a negative clinical impact on their quality of life and is meaningful from a patient's perspective. In the meantime, clinicians and patients should anticipate a potential decline in erectile function and hence consider incorporating the risk of this harm into their discussion about opting for AS and also when deciding on the stringency of follow-up biopsy schedules with long-term AS.
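The ICC values reported above summarise how much of the variation in PROM scores lies between patients rather than within a patient over time. As a minimal sketch (not the study's analysis, which used covariate-adjusted linear mixed models), the Python below computes the ICC of a random-intercept model from its two variance components, plus the classical one-way ANOVA estimator for a balanced design in which every patient has the same number of repeated scores. The function names and the balanced-design restriction are assumptions made for illustration.

```python
from statistics import mean


def icc_from_variances(var_between: float, var_within: float) -> float:
    """ICC of a random-intercept model: share of total variance lying between patients."""
    return var_between / (var_between + var_within)


def icc_oneway(groups: list[list[float]]) -> float:
    """One-way ANOVA estimator ICC(1); each inner list holds one patient's repeated scores."""
    n = len(groups)  # number of patients
    if n < 2:
        raise ValueError("need at least two patients")
    k = len(groups[0])  # repeated measurements per patient
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes a balanced design with at least 2 repeats")
    grand = mean(x for g in groups for x in g)
    patient_means = [mean(g) for g in groups]
    # Between-patient and within-patient mean squares.
    msb = k * sum((m - grand) ** 2 for m in patient_means) / (n - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, patient_means) for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


if __name__ == "__main__":
    # Hypothetical variance components: 3 units between patients, 1 within.
    print(icc_from_variances(3.0, 1.0))  # 0.75
```

Under a random-intercept model, an ICC of roughly 0.74 means about three quarters of the score variance reflects stable differences between men, which is why the patient-level clustering has to be modelled.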
Affiliation(s)
- Sebastiaan Remmers
- Department of Urology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Mieke Van Hemelrijck
- King’s College London, Faculty of Life Sciences and Medicine, Translational Oncology & Urology Research (TOUR), London, UK
- Jozien Helleman
- Department of Urology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Daan Nieboer
- Department of Urology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Monique J. Roobol
- Department of Urology, Erasmus University Medical Center, Rotterdam, The Netherlands
12
Hulsen T. BioVenn – an R and Python package for the comparison and visualization of biological lists using area-proportional Venn diagrams. DS 2021. [DOI: 10.3233/ds-210032]
Abstract
One of the most popular methods to visualize the overlap and differences between data sets is the Venn diagram. Venn diagrams are especially useful when they are ‘area-proportional’, i.e. the sizes of the circles and the overlaps correspond to the sizes of the data sets. The BioVenn web interface, launched in 2007, is used by many researchers. However, this web implementation requires users to copy and paste (or upload) lists of IDs into the web browser, which is not always convenient and makes it difficult for researchers to create Venn diagrams ‘in batch’, or to automatically update the diagram when the source data changes. Such tasks are only possible using software such as R or Python. This paper describes the BioVenn R and Python packages, which are very easy to use and can generate accurate area-proportional Venn diagrams of two or three circles directly from lists of (biological) IDs. The only required input is two or three lists of IDs. Optional parameters include the main title, the subtitle, the printing of absolute numbers or percentages within the diagram, colors and fonts. The function can show the diagram on the screen, or it can export the diagram in one of the supported file formats. The function also returns all thirteen resulting lists of IDs. The BioVenn R package and Python package were created for biological IDs, but they can be used for other IDs as well. Finally, BioVenn can map Affymetrix and EntrezGene IDs to Ensembl IDs. The BioVenn R package is available in the CRAN repository, and can be installed by running ‘install.packages(“BioVenn”)’. The BioVenn Python package is available in the PyPI repository, and can be installed by running ‘pip install BioVenn’. The BioVenn web interface remains available at https://www.biovenn.nl.
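The thirteen lists mentioned above can be reproduced with plain set algebra. The sketch below does not use the BioVenn package; the helper name `venn_subsets` and the subset labels are hypothetical, and it assumes the thirteen lists are the three inputs, the three full pairwise overlaps, the triple overlap, the three "only" sets and the three pairwise-only sets.

```python
from collections.abc import Iterable


def venn_subsets(x: Iterable[str], y: Iterable[str], z: Iterable[str]) -> dict[str, list[str]]:
    """Return thirteen sorted ID lists for three input lists (hypothetical helper)."""
    sx, sy, sz = set(x), set(y), set(z)
    xy, xz, yz = sx & sy, sx & sz, sy & sz
    subsets = {
        "x": sx, "y": sy, "z": sz,                                          # the three inputs
        "x_only": sx - sy - sz, "y_only": sy - sx - sz, "z_only": sz - sx - sy,
        "xy": xy, "xz": xz, "yz": yz,                                       # full pairwise overlaps
        "xy_only": xy - sz, "xz_only": xz - sy, "yz_only": yz - sx,         # pairwise, excluding the third
        "xyz": sx & sy & sz,                                                # shared by all three
    }
    # Sorting makes the output deterministic (set iteration order is arbitrary).
    return {name: sorted(ids) for name, ids in subsets.items()}


if __name__ == "__main__":
    s = venn_subsets(["a", "b", "c"], ["b", "c", "d"], ["c", "e"])
    print(s["xyz"])  # ['c']
```

The seven disjoint regions (`x_only` through `xyz`) partition the union of the inputs, so their sizes are exactly what an area-proportional diagram has to reproduce.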
Affiliation(s)
- Tim Hulsen
- Department of Hospital Services & Informatics, Philips Research, Eindhoven, The Netherlands
13
Van Hemelrijck M, Ji X, Helleman J, Roobol MJ, Nieboer D, Bangma C, Frydenberg M, Rannikko A, Lee LS, Gnanapragasam V, Kattan MW, Trock B, Ehdaie B, Carroll P, Filson C, Kim J, Logothetis C, Morgan T, Klotz L, Pickles T, Hyndman E, Moore C, Gnanapragasam V, Van Hemelrijck M, Dasgupta P, Bangma C, Roobol M, Villers A, Rannikko A, Valdagni R, Perry A, Hugosson J, Rubio-Briones J, Bjartell A, Hefermehl L, Shiong LL, Frydenberg M, Kakehi Y, Chung MSBH, van der Kwast T, Obbink H, van der Linden W, Hulsen T, de Jonge C, Kattan M, Xinge J, Muir K, Lophatananon A, Fahey M, Steyerberg E, Nieboer D, Zhang L, Guo W, Benfante N, Cowan J, Patil D, Tolosa E, Kim TK, Mamedov A, LaPointe V, Crump T, Stavrinides V, Kimberly-Duffell J, Santaolalla A, Nieboer D, Olivier J, Rancati T, Ahlgren H, Mascarós J, Löfgren A, Lehmann K, Lin CH, Hirama H, Lee KS, Jenster G, Auvinen A, Bjartell A, Haider M, van Bochove K, Carter B, Gledhill S, Buzza M, Kouspou M, Bangma C, Roobol M, Bruinsma S, Helleman J. A first step towards a global nomogram to predict disease progression for men on active surveillance. Transl Androl Urol 2021; 10:1102-1109. [PMID: 33850745 PMCID: PMC8039580 DOI: 10.21037/tau-20-1082]
Abstract
BACKGROUND Signs of disease progression (28%) and conversion to active treatment without evidence of disease progression (13%) are the main reasons for discontinuation of active surveillance (AS) in men with localised prostate cancer (PCa). We aimed to develop a nomogram to predict disease progression in these patients. METHODS As a first step in the development of a nomogram, using data from Movember's GAP3 Consortium (n=14,380), we assessed heterogeneity between centres in terms of risk of disease progression. We started by assessing baseline hazards for disease progression based on grouping of centres according to follow-up protocols [high: yearly; intermediate: ~2 yearly; and low: at year 1, 4 & 7 (i.e., PRIAS)]. We conducted cause-specific random effect Cox proportional hazards regression to estimate risk of disease progression by centre in each group. RESULTS Disease progression rates varied substantially between centres [median hazard ratio (MHR): 2.5]. After adjustment for various clinical factors (age, year of diagnosis, Gleason grade group, number of positive cores and PSA), substantial heterogeneity in disease progression remained between centres. CONCLUSIONS When combining worldwide data on AS, we noted unexplained differences in disease progression rates even after adjustment for various clinical factors. This suggests that when developing a global nomogram, local adjustments for differences in risk of disease progression and competing outcomes such as conversion to active treatment need to be considered.
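The median hazard ratio expresses between-centre heterogeneity on the hazard-ratio scale. The abstract does not give the formula used; a commonly used definition, assuming a normally distributed centre-level random effect with variance σ² on the log-hazard scale, is MHR = exp(√(2σ²) · Φ⁻¹(0.75)), where Φ⁻¹ is the standard normal quantile function. The sketch below implements that definition and its inverse; the function names are hypothetical.

```python
import math
from statistics import NormalDist

# 75th percentile of the standard normal distribution (about 0.6745).
Z75 = NormalDist().inv_cdf(0.75)


def median_hazard_ratio(sigma2: float) -> float:
    """MHR for a centre random effect with variance sigma2 on the log-hazard scale."""
    return math.exp(math.sqrt(2.0 * sigma2) * Z75)


def variance_for_mhr(mhr: float) -> float:
    """Invert the MHR definition to recover the implied random-effect variance."""
    return (math.log(mhr) / Z75) ** 2 / 2.0


if __name__ == "__main__":
    s2 = variance_for_mhr(2.5)
    print(round(s2, 2))  # 0.92
    print(median_hazard_ratio(s2))
```

Read this way, the reported MHR of 2.5 means that for two randomly chosen centres the median ratio of the higher to the lower hazard is 2.5; under the normality assumption it corresponds to a centre-level variance of about 0.92.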
Affiliation(s)
- Mieke Van Hemelrijck
- Translational Oncology & Urology Research (TOUR), School of Cancer and Pharmaceutical Sciences, King’s College London, London, UK
- Xinge Ji
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH, USA
- Jozien Helleman
- Department of Urology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Monique J. Roobol
- Department of Urology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Daan Nieboer
- Department of Urology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Chris Bangma
- Department of Urology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Antti Rannikko
- Department of Urology, Helsinki University and Helsinki University Hospital, Helsinki, Finland
- Lui Shiong Lee
- Department of Urology, Sengkang General Hospital and Singapore General Hospital, Singapore, Singapore
- Vincent Gnanapragasam
- Academic Urology Group, Department of Surgery and Oncology, University of Cambridge, Cambridge, UK
- Michael W. Kattan
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH, USA
14
Hulsen T. Sharing Is Caring-Data Sharing Initiatives in Healthcare. Int J Environ Res Public Health 2020; 17:ijerph17093046. [PMID: 32349396 PMCID: PMC7246891 DOI: 10.3390/ijerph17093046] [Received: 02/28/2020] [Revised: 03/17/2020] [Accepted: 04/24/2020]
Abstract
In recent years, more and more health data are being generated. These data come not only from professional health systems, but also from wearable devices. All these 'big data' put together can be utilized to optimize treatments for each unique patient ('precision medicine'). For this to be possible, it is necessary that hospitals, academia and industry work together to bridge the 'valley of death' of translational medicine. However, hospitals and academia often are reluctant to share their data with other parties, even though the patient is actually the owner of his/her own health data. Academic hospitals usually invest a lot of time in setting up clinical trials and collecting data, and want to be the first ones to publish papers on this data. There are some publicly available datasets, but these are usually only shared after study (and publication) completion, which means a severe delay of months or even years before others can analyse the data. One solution is to incentivize the hospitals to share their data with (other) academic institutes and the industry. Here, we show an analysis of the current literature around data sharing, and we discuss five aspects of data sharing in the medical domain: publisher requirements, data ownership, growing support for data sharing, data sharing initiatives and how the use of federated data might be a solution. We also discuss some potential future developments around data sharing, such as medical crowdsourcing and data generalists.
Affiliation(s)
- Tim Hulsen
- Department of Professional Health Solutions & Services, Philips Research, 5656AE Eindhoven, The Netherlands
15
Affiliation(s)
- Tim Hulsen
- Department of Professional Health Solutions & Services, Philips Research, Eindhoven, The Netherlands
16
van der Kwast TH, Helleman J, Nieboer D, Bruinsma SM, Roobol MJ, Trock B, Ehdaie B, Carroll P, Filson C, Kim J, Logothetis C, Morgan T, Klotz L, Pickles T, Hyndman E, Moore CM, Gnanapragasam V, Van Hemelrijck M, Dasgupta P, Bangma C, Roobol M, Villers A, Rannikko A, Valdagni R, Perry A, Hugosson J, Rubio-Briones J, Bjartell A, Hefermehl L, Shiong LL, Frydenberg M, Kakehi Y, Chung BH, van der Kwast T, Obbink H, van der Linden W, Hulsen T, de Jonge C, Kattan M, Xinge J, Muir K, Lophatananon A, Fahey M, Steyerberg E, Nieboer D, Zhang L, Guo W, Benfante N, Cowan J, Patil D, Tolosa E, Kim TK, Mamedov A, LaPointe V, Crump T, Kimberly-Duffell J, Santaolalla A, Nieboer D, Olivier JT, Rancati T, Ahlgren H, Mascarós J, Löfgren A, Lehmann K, Lin CH, Hirama H, Lee KS, Jenster G, Auvinen A, Bjartell A, Haider M, van Bochove K, Carter B, Gledhill S, Buzza M, Bangma C, Roobol M, Bruinsma S, Helleman J. Consistent Biopsy Quality and Gleason Grading Within the Global Active Surveillance Global Action Plan 3 Initiative: A Prerequisite for Future Studies. Eur Urol Oncol 2019; 2:333-336. [PMID: 31200849 DOI: 10.1016/j.euo.2018.08.017] [Received: 07/05/2018] [Revised: 08/10/2018] [Accepted: 08/21/2018]
Abstract
Within the Movember Foundation's Global Action Plan Prostate Cancer Active Surveillance (GAP3) initiative, 25 centers across the globe collaborate to standardize active surveillance (AS) protocols for men with low-risk prostate cancer (PCa). A centralized PCa AS database, comprising data on more than 15,000 patients worldwide, was created. Comparability of the histopathology between the different cohorts was assessed by a centralized pathology review of 445 biopsies from 15 GAP3 centers. Of the reviewed biopsies, 85% were grade group 1 (Gleason score 6) and 15% grade group ≥2 (Gleason score ≥7); grading showed 89% concordance at review, with moderate agreement (κ=0.56). Average biopsy core length was similar among the analyzed cohorts. Recently established highly adverse pathologies, including cribriform and/or intraductal carcinoma, were observed in 3.6% of the reviewed biopsies. In conclusion, the centralized pathology review of 445 biopsies revealed comparable histopathology among the 15 GAP3 centers with a low frequency of high-risk features. This enables further data analyses, without correction, toward uniform global AS guidelines for men with low-risk PCa. PATIENT SUMMARY: The Movember Foundation's Global Action Plan Prostate Cancer Active Surveillance (GAP3) initiative combines data from 15,000 men with low-risk prostate cancer (PCa) across the globe to standardize active surveillance protocols. Histopathology review confirmed that the histopathology was consistent with low-risk PCa in most men and comparable between different centers.
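The agreement figures above combine raw concordance with a correction for chance. A minimal Python sketch of Cohen's kappa for two raters (for example, the original and the review grade of the same biopsies) follows, together with a back-calculation of the chance agreement implied by a reported concordance and kappa. The function names and the example labels are hypothetical, not the study's data.

```python
from collections import Counter
from collections.abc import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: (observed - expected) / (1 - expected) agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def implied_chance_agreement(observed: float, kappa: float) -> float:
    """Back-calculate expected agreement p_e from reported p_o and kappa."""
    return (observed - kappa) / (1 - kappa)


if __name__ == "__main__":
    # Hypothetical original vs. review grades for ten biopsies.
    original = ["GG1"] * 8 + ["GG2+"] * 2
    review = ["GG1"] * 7 + ["GG2+"] * 3
    print(round(cohens_kappa(original, review), 3))  # 0.737
    print(round(implied_chance_agreement(0.89, 0.56), 2))  # 0.75
```

Treating the rounded 89% and κ=0.56 as exact, the implied chance agreement is about 0.75, close to what 85%/15% marginals would give (0.85² + 0.15² ≈ 0.75). This illustrates why high concordance yields only moderate kappa when most biopsies fall into one grade group.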
Affiliation(s)
- Theo H van der Kwast
- Department of Pathology, Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada.
- Jozien Helleman
- Department of Urology, Erasmus MC, Rotterdam, The Netherlands
- Daan Nieboer
- Department of Urology, Erasmus MC, Rotterdam, The Netherlands; Department of Public Health, Erasmus MC, Rotterdam, The Netherlands
- Bruce Trock
- Johns Hopkins University, The James Buchanan Brady Urological Institute, Baltimore, MD, USA
- Behfar Ehdaie
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Peter Carroll
- University of California San Francisco, San Francisco, CA, USA
- Christopher Filson
- Emory University School of Medicine, Winship Cancer Institute, Atlanta, GA, USA
- Jeri Kim
- MD Anderson Cancer Centre, Houston, TX, USA
- Todd Morgan
- University of Michigan and Michigan Urological Surgery Improvement Collaborative, Michigan, USA
- Laurence Klotz
- University of Toronto, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Tom Pickles
- University of British Columbia, BC Cancer Agency, Vancouver, Canada
- Eric Hyndman
- University of Calgary, Southern Alberta Institute of Urology, Calgary, Canada
- Caroline M Moore
- University College London and University College London Hospital Trust, London, UK
- Vincent Gnanapragasam
- University of Cambridge and Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Mieke Van Hemelrijck
- King's College London, London, UK; Guy's and St Thomas' NHS Foundation Trust, London, UK
- Chris Bangma
- Erasmus Medical Center, Rotterdam, The Netherlands
- Antti Rannikko
- Helsinki University and Helsinki University Hospital, Helsinki, Finland
- Riccardo Valdagni
- Department of Oncology and Hemato-oncology, Università degli Studi di Milano, Radiation Oncology 1 and Prostate Cancer Program, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- Byung Ha Chung
- Gangnam Severance Hospital, Yonsei University Health System, Seoul, Republic of Korea
- Tim Hulsen
- Royal Philips, Eindhoven, The Netherlands
- Ji Xinge
- Cleveland Clinic, Cleveland, OH, USA
- Daan Nieboer
- Erasmus Medical Center, Rotterdam, The Netherlands
- Liying Zhang
- University of Toronto, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Wei Guo
- Johns Hopkins University, The James Buchanan Brady Urological Institute, Baltimore, MD, USA
- Janet Cowan
- University of California San Francisco, San Francisco, CA, USA
- Dattatraya Patil
- Emory University School of Medicine, Winship Cancer Institute, Atlanta, GA, USA
- Tae-Kyung Kim
- University of Michigan and Michigan Urological Surgery Improvement Collaborative, Ann Arbor, MI, USA
- Alexandre Mamedov
- University of Toronto, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Vincent LaPointe
- University of British Columbia, BC Cancer Agency, Vancouver, Canada
- Trafford Crump
- University of Calgary, Southern Alberta Institute of Urology, Calgary, Canada
- Jenna Kimberly-Duffell
- University of Cambridge and Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Daan Nieboer
- Erasmus Medical Center, Rotterdam, The Netherlands
- Tiziana Rancati
- Fondazione IRCCS Istituto Nazionale dei Tumori di Milano, Milan, Italy
- Kwang Suk Lee
- Yonsei University College of Medicine, Gangnam Severance Hospital, Seoul, Korea
- Mark Buzza
- Movember Foundation, Melbourne, Australia
- Chris Bangma
- Erasmus Medical Center, Rotterdam, The Netherlands
17
Abstract
Prostate cancer (PCa) is the second most common cancer in men, and the second leading cause of death from cancer in men. Many studies on PCa have been carried out, each taking much time before the data are collected and ready to be analyzed. However, a wide range of PCa datasets is already available on the internet, which could be used for data mining, predictive modelling or other purposes, reducing the need to set up new studies to collect data. In the current scientific climate, which is moving more and more towards the analysis of "big data" and large, international, multi-site projects using a modern IT infrastructure, these datasets could prove extremely valuable. This review presents an overview of publicly available patient-centered PCa datasets, divided into three categories (clinical, genomics and imaging) and an "overall" section, to enable researchers to select a suitable dataset for analysis without having to go through days of work to find the right data. To acquire a list of human PCa databases, scientific literature databases and academic social network sites were searched. We also used the information from other reviews. All databases in the combined list were then checked for public availability. Only databases that were either directly publicly available or available after signing a research data agreement or retrieving a free login were selected for inclusion in this review. Data also had to be available to commercial parties. This paper focuses on patient-centered data, so the genomics data section does not include gene-centered or pathway-centered databases. We identified 42 publicly available, patient-centered PCa datasets. Some of these consist of different smaller datasets. Some of them contain combinations of datasets from the three data domains: clinical data, imaging data and genomics data. Only one dataset contains information from all three domains.
This review presents all datasets and their characteristics: number of subjects, clinical fields, imaging modalities, expression data, mutation data, biomarker measurements, etc. Despite all the attention that has been given to making this overview of publicly available databases as extensive as possible, it is very likely not complete, and will also soon be outdated. However, this review might help many PCa researchers find suitable datasets to answer their research questions, without the need to start a new data collection project. In the coming era of big data analysis, overviews like this are becoming more and more useful.
Affiliation(s)
- Tim Hulsen
- Department of Professional Health Solutions & Services, Philips Research, Eindhoven, The Netherlands
18
Hulsen T, Jamuar SS, Moody AR, Karnes JH, Varga O, Hedensted S, Spreafico R, Hafler DA, McKinney EF. From Big Data to Precision Medicine. Front Med (Lausanne) 2019; 6:34. [PMID: 30881956 PMCID: PMC6405506 DOI: 10.3389/fmed.2019.00034] [Received: 07/14/2018] [Accepted: 02/04/2019]
Abstract
For over a decade the term "Big data" has been used to describe the rapid increase in volume, variety and velocity of information available, not just in medical research but in almost every aspect of our lives. As scientists, we now have the capacity to rapidly generate, store and analyse data that, only a few years ago, would have taken many years to compile. However, "Big data" no longer means what it once did. The term has expanded and now refers not just to large data volume, but to our increasing ability to analyse and interpret those data. Tautologies such as "data analytics" and "data science" have emerged to describe approaches to the volume of available information as it grows ever larger. New methods dedicated to improving data collection, storage, cleaning, processing and interpretation continue to be developed, although not always by, or for, medical researchers. Exploiting new tools to extract meaning from large volume information has the potential to drive real change in clinical practice, from personalized therapy and intelligent drug design to population screening and electronic health record mining. As ever, where new technology promises "Big Advances," significant challenges remain. Here we discuss both the opportunities and challenges posed to biomedical research by our increasing ability to tackle large datasets. Important challenges include the need for standardization of data content, format, and clinical definitions, a heightened need for collaborative networks with sharing of both data and expertise and, perhaps most importantly, a need to reconsider how and when analytic methodology is taught to medical researchers. We also set "Big data" analytics in context: recent advances may appear to promise a revolution, sweeping away conventional approaches to medical science. However, their real promise lies in their synergy with, not replacement of, classical hypothesis-driven methods.
The generation of novel, data-driven hypotheses based on interpretable models will always require stringent validation and experimental testing. Thus, hypothesis-generating research founded on large datasets adds to, rather than replaces, traditional hypothesis driven science. Each can benefit from the other and it is through using both that we can improve clinical practice.
Affiliation(s)
- Tim Hulsen
- Department of Professional Health Solutions and Services, Philips Research, Eindhoven, Netherlands
- *Correspondence: Tim Hulsen
- Saumya S. Jamuar
- Department of Paediatrics, KK Women's and Children's Hospital, and Paediatric Academic Clinical Programme, Duke-NUS Medical School, Singapore, Singapore
- Alan R. Moody
- Department of Medical Imaging, University of Toronto, Toronto, ON, Canada
- Jason H. Karnes
- Pharmacy Practice and Science, College of Pharmacy, University of Arizona Health Sciences, Phoenix, AZ, United States
- Orsolya Varga
- Department of Preventive Medicine, Faculty of Public Health, University of Debrecen, Debrecen, Hungary
- Stine Hedensted
- Department of Molecular Medicine, Aarhus University Hospital, Aarhus, Denmark
- David A. Hafler
- Departments of Neurology and Immunobiology, Yale School of Medicine, New Haven, CT, United States
- Eoin F. McKinney
- Department of Medicine, University of Cambridge School of Clinical Medicine, Cambridge, United Kingdom
19
Hulsen T, Obbink H, Van Der Linden W, De Jonge C, Nieboer D, Bruinsma S, Roobol M, Bangma C. 958 Integrating large datasets for the Movember Global Action Plan on active surveillance for low risk prostate cancer. ACTA ACUST UNITED AC 2016. [DOI: 10.1016/s1569-9056(16)60959-4]
20
Fleuren WWM, Toonen EJM, Verhoeven S, Frijters R, Hulsen T, Rullmann T, van Schaik R, de Vlieg J, Alkema W. Identification of new biomarker candidates for glucocorticoid induced insulin resistance using literature mining. BioData Min 2013; 6:2. [PMID: 23379763 PMCID: PMC3577498 DOI: 10.1186/1756-0381-6-2] [Received: 06/08/2012] [Accepted: 01/02/2013]
Abstract
BACKGROUND Glucocorticoids are potent anti-inflammatory agents used for the treatment of diseases such as rheumatoid arthritis, asthma, inflammatory bowel disease and psoriasis. Unfortunately, their use is limited by metabolic side effects, e.g. insulin resistance, glucose intolerance and diabetes. To gain more insight into the mechanisms behind glucocorticoid-induced insulin resistance, it is important to understand which genes play a role in the development of insulin resistance and which genes are affected by glucocorticoids. Medline abstracts contain many studies about insulin resistance and the molecular effects of glucocorticoids and thus are a good resource to study these effects. RESULTS We developed CoPubGene, a method to automatically identify gene-disease associations in Medline abstracts. We used this method to create a literature network of genes related to insulin resistance and to evaluate the importance of the genes in this network for glucocorticoid-induced metabolic side effects and anti-inflammatory processes. With this approach we found several genes that are already considered markers of glucocorticoid (GC)-induced insulin resistance (IR), such as phosphoenolpyruvate carboxykinase (PCK) and glucose-6-phosphatase, catalytic subunit (G6PC). In addition, we found genes involved in steroid synthesis that have not yet been recognized as mediators of GC-induced IR. CONCLUSIONS With this approach we were able to construct a robust, informative literature network of insulin resistance related genes that gave new insights into the mechanisms behind GC-induced IR. The method has been set up in a generic way so that it can be applied to a wide variety of disease networks.
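The basic idea of mining abstracts for gene-disease associations can be illustrated by counting how often a gene term and a disease term appear in the same abstract. The sketch below is only that simple co-occurrence count: it is not CoPubGene, whose term dictionaries and association scoring are not described here, and the example abstracts, term lists and function names are hypothetical.

```python
import re
from collections import Counter
from itertools import product


def term_regex(term: str) -> re.Pattern:
    """Case-insensitive whole-word pattern for a gene or disease term."""
    return re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)


def cooccurrence_network(abstracts, genes, diseases, min_count=1):
    """Count abstracts in which each (gene, disease) pair co-occurs; keep pairs >= min_count."""
    gene_res = {g: term_regex(g) for g in genes}
    disease_res = {d: term_regex(d) for d in diseases}
    counts: Counter = Counter()
    for text in abstracts:
        found_genes = [g for g, rx in gene_res.items() if rx.search(text)]
        found_diseases = [d for d, rx in disease_res.items() if rx.search(text)]
        for pair in product(found_genes, found_diseases):
            counts[pair] += 1  # each abstract counts at most once per pair
    return {pair: n for pair, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    abstracts = [
        "PCK expression rises in insulin resistance.",
        "G6PC and PCK are glucocorticoid targets.",
        "G6PC is linked to insulin resistance.",
        "PCK is induced in insulin resistance models.",
    ]
    print(cooccurrence_network(abstracts, ["PCK", "G6PC"], ["insulin resistance"]))
```

Raw counts favour frequently studied genes, which is why real literature-mining tools usually replace them with an association statistic; the edge dictionary returned here is just the starting point for such a network.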
Affiliation(s)
- Wilco WM Fleuren
- Computational Drug Discovery (CDD), CMBI, NCMLS, Radboud University Nijmegen Medical Centre, P.O. Box 9101, 6500 HB, Nijmegen, The Netherlands
- Netherlands Bioinformatics Centre (NBIC), P.O. Box 9101, 6500 HB, Nijmegen, The Netherlands
- Erik JM Toonen
- Department of Medicine, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
- Raoul Frijters
- Computational Drug Discovery (CDD), CMBI, NCMLS, Radboud University Nijmegen Medical Centre, P.O. Box 9101, 6500 HB, Nijmegen, The Netherlands
- Present address: Rijk Zwaan Nederland BV, Fijnaart, The Netherlands
- Tim Hulsen
- Computational Drug Discovery (CDD), CMBI, NCMLS, Radboud University Nijmegen Medical Centre, P.O. Box 9101, 6500 HB, Nijmegen, The Netherlands
- Present address: Philips Research Europe, Eindhoven, The Netherlands
- Jacob de Vlieg
- Computational Drug Discovery (CDD), CMBI, NCMLS, Radboud University Nijmegen Medical Centre, P.O. Box 9101, 6500 HB, Nijmegen, The Netherlands
- Netherlands eScience Center, Amsterdam, The Netherlands
- Wynand Alkema
- Computational Drug Discovery (CDD), CMBI, NCMLS, Radboud University Nijmegen Medical Centre, P.O. Box 9101, 6500 HB, Nijmegen, The Netherlands
- Present address: NIZO Food Research BV, Ede, The Netherlands
21
van Hooff SR, Koster J, Hulsen T, van Schaik BDC, Roos M, van Batenburg MF, Versteeg R, van Kampen AHC. The construction of genome-based transcriptional units. OMICS 2009; 13:105-14. [PMID: 19320556 DOI: 10.1089/omi.2008.0036]
Abstract
Gene-oriented sequence clusters (transcriptional units) have found many applications in genomics research, including the construction of transcriptome maps and the identification of splice variants. We developed a new method to construct transcriptional units that uses the genomic sequence as a template. We present and discuss our method in detail together with an evaluation of the transcriptional units for human. We constructed 33,007 and 27,792 transcriptional units for human and mouse, respectively. The sensitivity (81%) and specificity (90%) of our method compare favorably with other established methods. We evaluated the representation of experimentally validated and predicted intergenic spliced transcripts in humans and show that we correctly represent a large fraction of these cases by single transcriptional units. Our method performs well, but the evaluation of the final set of transcriptional units shows that improvements to the algorithm are still possible. However, because the precise number and types of errors are difficult to track, it is not obvious how to significantly improve the algorithm. We believe that ongoing research efforts are necessary to further improve current methods. This should include detailed documentation, comparison, and evaluation of current methods.
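Sensitivity and specificity, as reported above, are ratios over a confusion table built against some reference set. The abstract does not state how true and false positives were defined for transcriptional units, so the counts in this minimal Python sketch are purely hypothetical and chosen only to reproduce the reported percentages.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical confusion counts against a reference set.
    sens, spec = sensitivity_specificity(tp=81, fn=19, tn=90, fp=10)
    print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")  # sensitivity 81%, specificity 90%
```

Sensitivity only looks at reference positives and specificity only at reference negatives, so the two figures should be read together with the definition of the reference set.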
Affiliation(s)
- Sander R van Hooff
- Bioinformatics Laboratory, Academic Medical Center, Meibergdreef 9, Amsterdam, The Netherlands
22
Abstract
Phylogenetic patterns show the presence or absence of certain genes in a set of full genomes derived from different species. They can also be used to determine sets of genes that occur only in certain evolutionary branches. Previously, we presented a database named PhyloPat which allows the complete Ensembl gene database to be queried using phylogenetic patterns. Here, we describe an updated version of PhyloPat which can be queried by an improved web server. We used a single linkage clustering algorithm to create 241,697 phylogenetic lineages, using all the orthologies provided by Ensembl v49. PhyloPat offers the possibility of querying with binary phylogenetic patterns or regular expressions, or through a phylogenetic tree of the 39 included species. Users can also input a list of Ensembl, EMBL, EntrezGene or HGNC IDs to check which phylogenetic lineage any gene belongs to. A link to the FatiGO web interface has been incorporated in the HTML output. For each gene, the surrounding genes on the chromosome, color coded according to their phylogenetic lineage can be viewed, as well as FASTA files of the peptide sequences of each lineage. Furthermore, lists of omnipresent, polypresent, oligopresent and anticorrelating genes have been included. PhyloPat is freely available at http://www.cmbi.ru.nl/phylopat.
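Single-linkage clustering of orthology pairs amounts to taking the connected components of the orthology graph, and a phylogenetic pattern is a presence/absence string over a fixed species order. The Python sketch below illustrates both ideas with a union-find structure and the standard `re` module. It is not PhyloPat's code; the gene IDs, species order and function names are hypothetical.

```python
import re
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest; merging every ortholog pair yields single-linkage clusters."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, item: str) -> str:
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[item] != root:  # path compression
            next_item = self.parent[item]
            self.parent[item] = root
            item = next_item
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def build_lineages(ortholog_pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Group genes into lineages: connected components of the orthology graph."""
    uf = UnionFind()
    for a, b in ortholog_pairs:
        uf.union(a, b)
    groups: dict[str, set[str]] = defaultdict(set)
    for gene in list(uf.parent):
        groups[uf.find(gene)].add(gene)
    return list(groups.values())


def phylo_pattern(lineage: set[str], species_of: dict[str, str], species_order: list[str]) -> str:
    """Binary presence/absence string, one character per species in a fixed order."""
    present = {species_of[g] for g in lineage}
    return "".join("1" if s in present else "0" for s in species_order)


def query(lineages, species_of, species_order, regex: str) -> list[set[str]]:
    """Return lineages whose whole pattern matches a binary pattern or regular expression."""
    pattern = re.compile(regex)
    return [lin for lin in lineages
            if pattern.fullmatch(phylo_pattern(lin, species_of, species_order))]


if __name__ == "__main__":
    pairs = [("hs_A", "mm_A"), ("mm_A", "dr_A"), ("hs_B", "mm_B")]
    species_of = {"hs_A": "human", "mm_A": "mouse", "dr_A": "zebrafish",
                  "hs_B": "human", "mm_B": "mouse"}
    order = ["human", "mouse", "zebrafish"]
    lineages = build_lineages(pairs)
    print(query(lineages, species_of, order, "110"))  # mammal-only lineage
```

A binary query such as `110` is just a regular expression without wildcards, so both query styles go through the same `fullmatch` call.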
Affiliation(s)
- Tim Hulsen
- Computational Drug Discovery, CMBI, NCMLS, Radboud University Nijmegen Medical Centre, PO Box 9101, 6500 HB Nijmegen, The Netherlands.
23
Hulsen T, de Vlieg J, Alkema W. BioVenn - a web application for the comparison and visualization of biological lists using area-proportional Venn diagrams. BMC Genomics 2008; 9:488. [PMID: 18925949 DOI: 10.1186/1471-2164-9-488] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
BACKGROUND In many genomics projects, numerous lists containing biological identifiers are produced. It is often useful to see the overlap between different lists, enabling researchers to quickly observe similarities and differences between the data sets they are analyzing. One of the most popular methods to visualize the overlap and differences between data sets is the Venn diagram: a diagram consisting of two or more circles in which each circle corresponds to a data set, and the overlap between the circles corresponds to the overlap between the data sets. Venn diagrams are especially useful when they are 'area-proportional', i.e., when the sizes of the circles and the overlaps correspond to the sizes of the data sets. Currently, there are no programs available that can create area-proportional Venn diagrams connected to a wide range of biological databases. RESULTS We designed a web application named BioVenn to summarize the overlap between two or three lists of identifiers, using area-proportional Venn diagrams. The user only needs to enter these lists of identifiers in the text boxes and press the submit button. Parameters such as colors and text size can be adjusted easily through the web interface. The position of the text can be adjusted by drag and drop. The output Venn diagram can be shown as an SVG or PNG image embedded in the web application, or as a standalone SVG or PNG image. The latter option is useful for batch queries. Besides the Venn diagram, BioVenn outputs lists of identifiers for each of the resulting subsets. If an identifier is recognized as belonging to one of the supported biological databases, the output is linked to that database. Finally, BioVenn can map Affymetrix and EntrezGene identifiers to Ensembl genes. CONCLUSION BioVenn is an easy-to-use web application to generate area-proportional Venn diagrams from lists of biological identifiers. It supports a wide range of identifiers from the most widely used biological databases currently available. Its implementation on the World Wide Web makes it available for use on any computer with an internet connection, independent of operating system and without the need to install programs locally. BioVenn is freely accessible at http://www.cmbi.ru.nl/cdd/biovenn/.
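For two lists, an area-proportional layout reduces to geometry: each circle's area is set to its list size, and the distance between the centres is chosen so that the lens-shaped intersection has an area equal to the overlap. The sketch below finds that distance by bisection, relying on the lens area shrinking monotonically as the circles move apart. It is an illustrative construction, not BioVenn's code, and it does not cover the three-list case, which in general has no exact solution with circles.

```python
import math
from typing import Tuple


def _clamp(x: float) -> float:
    """Keep acos arguments inside [-1, 1] despite floating-point rounding."""
    return max(-1.0, min(1.0, x))


def lens_area(r1: float, r2: float, d: float) -> float:
    """Intersection area of two circles with radii r1, r2 whose centres are d apart."""
    if d >= r1 + r2:
        return 0.0  # disjoint circles
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2  # smaller circle lies inside the larger
    a1 = r1 * r1 * math.acos(_clamp((d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    a2 = r2 * r2 * math.acos(_clamp((d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    k = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    return a1 + a2 - 0.5 * math.sqrt(max(0.0, k))


def venn2_layout(size_a: int, size_b: int, overlap: int,
                 tol: float = 1e-9) -> Tuple[float, float, float]:
    """Return (r_a, r_b, d): circle areas equal the set sizes, lens area equals overlap."""
    if not 0 <= overlap <= min(size_a, size_b):
        raise ValueError("overlap must lie between 0 and the smaller set size")
    r_a = math.sqrt(size_a / math.pi)
    r_b = math.sqrt(size_b / math.pi)
    lo, hi = abs(r_a - r_b), r_a + r_b
    # lens_area falls from pi*min(r)^2 at lo to 0 at hi, so bisect on the distance.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lens_area(r_a, r_b, mid) > overlap:
            lo = mid  # too much overlap: move the circles apart
        else:
            hi = mid
    return r_a, r_b, (lo + hi) / 2
```

For lists of 100 and 60 identifiers sharing 20, `venn2_layout(100, 60, 20)` returns radii whose circles have areas 100 and 60 and a centre distance whose lens area is 20, which a drawing layer (SVG, for instance) could then render directly.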
Affiliation(s)
- Tim Hulsen
- Computational Drug Discovery, CMBI, NCMLS, Radboud University Nijmegen Medical Centre, PO Box 9101, 6500 HB Nijmegen, The Netherlands.
24
Hulsen T, de Vlieg J, Alkema W. BioVenn - a web application for the comparison and visualization of biological lists using area-proportional Venn diagrams. BMC Genomics 2008; 9:488. [PMID: 18925949 PMCID: PMC2584113 DOI: 10.1186/1471-2164-9-488] [Citation(s) in RCA: 1064] [Impact Index Per Article: 66.5] [Received: 06/24/2008] [Accepted: 10/16/2008] [Indexed: 02/08/2023]
Affiliation(s)
- Tim Hulsen
- Computational Drug Discovery (CDD), CMBI, NCMLS, Radboud University Nijmegen Medical Centre, P.O. Box 9101, 6500 HB Nijmegen, The Netherlands
- Jacob de Vlieg
- Computational Drug Discovery (CDD), CMBI, NCMLS, Radboud University Nijmegen Medical Centre, P.O. Box 9101, 6500 HB Nijmegen, The Netherlands
- Molecular Design and Informatics, Schering-Plough, P.O. Box 20, 5340 BH Oss, The Netherlands
- Wynand Alkema
- Molecular Design and Informatics, Schering-Plough, P.O. Box 20, 5340 BH Oss, The Netherlands
25
Abstract
The orientation of closely linked genes in mammalian genomes is not random: there are more head-to-head (h2h) gene pairs than expected. To understand the origin of this enrichment in h2h gene pairs, we have analyzed the phylogenetic distribution of gene pairs separated by less than 600 bp of intergenic DNA (gene duos). We show here that a lack of head-to-tail (h2t) gene duos is an even more distinctive characteristic of mammalian genomes, with the platypus genome as the only exception. In nonmammalian vertebrate and in nonvertebrate genomes, the frequencies of h2h, h2t, and tail-to-tail (t2t) gene duos are close to random. In tetrapod genomes, h2t and t2t gene duos are more likely than h2h gene duos to be part of a larger cluster of closely spaced genes; in fish and urochordate genomes, the reverse is seen. In human and mouse tissues, the expression profiles of gene duos were skewed toward positive coexpression, irrespective of orientation. For about 40% of the human gene duos, the organization of the orthologs of both members could be traced in other species, enabling a prediction of the organization at the branch points of gnathostomes, tetrapods, amniotes, and euarchontoglires. The accumulation of h2h gene duos started in tetrapods, whereas that of h2t and t2t gene duos started only in amniotes. The apparent lack of evolutionary conservation of h2t and t2t gene duos relative to h2h gene duos is thus a result of their relatively late origin in the lineage leading to mammals; we show that, once formed, h2t and t2t gene duos are as stable as h2h gene duos.
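The three orientation classes follow from the strands of two neighbouring genes. A minimal sketch, assuming genes on a single chromosome given as hypothetical `Gene` tuples with leftmost/rightmost coordinates, and treating overlapping genes (a negative intergenic gap) as outside the "gene duo" definition, which the abstract does not address:

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    start: int   # leftmost coordinate on the chromosome
    end: int     # rightmost coordinate on the chromosome
    strand: str  # '+' or '-'


MAX_INTERGENIC = 600  # "less than 600 bp of intergenic DNA"


def orientation(left: Gene, right: Gene) -> str:
    """Classify two neighbouring genes by where their 5' (head) ends point."""
    if left.strand == right.strand:
        return "h2t"  # same strand: the tail of one gene faces the head of the next
    if left.strand == "-" and right.strand == "+":
        return "h2h"  # 5' ends face each other (divergent pair)
    return "t2t"      # left '+', right '-': 3' ends face each other


def gene_duos(genes: List[Gene]) -> List[Tuple[str, str, str]]:
    """Return (left, right, orientation) for neighbours separated by < 600 bp."""
    ordered = sorted(genes, key=lambda g: g.start)
    duos = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right.start - left.end
        if 0 <= gap < MAX_INTERGENIC:
            duos.append((left.name, right.name, orientation(left, right)))
    return duos
```

Counting the returned labels per genome, and comparing them with counts from randomly reassigned strands, is one way to check whether h2h, h2t and t2t duos occur at the frequencies expected by chance.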
Affiliation(s)
- Erik Franck
- Biomolecular Chemistry, 271 Nijmegen Center of Molecular Life Science, Radboud University Nijmegen, Nijmegen, The Netherlands
26
Denissov S, van Driel M, Voit R, Hekkelman M, Hulsen T, Hernandez N, Grummt I, Wehrens R, Stunnenberg H. Identification of novel functional TBP-binding sites and general factor repertoires. EMBO J 2007; 26:944-54. [PMID: 17268553 PMCID: PMC1852848 DOI: 10.1038/sj.emboj.7601550] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.1] [Received: 06/26/2006] [Accepted: 12/15/2006] [Indexed: 02/08/2023]
Abstract
Our current knowledge of the general factor requirements for transcription by the three mammalian RNA polymerases is based on a small number of model promoters. Here, we present a comprehensive chromatin immunoprecipitation (ChIP)-on-chip analysis of 28 transcription factors on a large set of known and novel TATA-binding protein (TBP)-binding sites experimentally identified via ChIP cloning. A large fraction of the identified TBP-binding sites is located in introns or lacks a gene/mRNA annotation, and is found to direct transcription. Integrated analysis of the ChIP-on-chip data and functional studies revealed that TAF12, hitherto regarded as RNA polymerase II (RNAP II)-specific, is also involved in RNAP I transcription. Distinct profiles of general transcription factors and TAF-containing complexes were uncovered for RNAP II promoters inside and outside CpG islands, suggesting distinct transcription initiation pathways. Our study broadens the spectrum of general transcription factor function and uncovers a plethora of novel, functional TBP-binding sites in the human genome.
Affiliation(s)
- Sergey Denissov
- Department of Molecular Biology, Nijmegen Centre for Molecular Life Sciences, Radboud University, Nijmegen, The Netherlands
- Marc van Driel
- Department of Molecular Biology, Nijmegen Centre for Molecular Life Sciences, Radboud University, Nijmegen, The Netherlands
- Centre for Molecular and Biomolecular Informatics, Radboud University, Nijmegen, The Netherlands
- Renate Voit
- Division of Molecular Biology of the Cell II, German Cancer Research Center, Heidelberg, Germany
- Maarten Hekkelman
- Centre for Molecular and Biomolecular Informatics, Radboud University, Nijmegen, The Netherlands
- Tim Hulsen
- Centre for Molecular and Biomolecular Informatics, Radboud University, Nijmegen, The Netherlands
- Nouria Hernandez
- Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Ingrid Grummt
- Division of Molecular Biology of the Cell II, German Cancer Research Center, Heidelberg, Germany
- Ron Wehrens
- Institute for Molecules and Materials, Radboud University, Nijmegen, The Netherlands
- Hendrik Stunnenberg
- Department of Molecular Biology, Nijmegen Centre for Molecular Life Sciences, Radboud University, Nijmegen, The Netherlands
- Department of Molecular Biology, Nijmegen Centre for Molecular Life Sciences (274), Radboud University, PO Box 9101, 6500 HB Nijmegen, The Netherlands. Tel.: +31 24 3610524; Fax: +31 24 3610520; E-mail:
27
Hulsen T, de Vlieg J, Leunissen JAM, Groenen PMA. Testing statistical significance scores of sequence comparison methods with structure similarity. BMC Bioinformatics 2006; 7:444. [PMID: 17038163 PMCID: PMC1618413 DOI: 10.1186/1471-2105-7-444] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 07/10/2006] [Accepted: 10/12/2006] [Indexed: 02/08/2023]
Abstract
BACKGROUND In recent years, the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search are determined not only by the algorithm but also by the statistical significance testing of an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte Carlo statistics. Several papers have described the superiority of the Z-score over the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences. RESULTS All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as a reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH in particular has very high scores. CONCLUSION The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations generally give better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.
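The Monte Carlo Z-score can be sketched as follows: score the real pair with Smith-Waterman, rescore against many shuffled copies of one sequence, and express the real score in standard deviations above the shuffled mean. The scoring scheme here (match +2, mismatch -1, linear gap -2) is a toy assumption standing in for a substitution matrix with affine gaps, and this is not the CluSTr or Protein World implementation.

```python
import random
import statistics
from typing import List


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman with a linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: cell scores never drop below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def monte_carlo_z(a: str, b: str, shuffles: int = 100, seed: int = 0) -> float:
    """Z-score of the real score against scores of `a` versus shuffled copies of `b`."""
    rng = random.Random(seed)
    real = sw_score(a, b)
    letters = list(b)
    background: List[int] = []
    for _ in range(shuffles):
        rng.shuffle(letters)  # keeps residue composition, destroys order
        background.append(sw_score(a, "".join(letters)))
    mu = statistics.mean(background)
    sd = statistics.stdev(background)  # requires shuffles >= 2
    return (real - mu) / sd if sd > 0 else float("inf")
```

The cost is visible in the structure: every Z-score needs `shuffles + 1` full dynamic-programming alignments, whereas an e-value is derived analytically from the single real score, which fits the abstract's description of the Z-score as compute-intensive.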
Affiliation(s)
- Tim Hulsen
- Centre for Molecular and Biomolecular Informatics (CMBI), Nijmegen Centre for Molecular Life Sciences (NCMLS), Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
- Jacob de Vlieg
- Centre for Molecular and Biomolecular Informatics (CMBI), Nijmegen Centre for Molecular Life Sciences (NCMLS), Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
- Molecular Design and Informatics, NV Organon, Oss, The Netherlands
- Jack AM Leunissen
- Laboratory of Bioinformatics, Wageningen University and Research Centre, Wageningen, The Netherlands
- Peter MA Groenen
- Molecular Design and Informatics, NV Organon, Oss, The Netherlands
28
Abstract
BACKGROUND Phylogenetic patterns show the presence or absence of certain genes or proteins in a set of species. They can also be used to determine sets of genes or proteins that occur only in certain evolutionary branches. Phylogenetic pattern analysis has routinely been applied to protein databases such as COG and OrthoMCL, but not to gene databases. Here we present a tool named PhyloPat, which allows the complete Ensembl gene database to be queried using phylogenetic patterns. DESCRIPTION PhyloPat is an easy-to-use web server that can be used to query the orthologies of all complete genomes within the EnsMart database using phylogenetic patterns. This enables the determination of sets of genes that occur only in certain evolutionary branches or even single species. We found in total 446,825 genes and 3,164,088 orthologous relationships within the EnsMart v40 database. We used a single linkage clustering algorithm to create 147,922 phylogenetic lineages, using all of the orthologies provided by Ensembl. PhyloPat provides the possibility of querying with either binary phylogenetic patterns (created by checkboxes) or regular expressions. Specific branches of a phylogenetic tree of the 21 included species can be selected to create a branch-specific phylogenetic pattern. Users can also input a list of Ensembl or EMBL IDs to check which phylogenetic lineage any gene belongs to. The output can be saved in HTML, Excel or plain text format for further analysis. A link to the FatiGO web interface has been incorporated in the HTML output, providing easy access to functional information. Finally, lists of omnipresent, polypresent and oligopresent genes have been included. CONCLUSION PhyloPat is the first tool to combine complete genome information with phylogenetic pattern querying. Since we used the orthologies generated by the accurate Ensembl pipeline, the obtained phylogenetic lineages are reliable. The completeness and reliability of these phylogenetic lineages will further increase with the addition of newly found orthologous relationships in each new Ensembl release.
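Single linkage clustering of orthology pairs amounts to finding connected components: two genes end up in the same lineage if any chain of orthology links connects them. A union-find sketch under that reading, with made-up gene IDs standing in for Ensembl identifiers (PhyloPat's actual implementation is not described in the abstract):

```python
from typing import Dict, Iterable, List, Tuple


def single_linkage_lineages(orthologies: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group genes into lineages: any chain of orthology links joins them."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(x: str, y: str) -> None:
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    for g1, g2 in orthologies:
        union(g1, g2)

    # Collect the members of each component under its root.
    groups: Dict[str, List[str]] = {}
    for gene in parent:
        groups.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in groups.values())
```

Because a single link suffices, one spurious orthology can merge two otherwise separate lineages; that sensitivity to individual links is the usual trade-off of single linkage, and it makes the reliability of the input orthologies central.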
Affiliation(s)
- Tim Hulsen
- Centre for Molecular and Biomolecular Informatics (CMBI), Nijmegen Centre for Molecular Life Sciences (NCMLS), Radboud University Nijmegen, Nijmegen, The Netherlands
- Jacob de Vlieg
- Centre for Molecular and Biomolecular Informatics (CMBI), Nijmegen Centre for Molecular Life Sciences (NCMLS), Radboud University Nijmegen, Nijmegen, The Netherlands
- Molecular Design and Informatics, NV Organon, Oss, The Netherlands
- Peter MA Groenen
- Molecular Design and Informatics, NV Organon, Oss, The Netherlands
29
Hulsen T, Huynen MA, de Vlieg J, Groenen PMA. Benchmarking ortholog identification methods using functional genomics data. Genome Biol 2006; 7:R31. [PMID: 16613613 PMCID: PMC1557999 DOI: 10.1186/gb-2006-7-4-r31] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.6] [Received: 07/21/2005] [Revised: 12/06/2005] [Accepted: 03/14/2006] [Indexed: 02/08/2023]
Abstract
BACKGROUND The transfer of functional annotations from model organism proteins to human proteins is one of the main applications of comparative genomics. Various methods are used to analyze cross-species orthologous relationships according to an operational definition of orthology. The definition of orthology is often incorrectly interpreted as a prediction of proteins that are functionally equivalent across species, while in fact it only defines the existence of a common ancestor for a gene in different species. However, it has been demonstrated that orthologs often reveal significant functional similarity. Therefore, the quality of the orthology prediction is an important factor in the transfer of functional annotations (and other related information). To identify protein pairs with the highest possible functional similarity, it is important to assess the quality of ortholog identification methods. RESULTS To measure the functional similarity of proteins from different species, we used functional genomics data, such as expression data and protein interaction data. We tested several of the most popular ortholog identification methods. In general, we observed a sensitivity/selectivity trade-off: the functional similarity scores per orthologous pair of sequences become higher when the number of proteins included in the ortholog groups decreases. CONCLUSION By combining the sensitivity and the selectivity into an overall score, we show that the InParanoid program is the best ortholog identification method in terms of identifying functionally equivalent proteins.
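One way to make the sensitivity/selectivity trade-off concrete is sketched below. All specifics are assumptions for illustration and not the paper's definitions: selectivity is taken as the mean Pearson correlation of expression profiles over a method's ortholog pairs, sensitivity as the number of scorable pairs relative to a hypothetical reference count, and the two are combined with a harmonic mean.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def evaluate_method(pairs: List[Tuple[str, str]],
                    expr_a: Dict[str, List[float]],
                    expr_b: Dict[str, List[float]],
                    reference_count: int) -> Tuple[float, float, float]:
    """Return (sensitivity, selectivity, combined) for one method's ortholog pairs.

    All three definitions are illustrative assumptions, not the paper's formulas.
    """
    # Only pairs with expression data for both proteins can be scored.
    scored = [pearson(expr_a[p], expr_b[q])
              for p, q in pairs if p in expr_a and q in expr_b]
    sensitivity = len(scored) / reference_count
    selectivity = sum(scored) / len(scored) if scored else 0.0
    if sensitivity <= 0 or selectivity <= 0:
        return sensitivity, selectivity, 0.0
    # Harmonic mean: high only when both components are high.
    combined = 2 * sensitivity * selectivity / (sensitivity + selectivity)
    return sensitivity, selectivity, combined
```

Under these assumptions, a method that reports fewer, tighter ortholog groups tends to raise `selectivity` while lowering `sensitivity`, which mirrors the trade-off in the abstract; the harmonic mean rewards a balance between the two rather than an extreme on either side.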
Affiliation(s)
- Tim Hulsen
- Centre for Molecular and Biomolecular Informatics, Radboud University Nijmegen, Toernooiveld 1, Nijmegen, 6500 GL, The Netherlands.
30
Abstract
Many G protein-coupled receptor (GPCR) models have been built over the years. The release of the structure of bovine rhodopsin in August 2000 enabled us to analyze models built before that date to learn more about the models we build today. We conclude that the GPCR modelling field is riddled with 'common knowledge', similar to Lord Kelvin's remark in 1895 that "heavier-than-air flying machines are impossible", and we summarize what we think are the (im)possibilities of modelling GPCRs using the coordinates of bovine rhodopsin as a template. Associated WWW pages: www.gpcr.org/articles/2003_mod
Affiliation(s)
- L Oliveira
- Escola Paulista de Medicina, Sao Paulo, Brazil
31
Nong AT, Andress D, Hulsen T, Seeler RA, Bowers BJ, Peters HA, Leon AS, Conomy JP, Heiman MF, Barnes RW, Upton J. The Medical Bookshelf. Postgrad Med 1980. [DOI: 10.1080/00325481.1980.11715389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]