1
Horenko I, Pospíšil L, Vecchi E, Albrecht S, Gerber A, Rehbock B, Stroh A, Gerber S. Low-Cost Probabilistic 3D Denoising with Applications for Ultra-Low-Radiation Computed Tomography. J Imaging 2022; 8:jimaging8060156. [PMID: 35735955 PMCID: PMC9224620 DOI: 10.3390/jimaging8060156] [Received: 03/21/2022] [Revised: 05/18/2022] [Accepted: 05/19/2022] [Indexed: 12/04/2022]
Abstract
We propose a pipeline for the synthetic generation of personalized computed tomography (CT) images, with radiation exposure evaluation and lifetime attributable risk (LAR) assessment. We perform a patient-specific performance evaluation of a broad range of denoising algorithms (including the most popular deep learning denoising approaches, wavelet-based methods, methods based on Mumford–Shah denoising, etc.), focusing both on assessing their capability to reduce the patient-specific CT-induced LAR and on the scalability of their computational cost. We introduce a parallel Probabilistic Mumford–Shah denoising model (PMS) and show that it markedly outperforms the compared common denoising methods in both denoising quality and cost scaling. In particular, we show that it allows a robust, approximately 22-fold patient-specific LAR reduction for infants and a 10-fold LAR reduction for adults. On a standard laptop, the proposed PMS algorithm allows cheap and robust denoising (with a multiscale structural similarity index >90%) of very large 2D videos and 3D images (with over 10⁷ voxels) that are subject to ultra-strong noise (Gaussian and non-Gaussian), at signal-to-noise ratios far below 1.0. The code is openly available.
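To make "signal-to-noise ratios far below 1.0" concrete, the following minimal Python sketch adds zero-mean Gaussian noise to a hypothetical 1D test signal and reports the resulting SNR. It assumes the common power-ratio definition, SNR = mean(signal²) / mean(noise²); the paper may define SNR differently, and this sketch does not implement PMS denoising itself.

```python
import math
import random


def power(values):
    """Mean squared value (average power) of a sequence."""
    return sum(v * v for v in values) / len(values)


def snr(signal, noise):
    """Power-ratio SNR: signal power divided by noise power (an assumed definition)."""
    return power(signal) / power(noise)


# Hypothetical test signal: one period of a unit-amplitude sine sampled at 1000 points.
n = 1000
signal = [math.sin(2 * math.pi * k / n) for k in range(n)]

# Additive zero-mean Gaussian noise with standard deviation 2.0 (fixed seed for reproducibility).
rng = random.Random(0)
noise = [rng.gauss(0.0, 2.0) for _ in range(n)]

# The noisy observation is what a denoising algorithm would receive as input.
noisy = [s + e for s, e in zip(signal, noise)]

ratio = snr(signal, noise)
print(f"SNR = {ratio:.3f}")  # well below 1.0: the noise power exceeds the signal power
```

With a signal power of 0.5 and a noise power of roughly 4, the SNR is about 0.125, i.e., the noise carries roughly eight times more power than the signal.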
Affiliation(s)
- Illia Horenko
- Faculty of Mathematics, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Correspondence: (I.H.); (S.G.)
- Lukáš Pospíšil
- Department of Mathematics, VSB Ostrava, Ludvika Podeste 1875/17, 708 33 Ostrava, Czech Republic
- Edoardo Vecchi
- Institute of Computing, Faculty of Informatics, Università della Svizzera italiana (USI), 6962 Viganello, Switzerland
- Steffen Albrecht
- Institute of Physiology, University Medical Center of the Johannes Gutenberg University Mainz, 55128 Mainz, Germany
- Alexander Gerber
- Institute of Occupational Medicine, Faculty of Medicine, GU Frankfurt, 60590 Frankfurt am Main, Germany
- Beate Rehbock
- Lung Radiology Center Berlin, 10627 Berlin, Germany
- Albrecht Stroh
- Institute of Pathophysiology, University Medical Center of the Johannes Gutenberg University Mainz, 55128 Mainz, Germany
- Susanne Gerber
- Institute for Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, 55128 Mainz, Germany
- Correspondence: (I.H.); (S.G.)

2
Gerber S, Pospisil L, Sys S, Hewel C, Torkamani A, Horenko I. Co-Inference of Data Mislabelings Reveals Improved Models in Genomics and Breast Cancer Diagnostics. Front Artif Intell 2022; 4:739432. [PMID: 35072059 PMCID: PMC8766632 DOI: 10.3389/frai.2021.739432] [Received: 07/10/2021] [Accepted: 11/19/2021] [Indexed: 11/13/2022]
Abstract
Mislabeling of cases as well as controls in case–control studies is a frequent source of strong bias in prognostic and diagnostic tests and algorithms. Common data processing methods available to researchers in the biomedical community do not allow for consistent and robust treatment of labeled data in situations where both the case and the control groups contain a non-negligible proportion of mislabeled data instances. This is an especially prominent issue in studies of late-onset conditions, where individuals who may later convert to cases may populate the control group, and in screening studies, which often have high false-positive/-negative rates. To address this problem, we propose a method for the simultaneous robust inference of Lasso-reduced discriminative models and of latent group-specific mislabeling risks that does not require any exactly labeled data. We apply it to a standard breast cancer imaging dataset and infer the mislabeling probabilities (i.e., the rates of false-negative and false-positive core-needle biopsies) together with a small set of simple diagnostic rules, outperforming the state-of-the-art BI-RADS diagnostics on these data. The inferred mislabeling rates for breast cancer biopsies agree with those reported in published, purely empirical studies. Applying the method to human genomic data from a healthy-ageing cohort reveals a previously unreported compact combination of single-nucleotide polymorphisms that are strongly associated with a healthy-ageing phenotype for Caucasians. The method also estimates that 7.5% of Caucasians in the 1000 Genomes dataset (selected as a control group) carry a pattern characteristic of healthy ageing.
Affiliation(s)
- Susanne Gerber
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- *Correspondence: Susanne Gerber; Illia Horenko
- Lukas Pospisil
- Faculty of Informatics, Institute of Computational Science, Università Della Svizzera Italiana, Lugano, Switzerland
- Stanislav Sys
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Charlotte Hewel
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Ali Torkamani
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, United States
- Illia Horenko
- Faculty of Informatics, Institute of Computational Science, Università Della Svizzera Italiana, Lugano, Switzerland
- *Correspondence: Susanne Gerber; Illia Horenko

3
Rodrigues DR, Everschor-Sitte K, Gerber S, Horenko I. A deeper look into natural sciences with physics-based and data-driven measures. iScience 2021; 24:102171. [PMID: 33665584 PMCID: PMC7907479 DOI: 10.1016/j.isci.2021.102171] [Indexed: 11/08/2022]
Abstract
With the development of machine learning in recent years, it has become possible to glean much more information about matter from experimental data sets. In this perspective, we discuss some state-of-the-art data-driven tools for analyzing latent effects in data and explain their applicability in the natural sciences, focusing on two recently introduced, physics-motivated, computationally cheap tools: latent entropy and latent dimension. We exemplify their capabilities by applying them to several examples in the natural sciences and show that they reveal previously unobserved features, for example, a gradient in a magnetic measurement and a latent network of glymphatic channels in mouse brain microscopy data. What sets these techniques apart is that they relax the restrictive assumptions typical of many machine learning models and instead incorporate aspects that best fit the dynamical systems at hand.
Affiliation(s)
- Davi Röhe Rodrigues
- Institute of Physics, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
- Susanne Gerber
- Institute of Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, 55131 Mainz, Germany
- Illia Horenko
- Università della Svizzera Italiana, Faculty of Informatics, Via G. Buffi 13, 6900 Lugano, Switzerland

4
Gerber S, Pospisil L, Navandar M, Horenko I. Low-cost scalable discretization, prediction, and feature selection for complex systems. Sci Adv 2020; 6:eaaw0961. [PMID: 32064328 PMCID: PMC6989146 DOI: 10.1126/sciadv.aaw0961] [Received: 11/16/2018] [Accepted: 11/22/2019] [Indexed: 06/10/2023]
Abstract
Finding reliable discrete approximations of complex systems is a key prerequisite for applying many of the most popular modeling tools. Common discretization approaches (e.g., the widely used K-means clustering) are crucially limited in terms of quality, parallelizability, and cost. We introduce a low-cost, improved-quality scalable probabilistic approximation (SPA) algorithm that allows simultaneous data-driven optimal discretization, feature selection, and prediction. We prove its optimality, its parallel efficiency, and the linear scalability of its iteration cost. Cross-validated applications of SPA to a range of large, realistic data classification and prediction problems reveal marked improvements in cost and performance. For example, SPA allows data-driven next-day predictions of resimulated surface temperatures for Europe with a mean prediction error of 0.75°C on a common PC (around 40% better in terms of error and five to six orders of magnitude cheaper than the common computational instruments used by weather services).
Affiliation(s)
- S. Gerber
- Center of Computational Sciences, Johannes-Gutenberg-University of Mainz, PhysMat/Staudingerweg 9, 55128 Mainz, Germany
- L. Pospisil
- Faculty of Informatics, Università della Svizzera italiana, Via G. Buffi 13, 6900 Lugano, Switzerland
- M. Navandar
- Center of Computational Sciences, Johannes-Gutenberg-University of Mainz, PhysMat/Staudingerweg 9, 55128 Mainz, Germany
- I. Horenko
- Faculty of Informatics, Università della Svizzera italiana, Via G. Buffi 13, 6900 Lugano, Switzerland

5
Pizzagalli DU, Gonzalez SF, Krause R. A trainable clustering algorithm based on shortest paths from density peaks. Sci Adv 2019; 5:eaax3770. [PMID: 32195334 PMCID: PMC7051829 DOI: 10.1126/sciadv.aax3770] [Received: 03/18/2019] [Accepted: 09/14/2019] [Indexed: 06/10/2023]
Abstract
Clustering is a technique for analyzing empirical data, with major applications in biomedical research. Essentially, clustering finds groups of related points in a dataset. However, the results depend on both the metrics for point-to-point similarity and the rules for point-to-group association. Inappropriate metrics and rules can lead to artifacts, especially in the case of multiple groups with heterogeneous structure. In this work, we propose a clustering algorithm that evaluates the properties of paths between points (rather than point-to-point similarity) and solves a global optimization problem, finding solutions not obtainable by methods relying on local choices. Moreover, our algorithm is trainable: it can be adapted to and adopted for specific datasets and applications by providing examples of valid and invalid paths to train a path classifier. We demonstrate its applicability by identifying heterogeneous groups in challenging synthetic datasets, segmenting highly nonconvex immune cells in confocal microscopy images, and classifying arrhythmic heartbeats in electrocardiographic signals.
Affiliation(s)
- Diego Ulisse Pizzagalli
- Institute for Research in Biomedicine, Faculty of Biomedical Sciences, Università della Svizzera italiana, CH6500 Bellinzona, Switzerland
- Institute of Computational Science, Università della Svizzera italiana, CH6900 Lugano, Switzerland
- Rolf Krause
- Institute for Research in Biomedicine, Faculty of Biomedical Sciences, Università della Svizzera italiana, CH6500 Bellinzona, Switzerland

6
Tartari F, Conti A, Cerqueti R. Assessing the relationship between toxicity and economic cost of oncological target agents: A systematic review of clinical trials. PLoS One 2017; 12:e0183639. [PMID: 28829823 PMCID: PMC5567914 DOI: 10.1371/journal.pone.0183639] [Received: 07/18/2017] [Accepted: 07/29/2017] [Indexed: 12/19/2022]
Abstract
Target agents are distinctive oncological drugs that differ from traditional therapies in their ability to recognize specific molecules expressed by tumor cells and their microenvironment. Thus, their toxicity is generally lower than that associated with chemotherapy, and they now represent a new standard of care for a number of tumors. This paper deals with the relationship between the economic costs and the toxicity of target agents. To this end, a cluster analysis-based exploration of the main features of a large collection of target agents is carried out, with a specific focus on the variables that identify their toxicity and related costs. The toxicity analysis is based on the Severe Adverse Event (SAE) and Discontinuation (D) rates of each target agent, using data published on PubMed from 1965 to 2016 in the phase II and III studies that led to the approval of these drugs for cancer patients by the US Food and Drug Administration. The construction of the dataset is a key step of the research and is grounded in a critical analysis of a wide set of clinical studies. To capture different strategies for evaluating toxicity, clustering is performed according to three different criteria (including Voronoi tessellation). Our procedure identifies 5 groups of target agents pooled by similar SAE and D rates and, at the same time, 3 groups based on the target agents' costs for 1 month and for the median whole duration of therapy. The results highlight several specific regularities in toxicity and costs. This study has several limitations, as it is based on clinical trials rather than on individual patients' data. However, a macroscopic perspective suggests that costs are rather heterogeneous and do not clearly follow the clustering based on SAE and D rates.
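Grouping by Voronoi tessellation can be illustrated as follows: given a set of generator points, each item belongs to the Voronoi cell of its nearest generator. The minimal Python sketch below assumes hypothetical (SAE rate, D rate) pairs, hypothetical generator points, and Euclidean distance; none of these values come from the study, and the authors' actual choice of generators and distance may differ.

```python
import math

# Hypothetical (SAE rate, discontinuation rate) pairs, in percent, for illustrative agents.
agents = {
    "agent_A": (12.0, 4.0),
    "agent_B": (35.0, 10.0),
    "agent_C": (60.0, 25.0),
    "agent_D": (15.0, 6.0),
}

# Hypothetical generator points defining the Voronoi cells (one per toxicity group).
generators = {
    "low": (10.0, 5.0),
    "medium": (35.0, 12.0),
    "high": (60.0, 20.0),
}


def voronoi_cell(point, generators):
    """Return the name of the generator whose Voronoi cell contains `point`,
    i.e., the nearest generator in Euclidean distance (ties go to the first one listed)."""
    return min(generators, key=lambda name: math.dist(point, generators[name]))


# Assign every agent to the cell of its nearest generator.
groups = {name: voronoi_cell(p, generators) for name, p in agents.items()}
print(groups)
```

Here the generators are fixed in advance; alternating this assignment step with recomputing each cell's centroid would turn the procedure into Lloyd's k-means iteration.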
Affiliation(s)
- Francesca Tartari
- Department of Economics and Law, University of Macerata, Via Crescimbeni, Macerata, Italy
- Alessandro Conti
- Azienda Ospedaliera dell’Alto Adige, Bressanone/Brixen Hospital, Via Dante, Bressanone/Brixen, Italy
- Roy Cerqueti
- Department of Economics and Law, University of Macerata, Via Crescimbeni, Macerata, Italy