1. Gangavarapu K, Ji X, Baele G, Fourment M, Lemey P, Matsen FA, Suchard MA. Many-core algorithms for high-dimensional gradients on phylogenetic trees. Bioinformatics 2024; 40:btae030. PMID: 38243701; PMCID: PMC10868298; DOI: 10.1093/bioinformatics/btae030.
Abstract
MOTIVATION Advancements in high-throughput genomic sequencing are delivering genomic pathogen data at an unprecedented rate, positioning statistical phylogenetics as a critical tool to monitor infectious diseases globally. This rapid growth spurs the need for efficient inference techniques, such as Hamiltonian Monte Carlo (HMC) in a Bayesian framework, to estimate parameters of these phylogenetic models, where the dimensions of the parameters increase with the number of sequences N. HMC requires repeated calculation of the gradient of the data log-likelihood with respect to (wrt) all branch-length-specific (BLS) parameters, which traditionally takes O(N²) operations using the standard pruning algorithm. A recent study proposes an approach to calculate this gradient in O(N), enabling researchers to take advantage of gradient-based samplers such as HMC. The CPU implementation of this approach makes the calculation of the gradient computationally tractable for nucleotide-based models but falls short in performance for models with larger state spaces, such as Markov-modulated and codon models. Here, we describe novel massively parallel algorithms to calculate the gradient of the log-likelihood wrt all BLS parameters that take advantage of graphics processing units (GPUs) and result in many-fold higher speedups over previous CPU implementations.
RESULTS We benchmark these GPU algorithms on three computing systems using three evolutionary inference examples exploring complete genomes from 997 dengue viruses, 62 carnivore mitochondria and 49 yeasts, and observe a >128-fold speedup over the CPU implementation for codon-based models and a >8-fold speedup for nucleotide-based models. As a practical demonstration, we also estimate the timing of the first introduction of West Nile virus into the continental United States under a codon model with a relaxed molecular clock from 104 full viral genomes, an inference task previously intractable.
AVAILABILITY AND IMPLEMENTATION We provide an implementation of our GPU algorithms in BEAGLE v4.0.0 (https://github.com/beagle-dev/beagle-lib), an open-source library for statistical phylogenetics that enables parallel calculations on multi-core CPUs and GPUs. We employ this BEAGLE implementation within the Bayesian phylogenetics framework BEAST (https://github.com/beast-dev/beast-mcmc).
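The O(N²)-versus-O(N) contrast above can be made concrete with a toy example. The sketch below is an illustration, not BEAGLE's implementation: it computes a single-site likelihood on a three-taxon tree under the Jukes-Cantor model via Felsenstein pruning, then obtains the branch-length gradient by finite differences. Each branch needs its own pair of likelihood evaluations, which is exactly the repeated-traversal cost that the analytic one-pass gradient avoids.

```python
import numpy as np

def jc69_P(t):
    # Jukes-Cantor transition-probability matrix for branch length t.
    same = 0.25 + 0.75 * np.exp(-4.0 * t / 3.0)
    diff = 0.25 - 0.25 * np.exp(-4.0 * t / 3.0)
    return np.where(np.eye(4, dtype=bool), same, diff)

def site_likelihood(tip_states, branch_lengths):
    # Felsenstein pruning on a 3-taxon star tree: the root joins three tips.
    partials = []
    for state, t in zip(tip_states, branch_lengths):
        tip = np.zeros(4)
        tip[state] = 1.0
        partials.append(jc69_P(t) @ tip)
    root = partials[0] * partials[1] * partials[2]
    return float(np.sum(0.25 * root))  # uniform root frequencies

def branch_gradient(tip_states, branch_lengths, eps=1e-6):
    # Finite-difference gradient of the log-likelihood wrt each branch:
    # one full pruning pass per perturbation, i.e. the quadratic scheme
    # that the analytic O(N) gradient replaces.
    g = []
    for i in range(len(branch_lengths)):
        bl = list(branch_lengths)
        bl[i] += eps
        up = np.log(site_likelihood(tip_states, bl))
        bl[i] -= 2 * eps
        dn = np.log(site_likelihood(tip_states, bl))
        g.append((up - dn) / (2 * eps))
    return g
```

For concordant tip data the gradient is negative in every branch, since lengthening any branch lowers the probability of observing identical states at the tips.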
Affiliation(s)
- Karthik Gangavarapu
  - Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, Los Angeles, CA, United States
- Xiang Ji
  - Department of Mathematics, School of Science & Engineering, Tulane University, New Orleans, LA, United States
- Guy Baele
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Mathieu Fourment
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Ultimo, NSW, Australia
- Philippe Lemey
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Frederick A Matsen
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, United States
  - Department of Statistics, University of Washington, Seattle, WA, United States
  - Department of Genome Sciences, University of Washington, Seattle, WA, United States
  - Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, WA, United States
- Marc A Suchard
  - Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, Los Angeles, CA, United States
  - Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, Los Angeles, CA, United States
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, United States
2. Reska D, Kretowski M. GPU-accelerated lung CT segmentation based on level sets and texture analysis. Sci Rep 2024; 14:1444. PMID: 38228773; PMCID: PMC10792028; DOI: 10.1038/s41598-024-51452-6.
Abstract
This paper presents a novel semi-automatic method for lung segmentation in thoracic CT datasets. The fully three-dimensional algorithm is based on a level set representation of an active surface and integrates texture features to improve its robustness. The method's performance is enhanced by graphics processing unit (GPU) acceleration. The segmentation process starts with a manual initialisation of 2D contours on a few representative slices of the analysed volume. Next, the starting regions for the active surface are generated according to probability maps of texture features. The active surface is then evolved to give the final segmentation result. The current implementation employs features based on grey-level co-occurrence matrices and Gabor filters. The algorithm was evaluated on real medical imaging data from the LCTCS 2017 challenge, and the results were compared with the outcomes of other segmentation methods. The proposed approach provided high segmentation accuracy while offering very competitive performance.
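As a hedged illustration of the texture side of this pipeline, the snippet below computes a grey-level co-occurrence matrix (GLCM) for one pixel offset and the classic Haralick contrast feature derived from it. The actual method combines several texture statistics with Gabor-filter responses inside a GPU-accelerated level-set evolution, which this sketch does not attempt.

```python
import numpy as np

def glcm(img, levels=8, dx=1, dy=0):
    # Grey-level co-occurrence matrix for a single offset (dx, dy), normalised
    # so its entries sum to one. Intensities are quantised to `levels` bins.
    q = (img.astype(float) / (img.max() + 1e-12) * (levels - 1)).astype(int)
    M = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            M[q[y, x], q[y + dy, x + dx]] += 1
    return M / M.sum()

def contrast(M):
    # Haralick contrast: large for rapidly varying texture, zero for a
    # perfectly uniform region.
    i, j = np.indices(M.shape)
    return float(np.sum((i - j) ** 2 * M))
```

A uniform patch yields zero contrast, while a checkerboard patch yields a large value; a probability map built from such features can then steer the active surface away from homogeneous lung parenchyma boundaries.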
Affiliation(s)
- Daniel Reska
  - Faculty of Computer Science, Bialystok University of Technology, Białystok, Poland
- Marek Kretowski
  - Faculty of Computer Science, Bialystok University of Technology, Białystok, Poland
3. Laine RF, Heil HS, Coelho S, Nixon-Abell J, Jimenez A, Wiesner T, Martínez D, Galgani T, Régnier L, Stubb A, Follain G, Webster S, Goyette J, Dauphin A, Salles A, Culley S, Jacquemet G, Hajj B, Leterrier C, Henriques R. High-fidelity 3D live-cell nanoscopy through data-driven enhanced super-resolution radial fluctuation. Nat Methods 2023; 20:1949-1956. PMID: 37957430; PMCID: PMC10703683; DOI: 10.1038/s41592-023-02057-w.
Abstract
Live-cell super-resolution microscopy enables the imaging of biological structure dynamics below the diffraction limit. Here we present enhanced super-resolution radial fluctuations (eSRRF), substantially improving image fidelity and resolution compared to the original SRRF method. eSRRF incorporates automated parameter optimization based on the data itself, giving insight into the trade-off between resolution and fidelity. We demonstrate eSRRF across a range of imaging modalities and biological systems. Notably, we extend eSRRF to three dimensions by combining it with multifocus microscopy. This realizes live-cell volumetric super-resolution imaging with an acquisition speed of ~1 volume per second. eSRRF provides an accessible super-resolution approach, maximizing information extraction across varied experimental conditions while minimizing artifacts. Its optimal parameter prediction strategy is generalizable, moving toward unbiased and optimized analyses in super-resolution microscopy.
Affiliation(s)
- Romain F Laine
  - Laboratory for Molecular Cell Biology, University College London, London, UK
  - The Francis Crick Institute, London, UK
  - Micrographia Bio, Translation and Innovation Hub, London, UK
- Hannah S Heil
  - Optical Cell Biology, Instituto Gulbenkian de Ciência, Oeiras, Portugal
- Simao Coelho
  - Optical Cell Biology, Instituto Gulbenkian de Ciência, Oeiras, Portugal
- Jonathon Nixon-Abell
  - Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA
  - Cambridge Institute for Medical Research, Cambridge University, Cambridge, UK
- Angélique Jimenez
  - Aix-Marseille Université, CNRS, INP UMR7051, NeuroCyto, Marseille, France
- Theresa Wiesner
  - Aix-Marseille Université, CNRS, INP UMR7051, NeuroCyto, Marseille, France
- Damián Martínez
  - Optical Cell Biology, Instituto Gulbenkian de Ciência, Oeiras, Portugal
- Tommaso Galgani
  - Laboratoire Physico-Chimie Curie, Institut Curie, PSL Research University, Sorbonne Université, CNRS UMR168, Paris, France
  - Revvity Signals, Tres Cantos, Madrid, Spain
- Louise Régnier
  - Laboratoire Physico-Chimie Curie, Institut Curie, PSL Research University, Sorbonne Université, CNRS UMR168, Paris, France
- Aki Stubb
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
  - Department of Cell and Tissue Dynamics, Max Planck Institute for Molecular Biomedicine, Munster, Germany
- Gautier Follain
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
  - Faculty of Science and Engineering, Cell Biology, Åbo Akademi University, Turku, Finland
- Samantha Webster
  - EMBL Australia Node in Single Molecule Science, School of Biomedical Sciences, University of New South Wales, Sydney, New South Wales, Australia
- Jesse Goyette
  - EMBL Australia Node in Single Molecule Science, School of Biomedical Sciences, University of New South Wales, Sydney, New South Wales, Australia
- Aurelien Dauphin
  - Unite Genetique et Biologie du Développement U934, PICT-IBiSA, Institut Curie, INSERM, CNRS, PSL Research University, Paris, France
- Audrey Salles
  - Institut Pasteur, Université Paris Cité, Unit of Technology and Service Photonic BioImaging (UTechS PBI), C2RT, Paris, France
- Siân Culley
  - Laboratory for Molecular Cell Biology, University College London, London, UK
  - Randall Centre for Cell and Molecular Biophysics, King's College London, Guy's Campus, London, UK
- Guillaume Jacquemet
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
  - Faculty of Science and Engineering, Cell Biology, Åbo Akademi University, Turku, Finland
  - Turku Bioimaging, University of Turku and Åbo Akademi University, Turku, Finland
  - InFLAMES Research Flagship Center, Åbo Akademi University, Turku, Finland
- Bassam Hajj
  - Laboratoire Physico-Chimie Curie, Institut Curie, PSL Research University, Sorbonne Université, CNRS UMR168, Paris, France
- Ricardo Henriques
  - Laboratory for Molecular Cell Biology, University College London, London, UK
  - The Francis Crick Institute, London, UK
  - Optical Cell Biology, Instituto Gulbenkian de Ciência, Oeiras, Portugal
4. Di Felice R, Mayes ML, Richard RM, Williams-Young DB, Chan GKL, de Jong WA, Govind N, Head-Gordon M, Hermes MR, Kowalski K, Li X, Lischka H, Mueller KT, Mutlu E, Niklasson AMN, Pederson MR, Peng B, Shepard R, Valeev EF, van Schilfgaarde M, Vlaisavljevich B, Windus TL, Xantheas SS, Zhang X, Zimmerman PM. A Perspective on Sustainable Computational Chemistry Software Development and Integration. J Chem Theory Comput 2023; 19:7056-7076. PMID: 37769271; PMCID: PMC10601486; DOI: 10.1021/acs.jctc.3c00419.
Abstract
The power of quantum chemistry to predict the ground and excited state properties of complex chemical systems has driven the development of computational quantum chemistry software, integrating advances in theory, applied mathematics, and computer science. The emergence of new computational paradigms associated with exascale technologies also poses significant challenges that require a flexible forward strategy to take full advantage of existing and forthcoming computational resources. In this context, the sustainability and interoperability of computational chemistry software development are among the most pressing issues. In this perspective, we discuss software infrastructure needs and investments with an eye to fully utilize exascale resources and provide unique computational tools for next-generation science problems and scientific discoveries.
Affiliation(s)
- Rosa Di Felice
  - Departments of Physics and Astronomy and Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, United States
  - CNR-NANO Modena, Modena 41125, Italy
- Maricris L. Mayes
  - Department of Chemistry and Biochemistry, University of Massachusetts Dartmouth, North Dartmouth, Massachusetts 02747, United States
- Garnet Kin-Lic Chan
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Wibe A. de Jong
  - Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Niranjan Govind
  - Physical Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Martin Head-Gordon
  - Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, United States
- Matthew R. Hermes
  - Department of Chemistry, Chicago Center for Theoretical Chemistry, University of Chicago, Chicago, Illinois 60637, United States
- Karol Kowalski
  - Physical Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Xiaosong Li
  - Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
- Hans Lischka
  - Department of Chemistry and Biochemistry, Texas Tech University, Lubbock, Texas 79409, United States
- Karl T. Mueller
  - Physical and Computational Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Erdal Mutlu
  - Advanced Computing, Mathematics, and Data Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Anders M. N. Niklasson
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Mark R. Pederson
  - Department of Physics, The University of Texas at El Paso, El Paso, Texas 79968, United States
- Bo Peng
  - Physical Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Ron Shepard
  - Chemical Sciences and Engineering Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Edward F. Valeev
  - Department of Chemistry, Virginia Tech, Blacksburg, Virginia 24061, United States
- Bess Vlaisavljevich
  - Department of Chemistry, University of South Dakota, Vermillion, South Dakota 57069, United States
- Theresa L. Windus
  - Department of Chemistry, Iowa State University and Ames Laboratory, Ames, Iowa 50011, United States
- Sotiris S. Xantheas
  - Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
  - Advanced Computing, Mathematics and Data Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Xing Zhang
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Paul M. Zimmerman
  - Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
5. Haggui N, Hamidouche W, Belghith F, Masmoudi N, Nezan JF. OpenVVC Decoder Parameterized and Interfaced Synchronous Dataflow (PiSDF) Model: Tile Based Parallelism. Journal of Signal Processing Systems 2022; 95:1-13. PMID: 36268535; PMCID: PMC9569024; DOI: 10.1007/s11265-022-01819-7.
Abstract
The emergence of the new video coding standard, Versatile Video Coding (VVC), has resulted in a 40-50% coding gain over its predecessor HEVC at the same visual quality, but this comes with a sharp increase in computational complexity. The demands of the VVC standard and the increase in video resolution have exceeded the capacity of single-core architectures, leading researchers to implement video standards on multicore architectures and to exploit their parallelism for real-time applications. With the strong growth in both areas, video coding and multicore architecture, there is a great need for a design methodology that facilitates the exploration of heterogeneous multicore architectures and automatically generates optimized code for them, reducing time to market. In this context, this paper applies a design methodology based on dataflow modeling and the PREESM software, and shows how that software was used to model a complete VVC decoder using the Parameterized and Interfaced Synchronous Dataflow (PiSDF) model. The proposed model takes advantage of the parallelism strategies of the OpenVVC decoder, in particular tile-based parallelism. Experimental results show that the PiSDF version of the VVC decoder is slightly faster than the handwritten C/C++ OpenVVC decoder, with up to an 11% speedup on a 24-core processor. The proposed decoder thus outperforms state-of-the-art dataflow decoders based on the RVC-CAL model.
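The tile-level parallelism the model exploits can be sketched in a few lines: because VVC tiles of a frame are independently decodable, they can be farmed out to a worker pool and the results reassembled in tile order. The `decode_tile` callable below is a hypothetical stand-in for OpenVVC's per-tile work, not its actual API.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_frame(tiles, decode_tile, workers=4):
    # Map each independently decodable tile onto the pool; `map` preserves
    # the input order, so the decoded tiles come back in raster order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decode_tile, tiles))
```

In a dataflow formulation such as PiSDF, the same structure is expressed as one actor firing per tile, and the scheduler, rather than an explicit thread pool, assigns firings to cores.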
Affiliation(s)
- Naouel Haggui
  - Univ Rennes, INSA Rennes, CNRS, IETR - UMR 6164, 20 Avenue des Buttes de Coesmes, 35700 Rennes, France
  - Electronics and Information Technology Laboratory (LETI) of Sfax, Road of Soukra, 3038 Sfax, Tunisia
- Wassim Hamidouche
  - Univ Rennes, INSA Rennes, CNRS, IETR - UMR 6164, 20 Avenue des Buttes de Coesmes, 35700 Rennes, France
- Fatma Belghith
  - Electronics and Information Technology Laboratory (LETI) of Sfax, Road of Soukra, 3038 Sfax, Tunisia
- Nouri Masmoudi
  - Electronics and Information Technology Laboratory (LETI) of Sfax, Road of Soukra, 3038 Sfax, Tunisia
- Jean-François Nezan
  - Univ Rennes, INSA Rennes, CNRS, IETR - UMR 6164, 20 Avenue des Buttes de Coesmes, 35700 Rennes, France
6. Ayub M, Helmy T. Concurrent kernel execution and interference analysis on GPUs using deep learning approaches. Journal of King Saud University - Computer and Information Sciences 2022. DOI: 10.1016/j.jksuci.2022.10.016.
7. Wang Z, Peyton BG, Crawford TD. Accelerating Real-Time Coupled Cluster Methods with Single-Precision Arithmetic and Adaptive Numerical Integration. J Chem Theory Comput 2022; 18:5479-5491. PMID: 35939815; DOI: 10.1021/acs.jctc.2c00490.
Abstract
We explore the framework of a real-time coupled cluster method with a focus on improving its computational efficiency. Propagation of the wave function via the time-dependent Schrödinger equation places high demands on computing resources, particularly for high level theories such as coupled cluster with polynomial scaling. Similar to earlier investigations of coupled cluster properties, we demonstrate that the use of single-precision arithmetic reduces both the storage and multiplicative costs of the real-time simulation by approximately a factor of 2 with no significant impact on the resulting UV/vis absorption spectrum computed via the Fourier transform of the time-dependent dipole moment. Additional speedups, of up to a factor of 14 in test simulations of water clusters, are obtained via a straightforward GPU-based implementation as compared to conventional CPU calculations. We also find that further performance optimization is accessible through sagacious selection of numerical integration algorithms, and the adaptive methods, such as the Cash-Karp integrator, provide an effective balance between computing costs and numerical stability. Finally, we demonstrate that a simple mixed-step integrator based on the conventional fourth-order Runge-Kutta approach is capable of stable propagations even for strong external fields, provided the time step is appropriately adapted to the duration of the laser pulse with only minimal computational overhead.
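The single- versus double-precision trade-off can be illustrated on a toy one-state analogue of the propagated signal: y'(t) = -iy, whose exact solution is e^{-it}. This is a sketch of the precision experiment under that stand-in ODE, not the coupled-cluster propagation itself; the integrator is the classical fourth-order Runge-Kutta scheme the paper builds its mixed-step variant on.

```python
import numpy as np

def rk4(f, y0, T, n, dtype=np.complex128):
    # Classical fourth-order Runge-Kutta; the state is cast back to `dtype`
    # after every step, so single-precision round-off accumulates just as it
    # would in a float32 propagation.
    y = dtype(y0)
    h = T / n
    t = 0.0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y = dtype(y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4))
        t += h
    return y

# One full period of y' = -i*y; the exact answer is exp(-2*pi*i) = 1.
f = lambda t, y: -1j * y
y64 = rk4(f, 1.0, 2 * np.pi, 2000, np.complex128)
y32 = rk4(f, 1.0, 2 * np.pi, 2000, np.complex64)
```

In double precision the error is dominated by the O(h⁴) truncation term, while in single precision the per-step rounding dominates; both remain far below the linewidths that matter for a Fourier-transformed spectrum, which is the paper's observation at much larger scale.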
Affiliation(s)
- Zhe Wang
  - Department of Chemistry, Virginia Tech, Blacksburg, Virginia 24061, United States
- Benjamin G Peyton
  - Department of Chemistry, Virginia Tech, Blacksburg, Virginia 24061, United States
- T Daniel Crawford
  - Department of Chemistry, Virginia Tech, Blacksburg, Virginia 24061, United States
8. Morrical N, Wald I, Usher W, Pascucci V. Accelerating Unstructured Mesh Point Location With RT Cores. IEEE Transactions on Visualization and Computer Graphics 2022; 28:2852-2866. PMID: 33290224; DOI: 10.1109/tvcg.2020.3042930.
Abstract
We present a technique that leverages ray tracing hardware available in recent Nvidia RTX GPUs to solve a problem other than classical ray tracing. Specifically, we demonstrate how to use these units to accelerate the point location of general unstructured elements consisting of both planar and bilinear faces. This unstructured mesh point location problem has previously been challenging to accelerate on GPU architectures; yet, the performance of these queries is crucial to many unstructured volume rendering and compute applications. Starting with a CUDA reference method, we describe and evaluate three approaches that reformulate these point queries to incrementally map algorithmic complexity to these new hardware ray tracing units. Each variant replaces the simpler problem of point queries with a more complex one of ray queries. Initial variants exploit ray tracing cores for accelerated BVH traversal, and subsequent variants use ray-triangle intersections and per-face metadata to detect point-in-element intersections. Although these later variants are more algorithmically complex, they are significantly faster than the reference method thanks to hardware acceleration. Using our approach, we improve the performance of an unstructured volume renderer by up to 4× for tetrahedral meshes and up to 15× for general bilinear element meshes, matching or outperforming state-of-the-art solutions while simultaneously improving robustness and ease of implementation.
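For orientation, the per-element test that such point-location queries bottom out in looks like the following sketch for a planar-faced tetrahedron: the query point must lie on the inner side of all four face planes. This is an illustration of the baseline geometric test, not the paper's method, whose contribution is reformulating many such queries as hardware ray-triangle intersections against a BVH of element faces.

```python
import numpy as np

def same_side(a, b, c, d, p):
    # True if p lies on the same side of plane (a, b, c) as the fourth
    # vertex d (boundary counts as inside).
    n = np.cross(b - a, c - a)
    return np.dot(n, d - a) * np.dot(n, p - a) >= 0

def point_in_tet(verts, p):
    # Planar-face point-in-element test: inside iff on the inner side of
    # every face.
    a, b, c, d = verts
    return (same_side(a, b, c, d, p) and same_side(a, b, d, c, p)
            and same_side(a, c, d, b, p) and same_side(b, c, d, a, p))
```

On a GPU, the expensive part is not this test but finding *which* candidate element to test for each sample point, which is exactly the BVH traversal the RT cores accelerate.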
9. Ahmed U, Lin JCW, Srivastava G. Heterogeneous Energy-aware Load Balancing for Industry 4.0 and IoT Environments. ACM Transactions on Management Information Systems 2022. DOI: 10.1145/3543859.
Abstract
With the improvement of global infrastructure, Cyber-Physical Systems (CPS) have become an important component of Industry 4.0, with applications and machines cooperating through interdependent tasks. Machine learning methods in CPS require the monitoring of computational algorithms, including adopting optimizations, fine-tuning cyber systems, improving resource utilization, and reducing vulnerability and computation time. By leveraging the tremendous parallelism provided by General-Purpose Graphics Processing Units (GPGPUs) and OpenCL, it is possible to dramatically reduce the execution time of data-parallel programs. However, when running an application with small amounts of data on a GPU, GPU resources are wasted because the program may not fully utilize the GPU cores; kernels cannot share a GPU due to the lack of OS support for GPUs. Optimal device selection is therefore required to curb the GPU's high power consumption. In this paper, we propose an energy reduction method for heterogeneous clustering. This study focuses on load balancing: resource-aware processor selection is performed using machine learning on code features. The proposed method identifies energy-efficient kernel candidates from the employment pool, then selects the pair of candidates, among all possibilities, that reduces both energy consumption and execution time. Experimental results show that the proposed kernel approach reduces execution time by a factor of 2.23 compared to a baseline scheduling system, and is 1.2 times faster than state-of-the-art approaches.
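The kernel-pairing step can be caricatured as follows. The predicted (energy, time) table and the energy-delay-product cost are both assumptions for illustration, standing in for the paper's learned co-execution model; only the shape of the search (score every candidate pair, keep the cheapest) reflects the description above.

```python
from itertools import combinations

def best_pair(kernels):
    # kernels: name -> (predicted_energy, predicted_time).
    # Score each co-scheduled pair by an (assumed) energy-delay product:
    # total energy times the makespan of the pair.
    def cost(pair):
        energy = sum(kernels[k][0] for k in pair)
        makespan = max(kernels[k][1] for k in pair)
        return energy * makespan
    return min(combinations(kernels, 2), key=cost)
```

A real scheduler would also model interference between concurrent kernels, which is where the deep-learning prediction enters in the paper.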
Affiliation(s)
- Usman Ahmed
  - Department of Computer Science, Electrical Engineering and Mathematical Sciences, Western Norway University of Applied Sciences, Norway
- Jerry Chun-Wei Lin
  - Department of Computer Science, Electrical Engineering and Mathematical Sciences, Western Norway University of Applied Sciences, Norway
- Gautam Srivastava
  - Department of Mathematics & Computer Science, Brandon University, Canada
  - Research Centre for Interneural Computing, China Medical University, Taiwan
10. Performance Evaluation of Massively Parallel Systems Using SPEC OMP Suite. Computers 2022. DOI: 10.3390/computers11050075.
Abstract
Performance analysis plays an essential role in achieving a scalable performance of applications on massively parallel supercomputers equipped with thousands of processors. This paper is an empirical investigation to study, in depth, the performance of two of the most common High-Performance Computing architectures in the world. IBM has developed three generations of Blue Gene supercomputers—Blue Gene/L, P, and Q—that use, at a large scale, low-power processors to achieve high performance. Better CPU core efficiency has been empowered by a higher level of integration to gain more parallelism per processing element. On the other hand, the Intel Xeon Phi coprocessor armed with 61 on-chip x86 cores, provides high theoretical peak performance, as well as software development flexibility with existing high-level programming tools. We present an extensive evaluation study of the performance peaks and scalability of these two modern architectures using SPEC OMP benchmarks.
11. Bimbo J, Morgan AS, Dollar AM. Force-Based Simultaneous Mapping and Object Reconstruction for Robotic Manipulation. IEEE Robot Autom Lett 2022. DOI: 10.1109/lra.2022.3152244.
12. Adaptive Computation Offloading with Task Scheduling Minimizing Reallocation in VANETs. Electronics 2022. DOI: 10.3390/electronics11071106.
Abstract
Computation Offloading (CO) can accelerate application execution through parallel processing. This paper proposes a method called vehicular adaptive offloading (VAO), in which vehicles in vehicular ad-hoc networks (VANETs) offload computationally intensive tasks to nearby vehicles while taking into account the dynamically changing topology of VANETs. In CO, task scheduling has a huge impact on overall performance. After representing the precedence relationship between tasks with a directed acyclic graph (DAG), VAO, running on the CO-requesting vehicle, assigns tasks to neighbors so as to minimize the probability of task reallocations caused by connection disruptions between vehicles. Simulation results showed that the proposed method decreases the number of reallocations by 45.4% compared with the well-known task scheduling algorithms HEFT and Max-Min. Accordingly, the schedule length of the entire set of tasks belonging to one application is shortened by 14.4% on average.
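A minimal sketch of the scheduling substrate, assuming the task DAG is given as a prerequisite map: tasks are topologically ordered and then greedily assigned to whichever neighbor vehicle frees up first. VAO's actual objective additionally penalizes assignments likely to be disrupted by topology change, which this sketch omits.

```python
from collections import deque

def topo_order(tasks, deps):
    # Kahn topological sort; deps maps task -> set of prerequisite tasks.
    indeg = {t: len(deps.get(t, ())) for t in tasks}
    ready = deque(t for t in tasks if indeg[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for u in tasks:
            if t in deps.get(u, ()):
                indeg[u] -= 1
                if indeg[u] == 0:
                    ready.append(u)
    return order

def assign(order, durations, workers):
    # Greedy list scheduling: each task in topological order goes to the
    # worker (neighbor vehicle) with the earliest free time.
    free_at = {w: 0.0 for w in workers}
    plan = {}
    for t in order:
        w = min(free_at, key=free_at.get)
        plan[t] = w
        free_at[w] += durations[t]
    return plan
```

HEFT refines the same skeleton with upward-rank task priorities and communication costs; VAO further weighs the expected lifetime of each vehicle-to-vehicle link.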
15. A Review of Parallel Implementations for the Smith-Waterman Algorithm. Interdiscip Sci 2021; 14:1-14. PMID: 34487327; PMCID: PMC8419822; DOI: 10.1007/s12539-021-00473-0.
Abstract
The rapid advances in sequencing technology have led to an explosion of sequence data. Sequence alignment is the central and fundamental problem in many sequence analysis procedures, and local alignment is often their kernel. Usually, the Smith–Waterman algorithm is used to find the best subsequence match between given sequences; however, its high time complexity makes the algorithm time-consuming. Many approaches have been developed to accelerate and parallelize it, such as vector-level, thread-level and process-level parallelization and heterogeneous acceleration, but current research remains unsystematic, which hinders further work on parallelizing the algorithm. In this paper, we summarize the current research status of parallel local alignment and describe the data layouts used in this work. Based on this status, we emphasize large-scale genomic comparisons. By surveying the performance of some typical alignment tools, we discuss possible directions for the future. We hope our work will provide developers of alignment tools with support on technical principles, and help researchers choose proper alignment tools.
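For orientation, the kernel being parallelized is the following dynamic program, shown here with a simple linear gap penalty (affine gaps omitted). Cells on the same anti-diagonal of the score matrix H are mutually independent, which is the property that vector-, thread-, process- and GPU-level schemes in the survey all exploit.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    # Local-alignment score matrix; the best local alignment score is the
    # maximum over all cells. H[i][j] is clamped at 0, which is what makes
    # the alignment local rather than global.
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # match/mismatch
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            best = max(best, H[i][j])
    return best
```

Since H[i][j] depends only on its left, upper and upper-left neighbors, all cells with the same i + j can be computed simultaneously, the wavefront layout most of the surveyed implementations use.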
16. Fukaya S, Takemura M. Kinetic Analysis of Acanthamoeba castellanii Infected with Giant Viruses Quantitatively Revealed Process of Morphological and Behavioral Changes in Host Cells. Microbiol Spectr 2021; 9:e0036821. PMID: 34431709; PMCID: PMC8552732; DOI: 10.1128/spectrum.00368-21.
Abstract
Most virus-infected cells show morphological and behavioral changes, which are called cytopathic effects. Acanthamoeba castellanii, an abundant, free-living protozoan, serves as a laboratory host for some viruses of the phylum Nucleocytoviricota, the giant viruses. Many of these viruses cause cell rounding in the later stages of infection. Here, we show the changes that lead to cell rounding in the host cells through time-lapse microscopy and image analysis. Time-lapse movies of A. castellanii cells infected with Mimivirus shirakomae, kyotovirus, medusavirus, or Pandoravirus japonicus were generated using a phase-contrast microscope. We updated our phase-contrast-based kinetic analysis algorithm for amoebae (PKA3) and used it to analyze these time-lapse movies. Image analysis revealed that the process leading to cell rounding varies among the giant viruses; for example, M. shirakomae infection did not cause changes for some time after the infection, kyotovirus infection caused an early decrease in the number of cells with typical morphologies, and medusavirus and P. japonicus infection frequently led to the formation of intercellular bridges and rotational behavior of host cells. These results suggest that in the case of giant viruses, the putative reactions of host cells against infection and the putative strategies of virus spread are diverse. IMPORTANCE Quantitative analysis of the infection process is important for a better understanding of viral infection strategies and virus-host interactions. Here, image analysis of phase-contrast time-lapse movies displayed quantitative differences, previously unclear, in the cytopathic effects of the four giant viruses in Acanthamoeba castellanii. It was revealed that medusavirus and Pandoravirus japonicus infection led to the formation of a significant number of elongated particles related to intercellular bridges, emphasizing the importance of research on the interaction of viruses with host cell nuclear function. Mimivirus shirakomae infection did not cause any changes in the host cells initially, so infected cells are thought to actively move and spread over a wider area, emphasizing the importance of wide-field observation and analysis of infection efficiency. These results suggest that kinetic analysis using the phase-contrast-based kinetic analysis algorithm for amoebae (PKA3) reveals the infection strategy of each giant virus.
Affiliation(s)
- Sho Fukaya
- Department of Applied Information Engineering, Faculty of Engineering, Suwa University of Science, Chino, Nagano, Japan
- Laboratory of Biology, Institute of Arts and Sciences, Tokyo University of Science, Shinjuku, Tokyo, Japan
- Masaharu Takemura
- Laboratory of Biology, Institute of Arts and Sciences, Tokyo University of Science, Shinjuku, Tokyo, Japan
- Laboratory of Biology, Graduate School of Mathematics and Science Education, Tokyo University of Science, Shinjuku, Tokyo, Japan
17
Haschka T, Ponger L, Escudé C, Mozziconacci J. MNHN-Tree-Tools: A toolbox for tree inference using multi-scale clustering of a set of sequences. Bioinformatics 2021; 37:3947-3949. PMID: 34100911; DOI: 10.1093/bioinformatics/btab430.
Abstract
SUMMARY Genomic sequences are widely used to infer the evolutionary history of a given group of individuals. Many methods have been developed for sequence clustering and tree building. In the early days of genome sequencing, these were often limited to hundreds of sequences, but due to the surge of high-throughput sequencing, it is now common to have millions of sampled sequences at hand. We introduce MNHN-Tree-Tools, a high-performance set of algorithms that builds multi-scale, nested clusters of the sequences found in a FASTA file. MNHN-Tree-Tools does not rely on sequence alignment and can thus be used on large datasets to infer a sequence tree. Herein we outline two applications: a classification of human alpha-satellite repeats and a tree-of-life derivation from 16S/18S rDNA sequences. CODE AVAILABILITY Open source with a Zlib License via the Git protocol: https://gitlab.in2p3.fr/mnhn-tools/mnhn-tree-tools. SUPPLEMENTARY INFORMATION An in-depth discussion of the algorithm with numerical simulations: https://gitlab.in2p3.fr/mnhn-tools/tree-tools-algorithms-document/-/raw/master/article.pdf. MANUAL A detailed user's guide and tutorial: https://gitlab.in2p3.fr/mnhn-tools/mnhn-tree-tools-manual/-/raw/master/manual.pdf. WEBSITE AND FAQ http://treetools.haschka.net.
Affiliation(s)
- Thomas Haschka
- Muséum National d'Histoire Naturelle, Structure et Instabilité des Génomes, UMR7196, Paris 75231, France
- Loic Ponger
- Muséum National d'Histoire Naturelle, Structure et Instabilité des Génomes, UMR7196, Paris 75231, France
- Christophe Escudé
- Muséum National d'Histoire Naturelle, Structure et Instabilité des Génomes, UMR7196, Paris 75231, France
- Julien Mozziconacci
- Muséum National d'Histoire Naturelle, Structure et Instabilité des Génomes, UMR7196, Paris 75231, France; Institut Universitaire de France
18
Maier O, Spann SM, Pinter D, Gattringer T, Hinteregger N, Thallinger GG, Enzinger C, Pfeuffer J, Bredies K, Stollberger R. Non-linear fitting with joint spatial regularization in arterial spin labeling. Med Image Anal 2021; 71:102067. PMID: 33930830; DOI: 10.1016/j.media.2021.102067.
Abstract
Multi-delay single-shot arterial spin labeling (ASL) imaging provides accurate cerebral blood flow (CBF) and, in addition, arterial transit time (ATT) maps, but the inherently low SNR can be challenging. In particular, standard fitting using non-linear least squares often fails in regions with poor SNR, resulting in noisy estimates of the quantitative maps. State-of-the-art fitting techniques improve the SNR by incorporating prior knowledge in the estimation process, which typically leads to spatial blurring. To this end, we propose a new estimation method with a joint spatial total generalized variation regularization on CBF and ATT. This joint regularization approach utilizes shared spatial features across maps to enhance sharpness and simultaneously improves noise suppression in the final estimates. The proposed method is evaluated at three levels: first on synthetic phantom data including pathologies, followed by in vivo acquisitions of healthy volunteers, and finally on patient data following an ischemic stroke. The quantitative estimates are compared to two reference methods, non-linear least squares fitting and a state-of-the-art ASL quantification algorithm based on Bayesian inference. The proposed joint regularization approach outperforms the reference implementations, substantially increasing the SNR in CBF and ATT while maintaining sharpness and quantitative accuracy in the estimates.
Affiliation(s)
- Oliver Maier
- Institute of Medical Engineering, Graz University of Technology, Stremayrgasse 16/III, Graz 8010, Austria.
- Stefan M Spann
- Institute of Medical Engineering, Graz University of Technology, Stremayrgasse 16/III, Graz 8010, Austria.
- Daniela Pinter
- Department of Neurology, Division of General Neurology, Medical University of Graz, Auenbruggerplatz 22, Graz 8036, Austria.
- Thomas Gattringer
- Department of Neurology, Division of General Neurology, Medical University of Graz, Auenbruggerplatz 22, Graz 8036, Austria; Division of Neuroradiology, Vascular and Interventional Radiology, Department of Radiology, Medical University of Graz, Auenbruggerplatz 22, Graz 8036, Austria.
- Nicole Hinteregger
- Division of Neuroradiology, Vascular and Interventional Radiology, Department of Radiology, Medical University of Graz, Auenbruggerplatz 22, Graz 8036, Austria.
- Gerhard G Thallinger
- Institute of Biomedical Informatics, Graz University of Technology, Stremayrgasse 16/I, Graz 8010, Austria; BioTechMed-Graz, Mozartgasse 12/II, Graz 8010, Austria.
- Christian Enzinger
- Department of Neurology, Division of General Neurology, Medical University of Graz, Auenbruggerplatz 22, Graz 8036, Austria; Division of Neuroradiology, Vascular and Interventional Radiology, Department of Radiology, Medical University of Graz, Auenbruggerplatz 22, Graz 8036, Austria.
- Josef Pfeuffer
- Application Development, Siemens Healthcare, Henkestraße 127, Erlangen 91052, Germany.
- Kristian Bredies
- Institute of Mathematics and Scientific Computing, University of Graz, Heinrichstraße 36, Graz 8010, Austria; BioTechMed-Graz, Mozartgasse 12/II, Graz 8010, Austria.
- Rudolf Stollberger
- Institute of Medical Engineering, Graz University of Technology, Stremayrgasse 16/III, Graz 8010, Austria; BioTechMed-Graz, Mozartgasse 12/II, Graz 8010, Austria.
19
Zou Y, Zhu Y, Li Y, Wu FX, Wang J. Parallel computing for genome sequence processing. Brief Bioinform 2021; 22:6210355. PMID: 33822883; DOI: 10.1093/bib/bbab070.
Abstract
The rapid increase of genome data brought by gene sequencing technologies poses a massive challenge to data processing. To solve the problems caused by enormous data volumes and complex computing requirements, researchers have proposed many methods and tools, which can be divided into three types: big data storage, efficient algorithm design and parallel computing. The purpose of this review is to investigate popular parallel programming technologies for genome sequence processing. Three common parallel computing models are introduced according to their hardware architectures, and each is classified into two or three types and further analyzed with respect to its features. Then, parallel computing for genome sequence processing is discussed with four common applications: genome sequence alignment, single nucleotide polymorphism calling, genome sequence preprocessing, and pattern detection and searching. For each kind of application, its background is first introduced, and then a list of tools or algorithms is summarized in terms of principle, hardware platform and computing efficiency. The programming model of each hardware platform and application provides a reference for researchers to choose high-performance computing tools. Finally, we discuss the limitations and future trends of parallel computing technologies.
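The simplest parallel pattern the review covers is per-sequence, embarrassingly parallel work spread across cores. A minimal sketch (illustrative only, not from any surveyed tool) that computes per-read GC content with Python's standard process pool:

```python
from multiprocessing import Pool

def gc_content(seq: str) -> float:
    """Fraction of G/C bases in one sequence."""
    if not seq:
        return 0.0
    return sum(base in "GCgc" for base in seq) / len(seq)

def gc_content_parallel(seqs, workers=4):
    """Fan per-sequence work out to a process pool.

    Each read is independent, so this is the embarrassingly parallel
    pattern: no communication between workers beyond the final gather."""
    with Pool(processes=workers) as pool:
        return pool.map(gc_content, seqs)

if __name__ == "__main__":
    print(gc_content_parallel(["ACGT", "GGCC", "ATAT"], workers=2))
```

Real pipelines (alignment, SNP calling) add shared indexes and I/O coordination on top of this basic map-gather skeleton, which is where the hardware-specific models surveyed in the review diverge.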
Affiliation(s)
- You Zou
- Hunan Provincial Key Lab of Bioinformatics, School of Computer Science and Engineering at Central South University, Changsha, China
- Yuejie Zhu
- Hunan Provincial Key Lab of Bioinformatics, School of Computer Science and Engineering at Central South University, Changsha, China
- Yaohang Li
- Department of Computer Science at Old Dominion University, USA
- Fang-Xiang Wu
- College of Engineering and the Department of Computer Science at the University of Saskatchewan, Saskatoon, Canada
- Jianxin Wang
- School of Computer Science and Engineering at Central South University, Changsha, Hunan, China
20
Accelerating On-Device Learning with Layer-Wise Processor Selection Method on Unified Memory. Sensors 2021; 21:2364. PMID: 33805349; PMCID: PMC8037599; DOI: 10.3390/s21072364.
Abstract
Recent studies have applied the superior performance of deep learning to mobile devices, and these studies have enabled the running of the deep learning model on a mobile device with limited computing power. However, there is performance degradation of the deep learning model when it is deployed in mobile devices, due to the different sensors of each device. To solve this issue, it is necessary to train a network model specific to each mobile device. Therefore, herein, we propose an acceleration method for on-device learning to mitigate the device heterogeneity. The proposed method efficiently utilizes unified memory for reducing the latency of data transfer during network model training. In addition, we propose the layer-wise processor selection method to consider the latency generated by the difference in the processor performing the forward propagation step and the backpropagation step in the same layer. The experiments were performed on an ODROID-XU4 with the ResNet-18 model, and the experimental results indicate that the proposed method reduces the latency by at most 28.4% compared to the central processing unit (CPU) and at most 21.8% compared to the graphics processing unit (GPU). Through experiments using various batch sizes to measure the average power consumption, we confirmed that device heterogeneity is alleviated by performing on-device learning using the proposed method.
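Viewed abstractly, layer-wise processor selection is a shortest-path problem: each layer runs on the CPU or the GPU, and changing processor between consecutive layers costs a transfer penalty. A toy dynamic-programming sketch with invented latency numbers (the actual method profiles real layers on-device and additionally exploits unified memory, which this sketch ignores):

```python
# Hypothetical per-layer training latencies in ms; illustrative numbers only.
cpu_ms = [1.0, 4.0, 4.5, 1.2]   # cost of each layer's step on the CPU
gpu_ms = [2.5, 1.0, 1.1, 2.0]   # cost of the same layers on the GPU
switch_ms = 0.8                 # transfer penalty when the processor changes

def select_processors(cpu_ms, gpu_ms, switch_ms):
    """Pick a processor (0=CPU, 1=GPU) per layer minimizing total latency.

    best[p] holds the cheapest schedule ending with the current layer on
    processor p; choice[p] holds the corresponding layer assignment."""
    best = [cpu_ms[0], gpu_ms[0]]
    choice = [[0], [1]]
    for c, g in zip(cpu_ms[1:], gpu_ms[1:]):
        stay_cpu, move_cpu = best[0] + c, best[1] + switch_ms + c
        stay_gpu, move_gpu = best[1] + g, best[0] + switch_ms + g
        choice = [
            (choice[0] if stay_cpu <= move_cpu else choice[1]) + [0],
            (choice[1] if stay_gpu <= move_gpu else choice[0]) + [1],
        ]
        best = [min(stay_cpu, move_cpu), min(stay_gpu, move_gpu)]
    p = 0 if best[0] <= best[1] else 1
    return choice[p], best[p]

plan, total = select_processors(cpu_ms, gpu_ms, switch_ms)  # plan: [0, 1, 1, 0]
```

On these numbers the cheapest schedule runs the two expensive middle layers on the GPU and the cheap first and last layers on the CPU, paying the switch penalty twice.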
21
Abstract
Over the past two decades, Long Short-Term Memory (LSTM) networks have been used to solve problems that require modeling of long sequences, because they can selectively remember certain patterns over a long period, thus outperforming traditional feed-forward neural networks and Recurrent Neural Networks (RNNs) at learning long-term dependencies. However, LSTM is characterized by feedback dependence, which limits the high parallelism of general-purpose processors such as CPUs and GPUs. Moreover, for data center applications, the high energy consumption of GPU and CPU computing cannot be ignored. To deal with the above problems, the Field Programmable Gate Array (FPGA) is becoming an ideal alternative. FPGAs have the characteristics of low power consumption and low latency, which are helpful for the acceleration and optimization of LSTM and other RNNs. This paper proposes an implementation scheme of an LSTM network acceleration engine based on FPGA and further optimizes the implementation through fixed-point arithmetic, a systolic array and a lookup table for the nonlinear function. On this basis, for easy deployment and application, we integrate the proposed acceleration engine into Caffe, one of the most popular deep learning frameworks. Experimental results show that, compared with CPU and GPU, the FPGA-based acceleration engine can achieve performance improvements of 8.8 and 2.2 times and energy efficiency improvements of 16.9 and 9.6 times, respectively, within the Caffe framework.
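One of the listed optimizations, a lookup table for the nonlinear function, is easy to model in software. A minimal sketch of a table-based sigmoid with saturation outside [-8, 8); the table size and range are illustrative choices, not parameters from the paper:

```python
import math

# Table-based sigmoid: precompute 2**TABLE_BITS samples on [LO, HI) and
# saturate outside, mimicking how an FPGA design replaces exp() with a LUT.
TABLE_BITS = 10
LO, HI = -8.0, 8.0
STEP = (HI - LO) / (1 << TABLE_BITS)
SIGMOID_LUT = [1.0 / (1.0 + math.exp(-(LO + i * STEP)))
               for i in range(1 << TABLE_BITS)]

def sigmoid_lut(x: float) -> float:
    """Piecewise-constant sigmoid approximation via a single table lookup."""
    if x < LO:
        return 0.0
    if x >= HI:
        return 1.0
    return SIGMOID_LUT[int((x - LO) / STEP)]
```

With 10 address bits the worst-case error is bounded by the table step times the sigmoid's maximal slope (0.25), i.e., below 0.004; widening the table trades block RAM for accuracy, which is the same trade-off a hardware design makes.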
22
Wettenhovi VV, Vauhkonen M, Kolehmainen V. OMEGA-open-source emission tomography software. Phys Med Biol 2021; 66:065010. PMID: 33588401; DOI: 10.1088/1361-6560/abe65f.
Abstract
In this paper we present OMEGA, an open-source software package for efficient and fast image reconstruction in positron emission tomography (PET). OMEGA uses the scripting language of MATLAB and GNU Octave, allowing reconstruction of PET data with a MATLAB or GNU Octave interface. The goal of OMEGA is to allow easy and fast reconstruction of any PET data, and to provide a computationally efficient, easy-access platform for the development of new PET algorithms, with built-in forward and backward projection operations available to the user as a MATLAB/Octave class. OMEGA also includes direct support for GATE-simulated data, facilitating easy evaluation of new algorithms using Monte Carlo simulated PET data. OMEGA supports parallel computing by utilizing OpenMP for CPU implementations and OpenCL for the GPU, allowing any hardware to be used. OMEGA includes a built-in function for the computation of normalization correction and allows several other corrections to be applied, such as attenuation, randoms or scatter. OMEGA includes several different maximum-likelihood and maximum a posteriori (MAP) algorithms with several different priors. The user can also supply their own priors to the built-in MAP functions. The image reconstruction in OMEGA can be computed either by using an explicitly computed system matrix or with a matrix-free formalism, where the latter can be accelerated with OpenCL. We provide an overview of the software and present some examples utilizing its different features.
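The maximum-likelihood reconstructions OMEGA provides build on the classic MLEM update; as a reference point when reading the feature list, the standard textbook form (not code taken from OMEGA) is

$$x_j^{(k+1)} = \frac{x_j^{(k)}}{\sum_i a_{ij}} \sum_i a_{ij} \, \frac{y_i}{\sum_l a_{il} \, x_l^{(k)}}$$

where $a_{ij}$ is the system-matrix element linking voxel $j$ to detector bin $i$, $y_i$ the measured counts, and $x^{(k)}$ the current image estimate. The built-in forward and backward projectors evaluate exactly the inner sums $\sum_l a_{il} x_l$ and $\sum_i a_{ij}(\cdot)$, which is why exposing them as a MATLAB/Octave class makes new algorithm development cheap.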
Affiliation(s)
- V-V Wettenhovi
- Department of Applied Physics, University of Eastern Finland, Finland
23
Gueziri HE, Yan CXB, Collins DL. Open-source software for ultrasound-based guidance in spinal fusion surgery. Ultrasound Med Biol 2020; 46:3353-3368. PMID: 32907772; DOI: 10.1016/j.ultrasmedbio.2020.08.005.
Abstract
Spinal instrumentation and surgical manipulations may cause loss of navigation accuracy requiring an efficient re-alignment of the patient anatomy with pre-operative images during surgery. While intra-operative ultrasound (iUS) guidance has shown clear potential to reduce surgery time, compared with clinical computed tomography (CT) guidance, rapid registration aiming to correct for patient misalignment has not been addressed. In this article, we present an open-source platform for pedicle screw navigation using iUS imaging. The alignment method is based on rigid registration of CT to iUS vertebral images and has been designed for fast and fully automatic patient re-alignment in the operating room. Two steps are involved: first, we use the iUS probe's trajectory to achieve an initial coarse registration; then, the registration transform is refined by simultaneously optimizing gradient orientation alignment and mean of iUS intensities passing through the CT-defined posterior surface of the vertebra. We evaluated our approach on a lumbosacral section of a porcine cadaver with seven vertebral levels. We achieved a median target registration error of 1.47 mm (100% success rate, defined by a target registration error <2 mm) when applying the probe's trajectory initial alignment. The approach exhibited high robustness to partial visibility of the vertebra with success rates of 89.86% and 88.57% when missing either the left or right part of the vertebra and robustness to initial misalignments with a success rate of 83.14% for random starts within ±20° rotation and ±20 mm translation. Our graphics processing unit implementation achieves an efficient registration time under 8 s, which makes the approach suitable for clinical application.
Affiliation(s)
- Houssem-Eddine Gueziri
- McConnell Brain Imaging Center, Montreal Neurological Institute and Hospital, McGill University, Montreal, Quebec, Canada.
- Charles X B Yan
- Joint Department of Medical Imaging, University of Toronto, Toronto, Ontario, Canada
- D Louis Collins
- McConnell Brain Imaging Center, Montreal Neurological Institute and Hospital, McGill University, Montreal, Quebec, Canada
24
Azghadi MR, Lammie C, Eshraghian JK, Payvand M, Donati E, Linares-Barranco B, Indiveri G. Hardware Implementation of Deep Network Accelerators Towards Healthcare and Biomedical Applications. IEEE Trans Biomed Circuits Syst 2020; 14:1138-1159. PMID: 33156792; DOI: 10.1109/tbcas.2020.3036081.
Abstract
The advent of dedicated Deep Learning (DL) accelerators and neuromorphic processors has brought on new opportunities for applying both Deep and Spiking Neural Network (SNN) algorithms to healthcare and biomedical applications at the edge. This can facilitate the advancement of medical Internet of Things (IoT) systems and Point of Care (PoC) devices. In this paper, we provide a tutorial describing how various technologies including emerging memristive devices, Field Programmable Gate Arrays (FPGAs), and Complementary Metal Oxide Semiconductor (CMOS) can be used to develop efficient DL accelerators to solve a wide variety of diagnostic, pattern recognition, and signal processing problems in healthcare. Furthermore, we explore how spiking neuromorphic processors can complement their DL counterparts for processing biomedical signals. The tutorial is augmented with case studies of the vast literature on neural network and neuromorphic hardware as applied to the healthcare domain. We benchmark various hardware platforms by performing a sensor fusion signal processing task combining electromyography (EMG) signals with computer vision. Comparisons are made between dedicated neuromorphic processors and embedded AI accelerators in terms of inference latency and energy. Finally, we provide our analysis of the field and share a perspective on the advantages, disadvantages, challenges, and opportunities that various accelerators and neuromorphic processors introduce to healthcare and biomedical domains.
25
26
Tian C, Fei L, Zheng W, Xu Y, Zuo W, Lin CW. Deep learning on image denoising: An overview. Neural Netw 2020; 131:251-275. PMID: 32829002; DOI: 10.1016/j.neunet.2020.07.025.
Abstract
Deep learning techniques have received much attention in the area of image denoising. However, there are substantial differences among the various types of deep learning methods dealing with image denoising. Specifically, discriminative learning based on deep learning can ably address the issue of Gaussian noise. Optimization models based on deep learning are effective in estimating the real noise. However, there has thus far been little research summarizing the different deep learning techniques for image denoising. In this paper, we offer a comparative study of deep learning techniques in image denoising. We first classify the deep convolutional neural networks (CNNs) for additive white noisy images, the deep CNNs for real noisy images, the deep CNNs for blind denoising, and the deep CNNs for hybrid noisy images, which represent combinations of noisy, blurred and low-resolution images. Then, we analyze the motivations and principles of the different types of deep learning methods. Next, we compare the state-of-the-art methods on public denoising datasets in terms of quantitative and qualitative analyses. Finally, we point out some potential challenges and directions of future research.
Affiliation(s)
- Chunwei Tian
- Bio-Computing Research Center, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, Guangdong, China; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Shenzhen, 518055, Guangdong, China
- Lunke Fei
- School of Computers, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China
- Wenxian Zheng
- Tsinghua Shenzhen International Graduate School, Shenzhen, 518055, Guangdong, China
- Yong Xu
- Bio-Computing Research Center, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, Guangdong, China; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Shenzhen, 518055, Guangdong, China; Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China.
- Wangmeng Zuo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China; Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China
- Chia-Wen Lin
- Department of Electrical Engineering and the Institute of Communications Engineering, National Tsing Hua University, Hsinchu, Taiwan
27
Abstract
Nowadays, embedded systems are comprised of heterogeneous multi-core architectures, i.e., CPUs and GPUs. If an application is mapped to an appropriate processing core, these architectures provide many performance benefits to applications. Typically, programmers map sequential applications to the CPU and parallel applications to the GPU. Task mapping becomes challenging because of the usage of evolving and complex CPU- and GPU-based architectures. This paper presents an approach to map OpenCL applications to a heterogeneous multi-core architecture by determining the application suitability and processing capability. The classification is achieved by developing a machine learning-based device suitability classifier that predicts which processor has the highest computational compatibility to run OpenCL applications. In this paper, 20 distinct features are proposed that are extracted by using the developed LLVM-based static analyzer. In order to select the best subset of features, feature selection is performed by using both correlation analysis and the feature importance method. For the class imbalance problem, we use and compare the synthetic minority over-sampling method with and without feature selection. Instead of hand-tuning the machine learning classifier, we use the tree-based pipeline optimization method to select the best classifier and its hyper-parameters. We then compare the optimized selected method with traditional algorithms, i.e., random forest, decision tree, Naïve Bayes and KNN. We apply our novel approach to extensively used OpenCL benchmarks, i.e., AMD and Polybench. The dataset contains 653 training and 277 testing applications. We test the classification results using four performance metrics, i.e., F-measure, precision, recall and $R^2$. The optimized and reduced feature-subset model achieved a high F-measure of 0.91 and $R^2$ of 0.76. The proposed framework automatically distributes the workload based on the application requirement and processor compatibility.
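To make the classification idea concrete, here is a deliberately tiny stand-in for the tuned pipeline: a nearest-centroid rule over hypothetical static features (compute-to-memory ratio, branch density, data-parallel loop fraction). The feature values, labels and the classifier itself are invented for illustration; the paper uses 20 LLVM-extracted features and a TPOT-optimized model.

```python
# Hypothetical static features per OpenCL kernel:
# (compute-to-memory ratio, branch density, data-parallel loop fraction).
# All values and labels below are invented for illustration.
TRAIN = [
    ((0.5, 0.8, 0.1), "CPU"),
    ((0.7, 0.6, 0.2), "CPU"),
    ((4.0, 0.1, 0.9), "GPU"),
    ((3.5, 0.2, 0.8), "GPU"),
]

def centroid(rows):
    """Component-wise mean of a list of equal-length feature tuples."""
    n = len(rows)
    return tuple(sum(r[i] for r in rows) / n for i in range(len(rows[0])))

CENTROIDS = {
    label: centroid([f for f, lab in TRAIN if lab == label])
    for label in ("CPU", "GPU")
}

def predict_device(features):
    """Nearest-centroid stand-in for the paper's tuned classifier."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(CENTROIDS, key=lambda lab: sq_dist(features, CENTROIDS[lab]))
```

The real framework's value lies in the feature engineering and automated model search; the dispatch step itself, as shown, is just a lookup at kernel-launch time.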
28
Venkatakrishnan R, Misra A, Kindratenko V. High-Level Synthesis-Based Approach for Accelerating Scientific Codes on FPGAs. Comput Sci Eng 2020. DOI: 10.1109/mcse.2020.2996072.
Affiliation(s)
- Ashish Misra
- National Center for Supercomputing Applications (NCSA), University of Illinois
29
Investigations of Object Detection in Images/Videos Using Various Deep Learning Techniques and Embedded Platforms—A Comprehensive Review. Appl Sci (Basel) 2020. DOI: 10.3390/app10093280.
Abstract
In recent years there has been remarkable progress in one computer vision application area: object detection. One of the most challenging and fundamental problems in object detection is locating a specific object among the multiple objects present in a scene. Earlier, traditional detection methods were used for detecting objects; with the introduction of convolutional neural networks from 2012 onward, deep learning-based techniques were used for feature extraction, and that led to remarkable breakthroughs in this area. This paper presents a detailed survey of recent advancements and achievements in object detection using various deep learning techniques. Several topics have been included, such as Viola–Jones (VJ), histogram of oriented gradients (HOG), one-shot and two-shot detectors, benchmark datasets, evaluation metrics, speed-up techniques, and current state-of-the-art object detectors. Detailed discussions of some important applications in object detection, including pedestrian detection, crowd detection, and real-time object detection on GPU-based embedded systems, have been presented. Finally, we conclude by identifying promising future directions.
30
Cruz B, Zhu Z, Calderer C, Arsuaga J, Vazquez M. Quantitative Study of the Chiral Organization of the Phage Genome Induced by the Packaging Motor. Biophys J 2020; 118:2103-2116. PMID: 32353255; PMCID: PMC7203069; DOI: 10.1016/j.bpj.2020.03.030.
Abstract
Molecular motors that translocate DNA are ubiquitous in nature. During morphogenesis of double-stranded DNA bacteriophages, a molecular motor drives the viral genome inside a protein capsid. Several models have been proposed for the three-dimensional geometry of the packaged genome, but very little is known of the signature of the molecular packaging motor. For instance, biophysical experiments show that in some systems, DNA rotates during the packaging reaction, but most current biophysical models fail to incorporate this property. Furthermore, studies including rotation mechanisms have reached contradictory conclusions. In this study, we compare the geometrical signatures imposed by different possible mechanisms for the packaging motors: rotation, revolution, and rotation with revolution. We used a previously proposed kinetic Monte Carlo model of the motor, combined with Brownian dynamics simulations of DNA to simulate deterministic and stochastic motor models. We find that rotation is necessary for the accumulation of DNA writhe and for the chiral organization of the genome. We observe that although in the initial steps of the packaging reaction, the torsional strain of the genome is released by rotation of the molecule, in the later stages, it is released by the accumulation of writhe. We suggest that the molecular motor plays a key role in determining the final structure of the encapsidated genome in bacteriophages.
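The deterministic and stochastic motor models referenced here are built on kinetic Monte Carlo. A deliberately tiny Gillespie-style sketch of a motor that packages DNA one base pair at a time while rotating it (the rate and the rotation-per-bp coupling are invented for illustration; the actual study couples the motor model to Brownian dynamics of the chain):

```python
import random

def kmc_packaging(n_bp=100, k_step=10.0, degrees_per_bp=1.5, seed=0):
    """Gillespie-style kinetic Monte Carlo of a motor that translocates
    DNA one base pair per step and rotates it by a fixed angle.

    All rates and the rotation-per-bp coupling are illustrative numbers,
    not parameters taken from the paper."""
    rng = random.Random(seed)
    t, packaged, rotation = 0.0, 0, 0.0
    while packaged < n_bp:
        # With a single enabled event of rate k_step, the waiting time
        # is exponentially distributed with mean 1 / k_step.
        t += rng.expovariate(k_step)
        packaged += 1
        rotation += degrees_per_bp
    return t, rotation

t_total, rotation_deg = kmc_packaging()  # rotation_deg == 150.0 for defaults
```

A revolution-only motor would set the rotation increment to zero, which is exactly the kind of mechanistic switch whose geometric signature (writhe, chirality) the paper compares.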
Affiliation(s)
- Brian Cruz
- Department of Mathematics, University of California, Berkeley, California
- Zihao Zhu
- Department of Microbiology and Molecular Genetics, University of California at Davis, Davis, California
- Carme Calderer
- School of Mathematics, University of Minnesota, Minneapolis, Minnesota
- Javier Arsuaga
- Department of Mathematics, University of California at Davis, Davis, California; Department of Molecular and Cellular Biology, University of California at Davis, Davis, California.
- Mariel Vazquez
- Department of Microbiology and Molecular Genetics, University of California at Davis, Davis, California; Department of Mathematics, University of California at Davis, Davis, California.
31
Li JK, Ma KL. P4: Portable Parallel Processing Pipelines for Interactive Information Visualization. IEEE Trans Vis Comput Graph 2020; 26:1548-1561. PMID: 30235137; DOI: 10.1109/tvcg.2018.2871139.
Abstract
We present P4, an information visualization toolkit that combines declarative design specification and GPU computing for building high-performance interactive systems. Most of the existing information visualization toolkits do not harness the power of parallel processors in today's mainstream computers. P4 leverages GPU computing to accelerate both data processing and visualization rendering for interactive visualization applications. P4's programming interface offers a declarative visualization grammar for rapid specifications of data transformations, visual encodings, and interactions. By simplifying the development of GPU-accelerated visualization systems while supporting a high degree of flexibility and customization for design specification, P4 narrows the gap between expressiveness and scalability in information visualization toolkits. Through a range of examples and benchmark tests, we demonstrate that P4 provides high efficiency for creating interactive visualizations and offers drastic performance improvement over current state-of-the-art toolkits.
32
Grupp RB, Hegeman RA, Murphy RJ, Alexander CP, Otake Y, McArthur BA, Armand M, Taylor RH. Pose Estimation of Periacetabular Osteotomy Fragments With Intraoperative X-Ray Navigation. IEEE Trans Biomed Eng 2020; 67:441-452. PMID: 31059424; PMCID: PMC7297497; DOI: 10.1109/tbme.2019.2915165.
Abstract
OBJECTIVE State-of-the-art navigation systems for pelvic osteotomies use optical systems with external fiducials. In this paper, we propose the use of X-ray navigation for pose estimation of periacetabular fragments without fiducials. METHODS A two-dimensional/three-dimensional (2-D/3-D) registration pipeline was developed to recover fragment pose. This pipeline was tested through an extensive simulation study and six cadaveric surgeries. Using osteotomy boundaries in the fluoroscopic images, the preoperative plan was refined to more accurately match the intraoperative shape. RESULTS In simulation, average fragment pose errors were 1.3 ° /1.7 mm when the planned fragment matched the intraoperative fragment, 2.2 ° /2.1 mm when the plan was not updated to match the true shape, and 1.9 ° /2.0 mm when the fragment shape was intraoperatively estimated. In cadaver experiments, the average pose errors were 2.2 ° /2.2 mm, 3.8 ° /2.5 mm, and 3.5 ° /2.2 mm when registering with the actual fragment shape, a preoperative plan, and an intraoperatively refined plan, respectively. Average errors of the lateral center edge angle were less than 2 ° for all fragment shapes in simulation and cadaver experiments. CONCLUSION The proposed pipeline is capable of accurately reporting femoral head coverage within a range clinically identified for long-term joint survivability. SIGNIFICANCE Human interpretation of fragment pose is challenging and usually restricted to rotation about a single anatomical axis. The proposed pipeline provides an intraoperative estimate of rigid pose with respect to all anatomical axes, is compatible with minimally invasive incisions, and has no dependence on external fiducials.
33
Jász Á, Rák Á, Ladjánszki I, Tornai GJ, Cserey G. Towards chemically accurate QM/MM simulations on GPUs. J Mol Graph Model 2020; 96:107536. [PMID: 31981899 DOI: 10.1016/j.jmgm.2020.107536] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2019] [Revised: 01/09/2020] [Accepted: 01/10/2020] [Indexed: 11/29/2022]
Abstract
Computational chemistry simulations are extensively used to model natural phenomena. To maintain performance close to that of molecular mechanics while achieving accuracy comparable to quantum mechanical calculations, many researchers use hybrid QM/MM methods. In this article we evaluate our GPU-accelerated ONIOM implementation by measurements on the crambin and HIV integrase proteins with QM model systems of different sizes. We demonstrate that using a larger QM region yields better energy accuracy at the expense of simulation time, a trade-off that is important to consider when running QM/MM calculations. Furthermore, we show that the ONIOM energy monotonically approaches the pure quantum mechanical energy of the whole system. The experiments are made feasible by utilizing the cutting-edge BrianQC quantum chemistry module for Hartree-Fock level SCF and our GPU-accelerated MMFF94 force field implementation for molecular mechanics calculations.
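The ONIOM scheme referenced here combines levels of theory subtractively. A minimal sketch of the standard two-layer energy expression (argument names are ours, not the paper's):

```python
def oniom2_energy(e_high_model, e_low_real, e_low_model):
    """Two-layer subtractive ONIOM energy:
    E(ONIOM) = E_high(model) + E_low(real) - E_low(model)."""
    return e_high_model + e_low_real - e_low_model
```

When the QM model region grows to the whole system, the two low-level terms cancel and the expression reduces to the pure high-level energy, consistent with the monotonic convergence the abstract reports.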
Affiliation(s)
- Ádám Jász
- StreamNovation Ltd., H-1083, Budapest, Práter utca 50/a., Hungary.
- Ádám Rák
- StreamNovation Ltd., H-1083, Budapest, Práter utca 50/a., Hungary.
- György Cserey
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, H-1083, Budapest, Práter utca 50/a., Hungary.
34
Krasznahorkay A, Leggett C, Mete AS, Snyder S, Tsulaia V. GPU Usage in ATLAS Reconstruction and Analysis. EPJ Web of Conferences 2020. [DOI: 10.1051/epjconf/202024505006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
With Graphics Processing Units (GPUs) and other kinds of accelerators becoming ever more accessible, and High Performance Computing Centres all around the world using them ever more, ATLAS has to find the best way of making use of such accelerators in much of its computing.
Tests with GPUs – mainly with CUDA – have been performed in the experiment in the past. At that time the conclusion was that it was not advantageous for the ATLAS offline and trigger software to invest time and money into GPUs. However, as the usage of accelerators has become cheaper and simpler in recent years, their re-evaluation in ATLAS’s offline software is warranted.
We show new results of using GPU accelerated calculations in ATLAS’s offline software environment using the ATLAS offline/analysis (xAOD) Event Data Model. We compare the performance and flexibility of a couple of the available GPU programming methods, and show how different memory management setups affect our ability to offload different types of calculations to a GPU efficiently.
35
FAST-FUSION: An Improved Accuracy Omnidirectional Visual Odometry System with Sensor Fusion and GPU Optimization for Embedded Low Cost Hardware. Applied Sciences (Basel) 2019. [DOI: 10.3390/app9245516] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
The main task while developing a mobile robot is to achieve accurate and robust navigation in a given environment. To achieve such a goal, the ability of the robot to localize itself is crucial. Outdoors, namely in agricultural environments, this task becomes a real challenge because odometry is not always usable and global navigation satellite system (GNSS) signals are blocked or significantly degraded. To answer this challenge, this work presents a solution for outdoor localization based on an omnidirectional visual odometry technique fused with a gyroscope and a low-cost planar light detection and ranging (LIDAR) sensor, optimized to run on a low-cost graphics processing unit (GPU). This solution, named FAST-FUSION, offers the scientific community three core contributions. The first contribution is an extension to the state-of-the-art monocular visual odometry (Libviso2) to work with omnidirectional cameras and a single-axis gyroscope to increase the system accuracy. The second contribution is an algorithm that uses low-cost LIDAR data to estimate the motion scale and overcome the limitations of monocular visual odometry systems. Finally, we propose a heterogeneous computing optimization that uses a Raspberry Pi GPU to improve the visual odometry runtime performance on low-cost platforms. To test and evaluate FAST-FUSION, we created three open-source datasets in an outdoor environment. Results show that FAST-FUSION runs in real time on low-cost hardware and outperforms the original Libviso2 approach in terms of time performance and motion estimation accuracy.
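The second contribution above, recovering the metric scale of monocular visual odometry from planar-LIDAR ranges, can be illustrated with a deliberately simplified sketch; the paper's actual algorithm is not reproduced here, and the robust median over matched landmarks is our assumption:

```python
import numpy as np

def estimate_scale(visual_depths, lidar_ranges):
    """Recover the metric scale factor of monocular visual odometry from
    planar-LIDAR ranges of the same matched landmarks. The median of the
    per-landmark ratios is robust to outlier associations."""
    v = np.asarray(visual_depths, float)
    l = np.asarray(lidar_ranges, float)
    valid = v > 0                       # ignore degenerate visual depths
    return float(np.median(l[valid] / v[valid]))
```

Multiplying the monocular trajectory by this factor places it in metres, which is what the LIDAR provides and the camera alone cannot.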
36
Vallejo Ramirez PP, Zammit J, Vanderpoorten O, Riche F, Blé FX, Zhou XH, Spiridon B, Valentine C, Spasov SE, Oluwasanya PW, Goodfellow G, Fantham MJ, Siddiqui O, Alimagham F, Robbins M, Stretton A, Simatos D, Hadeler O, Rees EJ, Ströhl F, Laine RF, Kaminski CF. OptiJ: Open-source optical projection tomography of large organ samples. Sci Rep 2019; 9:15693. [PMID: 31666606 PMCID: PMC6821862 DOI: 10.1038/s41598-019-52065-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2019] [Accepted: 10/09/2019] [Indexed: 12/20/2022] Open
Abstract
The three-dimensional imaging of mesoscopic samples with Optical Projection Tomography (OPT) has become a powerful tool for biomedical phenotyping studies. OPT uses visible light to visualize the 3D morphology of large transparent samples. To enable a wider application of OPT, we present OptiJ, a low-cost, fully open-source OPT system capable of imaging large transparent specimens up to 13 mm tall and 8 mm deep with 50 µm resolution. OptiJ is based on off-the-shelf, easy-to-assemble optical components and an ImageJ plugin library for OPT data reconstruction. The software includes novel correction routines for uneven illumination and sample jitter in addition to CPU/GPU accelerated reconstruction for large datasets. We demonstrate the use of OptiJ to image and reconstruct cleared lung lobes from adult mice. We provide a detailed set of instructions to set up and use the OptiJ framework. Our hardware and software design are modular and easy to implement, allowing for further open microscopy developments for imaging large organ samples.
Affiliation(s)
- Pedro P Vallejo Ramirez
- Laser Analytics Group, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
- Joseph Zammit
- Sensor CDT 2015-2016 student cohort, University of Cambridge, Cambridge, UK
- Oliver Vanderpoorten
- Laser Analytics Group, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
- Sensor CDT 2015-2016 student cohort, University of Cambridge, Cambridge, UK
- Fergus Riche
- Sensor CDT 2015-2016 student cohort, University of Cambridge, Cambridge, UK
- Francois-Xavier Blé
- Clinical Discovery Unit, Early Clinical Development, IMED Biotech Unit, AstraZeneca, Cambridge, UK
- Xiao-Hong Zhou
- Bioscience, Respiratory, Inflammation and Autoimmunity, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden
- Bogdan Spiridon
- Sensor CDT 2015-2016 student cohort, University of Cambridge, Cambridge, UK
- Simeon E Spasov
- Sensor CDT 2015-2016 student cohort, University of Cambridge, Cambridge, UK
- Gemma Goodfellow
- Sensor CDT 2015-2016 student cohort, University of Cambridge, Cambridge, UK
- Marcus J Fantham
- Laser Analytics Group, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
- Omid Siddiqui
- Sensor CDT 2015-2016 student cohort, University of Cambridge, Cambridge, UK
- Farah Alimagham
- Sensor CDT 2015-2016 student cohort, University of Cambridge, Cambridge, UK
- Miranda Robbins
- Laser Analytics Group, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
- Sensor CDT 2015-2016 student cohort, University of Cambridge, Cambridge, UK
- Andrew Stretton
- Sensor CDT 2015-2016 student cohort, University of Cambridge, Cambridge, UK
- Dimitrios Simatos
- Sensor CDT 2015-2016 student cohort, University of Cambridge, Cambridge, UK
- Oliver Hadeler
- Laser Analytics Group, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
- Eric J Rees
- Laser Analytics Group, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
- Florian Ströhl
- Laser Analytics Group, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
- Department of Physics and Technology, UiT The Arctic University of Norway, NO-9037, Tromsø, Norway
- Romain F Laine
- Laser Analytics Group, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
- Medical Research Council Laboratory for Molecular Cell Biology (LMCB), University College London, Gower Street, London, WC1E 6BT, UK
- Clemens F Kaminski
- Laser Analytics Group, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK.
37
Drozdowski P, Rathgeb C, Busch C. Computational workload in biometric identification systems: an overview. IET Biometrics 2019. [DOI: 10.1049/iet-bmt.2019.0076] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Affiliation(s)
- Pawel Drozdowski
- da/sec – Biometrics and Internet Security Research Group, Hochschule Darmstadt, Darmstadt, Germany
- NBL – Norwegian Biometrics Laboratory, Norwegian University of Science and Technology, Gjøvik, Norway
- Christian Rathgeb
- da/sec – Biometrics and Internet Security Research Group, Hochschule Darmstadt, Darmstadt, Germany
- Christoph Busch
- da/sec – Biometrics and Internet Security Research Group, Hochschule Darmstadt, Darmstadt, Germany
38
Research on OpenCL optimization for FPGA deep learning application. PLoS One 2019; 14:e0222984. [PMID: 31600218 PMCID: PMC6786543 DOI: 10.1371/journal.pone.0222984] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2018] [Accepted: 09/11/2019] [Indexed: 11/19/2022] Open
Abstract
In recent years, with the development of computer science, deep learning has come to be regarded as capable of solving inference and learning problems in high-dimensional spaces, and it has therefore received unprecedented attention from both academia and the business community. Compared with CPUs/GPUs, the FPGA has attracted much attention for deep learning because of its high energy efficiency, short development cycle and reconfigurability. However, because research on OpenCL optimization of deep learning algorithms on FPGAs is limited, OpenCL tools and models applied to CPUs/GPUs cannot be used directly on FPGAs. This makes it difficult for software programmers to use FPGAs to implement deep learning algorithms with satisfactory performance. To solve this problem, this paper proposes an OpenCL computational model based on an FPGA template architecture to optimize the time-consuming convolution layer in deep learning. A comparison between a program applying the computational model and the corresponding optimization program provided by Xilinx indicates that the former achieves 8-40 times the performance of the latter.
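The convolution layer targeted by the OpenCL optimization is, at its core, the following loop nest. This NumPy sketch (ours, not the paper's kernel) shows the computation that an FPGA implementation tiles, pipelines and unrolls:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Direct 'valid' 2-D convolution as used in deep-learning layers
    (cross-correlation, no kernel flip). The y/x loops and the inner
    multiply-accumulate are what OpenCL-on-FPGA kernels restructure."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out
```

On an FPGA the inner window product becomes a fixed multiply-accumulate array, and the outer loops are tiled so each tile's inputs fit in on-chip memory.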
39
Zhang S, Wu Y, Men C, He H, Liang K. Research on OpenCL optimization for FPGA deep learning application. PLoS One 2019. [PMID: 31600218 PMCID: PMC6786543 DOI: 10.1371/journal.pone.0222984] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022] Open
Affiliation(s)
- Shuo Zhang
- College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
- Yanxia Wu
- College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
- Chaoguang Men
- College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
- Hongtao He
- College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
- Kai Liang
- College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
40
Eberhardt J, Stote RH, Dejaegere A. Unrolr: Structural analysis of protein conformations using stochastic proximity embedding. J Comput Chem 2019; 39:2551-2557. [PMID: 30447084 DOI: 10.1002/jcc.25599] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2018] [Revised: 08/24/2018] [Accepted: 08/24/2018] [Indexed: 01/29/2023]
Abstract
Molecular dynamics (MD) simulations are widely used to explore the conformational space of biological macromolecules. Advances in hardware, as well as in methods, make the generation of large and complex MD datasets much more common. Although different clustering and dimensionality reduction methods have been applied to MD simulations, there remains a need for improved strategies that handle nonlinear data and/or can be applied to very large datasets. We present an original implementation of the pivot-based version of the stochastic proximity embedding method aimed at large MD datasets using the dihedral distance as a metric. The advantages of the algorithm in terms of data storage and computational efficiency are presented, as well as the implementation realized. Application and testing through the analysis of a 200 ns accelerated MD simulation of a 35-residue villin headpiece is discussed. Analysis of the simulation shows the promise of this method to organize large conformational ensembles. © 2018 Wiley Periodicals, Inc.
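Stochastic proximity embedding reduces dimensionality by repeatedly nudging random pairs of embedded points so their distances match the input-space proximities. A minimal sketch of the classic pairwise update (the paper's pivot-based variant and dihedral-distance metric are not reproduced here):

```python
import numpy as np

def spe_step(X, i, j, r_ij, lam, eps=1e-10):
    """One stochastic proximity embedding update: move embedded points i and j
    so their Euclidean distance shifts toward the input-space proximity r_ij,
    with learning rate lam."""
    diff = X[i] - X[j]
    d = np.linalg.norm(diff) + eps          # current embedded distance
    delta = 0.5 * lam * (r_ij - d) / d * diff
    X[i] += delta                           # symmetric correction
    X[j] -= delta
    return X
```

In a full run, lam is annealed toward zero over many random pairs; the pivot-based variant updates many points against one pivot per cycle, which is what makes very large MD datasets tractable.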
Affiliation(s)
- Jérôme Eberhardt
- Biologie structurale intégrative, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), Institut National de La Santé et de La Recherche Médicale (INSERM), U1258/Centre National de Recherche Scientifique (CNRS), UMR7104/Université de Strasbourg, Illkirch, France
- Roland H Stote
- Biologie structurale intégrative, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), Institut National de La Santé et de La Recherche Médicale (INSERM), U1258/Centre National de Recherche Scientifique (CNRS), UMR7104/Université de Strasbourg, Illkirch, France
- Annick Dejaegere
- Biologie structurale intégrative, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), Institut National de La Santé et de La Recherche Médicale (INSERM), U1258/Centre National de Recherche Scientifique (CNRS), UMR7104/Université de Strasbourg, Illkirch, France
41
Grasseau G, Beaudette F, Perez CM, Zabi A, Chiron A, Strebler T, Hautreux G. Deployment of a Matrix Element Method code for the ttH channel analysis on GPU’s platform. EPJ Web of Conferences 2019. [DOI: 10.1051/epjconf/201921406028] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
The observation of the associated production of the Higgs boson with two top quarks in proton-proton collisions is one of the highlights of the LHC Run 2. Driven by the theoretical description of the physics processes, the Matrix Element Method (MEM) consists in computing a probability that an event is compatible with the signal hypothesis (ttH) or with one of the background hypotheses. It is a powerful classifying tool requiring high-dimensional integral computations. The deployment of our MEM production code on GPU platforms is described, focusing on the adaptation of the main components of the computations in OpenCL kernels, namely the MadGraph matrix element code generator, VEGAS, and LHAPDF. Finally, the gain obtained on GPU platforms compared with classical CPU platforms is assessed.
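The MEM probabilities rest on high-dimensional integrals; VEGAS evaluates them by adaptive importance sampling. As a baseline illustration only, a plain Monte Carlo estimator over the unit hypercube (ours, far simpler than VEGAS):

```python
import numpy as np

def mc_integrate(f, dim, n, rng):
    """Plain Monte Carlo estimate of the integral of f over [0, 1]^dim,
    returning (estimate, standard error). VEGAS improves on this by
    adaptively concentrating samples where the integrand is large."""
    x = rng.random((n, dim))          # n uniform sample points
    vals = f(x)                       # integrand evaluated per sample
    return vals.mean(), vals.std(ddof=1) / np.sqrt(n)
```

The error shrinks as n^(-1/2) regardless of dimension, which is why Monte Carlo (and its GPU-friendly, embarrassingly parallel sampling) is the method of choice for these integrals.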
42
Study of parallel processing area extraction and data transfer number reduction for automatic GPU offloading of IoT applications. J Intell Inf Syst 2019. [DOI: 10.1007/s10844-019-00575-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
43
Bai N, Tang S, Yu C, Fu H, Wang C, Chen X. GMSA: A Data Sharing System for Multiple Sequence Alignment Across Multiple Users. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190111160101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Background: In recent years, the rapid growth of biological datasets in bioinformatics has made the computation of Multiple Sequence Alignment (MSA) extremely slow. Using the GPU to accelerate MSA has proven to be an effective approach. Moreover, there is a trend for bioinformatics researchers and institutes to set up a shared server to which remote users submit MSA jobs via provided web pages or tools.
Objective: Because different MSA jobs submitted by users often process similar datasets, there is an opportunity for users to share their computation results with each other, which avoids redundant computation and thereby reduces the overall computing time. Furthermore, on heterogeneous CPU/GPU platforms, many existing applications assign their computation to GPU devices only, which wastes CPU resources. Co-run computation can increase the utilization of computing resources on both CPUs and GPUs by dispatching workloads onto them simultaneously.
Methods: In this paper, we propose an efficient MSA system called GMSA for multiple users on shared heterogeneous CPU/GPU platforms. To accelerate the computation of jobs from multiple users, GMSA exploits data sharing, since different MSA jobs often have a percentage of the same data and tasks. Additionally, we propose a scheduling strategy based on the similarity in datasets or tasks between MSA jobs. Furthermore, a co-run computation model is adopted to make full use of both CPUs and GPUs.
Results: We use four protein datasets redesigned according to different degrees of similarity and compare GMSA with ClustalW and CUDA-ClustalW in multi-user scenarios. Experimental results show that GMSA achieves a speedup of up to 32X.
Conclusion: GMSA is a system designed to accelerate MSA jobs with shared input datasets on heterogeneous CPU/GPU platforms. A strategy was proposed and implemented to find the common datasets among jobs submitted by multiple users, and a scheduling algorithm is presented based on it. To utilize the overall resources of both CPU and GPU, GMSA employs the co-run computation model. Results show that it speeds up the total computation of jobs efficiently.
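The result-sharing idea, reusing an alignment when two users submit the same sequence set, can be sketched as a simple content-addressed cache. The class and hashing scheme below are our illustration, not GMSA's implementation:

```python
import hashlib

class AlignmentCache:
    """Reuse MSA results across users: jobs whose input sequence sets are
    identical (regardless of submission order) share one stored alignment."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(sequences):
        # Order-insensitive fingerprint of the input dataset.
        h = hashlib.sha256()
        for s in sorted(sequences):
            h.update(s.encode())
            h.update(b"\x00")         # separator so ("ab","c") != ("a","bc")
        return h.hexdigest()

    def get_or_compute(self, sequences, align_fn):
        k = self.key(sequences)
        if k not in self._store:      # only the first submitter pays
            self._store[k] = align_fn(sequences)
        return self._store[k]
```

GMSA goes further by also sharing partial results when jobs overlap only in some sequences or tasks; the cache above captures only the whole-dataset case.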
Affiliation(s)
- Na Bai
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, P.O. Box 300350, China
- Shanjiang Tang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, P.O. Box 300350, China
- Ce Yu
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, P.O. Box 300350, China
- Hao Fu
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, P.O. Box 300350, China
- Chen Wang
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, Illinois, United States
- Xi Chen
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, P.O. Box 300350, China
44
45
Hernandez-Fernandez M, Reguly I, Jbabdi S, Giles M, Smith S, Sotiropoulos SN. Using GPUs to accelerate computational diffusion MRI: From microstructure estimation to tractography and connectomes. Neuroimage 2019; 188:598-615. [PMID: 30537563 PMCID: PMC6614035 DOI: 10.1016/j.neuroimage.2018.12.015] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2018] [Revised: 11/20/2018] [Accepted: 12/07/2018] [Indexed: 12/27/2022] Open
Abstract
The great potential of computational diffusion MRI (dMRI) relies on indirect inference of tissue microstructure and brain connections, since modelling and tractography frameworks map diffusion measurements to neuroanatomical features. This mapping however can be computationally highly expensive, particularly given the trend of increasing dataset sizes and the complexity in biophysical modelling. Limitations on computing resources can restrict data exploration and methodology development. A step forward is to take advantage of the computational power offered by recent parallel computing architectures, especially Graphics Processing Units (GPUs). GPUs are massive parallel processors that offer trillions of floating point operations per second, and have made possible the solution of computationally-intensive scientific problems that were intractable before. However, they are not inherently suited for all problems. Here, we present two different frameworks for accelerating dMRI computations using GPUs that cover the most typical dMRI applications: a framework for performing biophysical modelling and microstructure estimation, and a second framework for performing tractography and long-range connectivity estimation. The former provides a front-end and automatically generates a GPU executable file from a user-specified biophysical model, allowing accelerated non-linear model fitting in both deterministic and stochastic ways (Bayesian inference). The latter performs probabilistic tractography, can generate whole-brain connectomes and supports new functionality for imposing anatomical constraints, such as inherent consideration of surface meshes (GIFTI files) along with volumetric images. We validate the frameworks against well-established CPU-based implementations and we show that despite the very different challenges for parallelising these problems, a single GPU achieves better performance than 200 CPU cores thanks to our parallel designs.
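The microstructure framework above parallelizes many independent per-voxel non-linear fits. As a toy stand-in, here is a voxel-wise mono-exponential diffusion fit using the log-linearised least-squares baseline (our simplification; the actual framework performs full non-linear and Bayesian fits of user-specified models):

```python
import numpy as np

def fit_adc(bvals, signal):
    """Fit the mono-exponential diffusion model S = S0 * exp(-b * D) to one
    voxel by log-linearised least squares. GPU frameworks like the one
    described run thousands of such independent voxel fits in parallel."""
    slope, intercept = np.polyfit(np.asarray(bvals, float),
                                  np.log(np.asarray(signal, float)), 1)
    return float(np.exp(intercept)), float(-slope)   # (S0, D)
```

Because every voxel's fit is independent, the workload maps naturally onto one GPU thread (or thread block) per voxel, which is the parallel design the paper exploits.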
Affiliation(s)
- Moises Hernandez-Fernandez
- Wellcome Centre for Integrative Neuroimaging - Centre for Functional Magnetic Resonance Imaging of the Brain (FMRIB), University of Oxford, Oxford, United Kingdom; Center for Biomedical Image Computing and Analytics (CBICA), Department of Radiology, University of Pennsylvania, Philadelphia, PA, United States.
- Istvan Reguly
- Faculty of Information Technology and Bionics, Pazmany Peter Catholic University, Budapest, Hungary
- Saad Jbabdi
- Wellcome Centre for Integrative Neuroimaging - Centre for Functional Magnetic Resonance Imaging of the Brain (FMRIB), University of Oxford, Oxford, United Kingdom
- Mike Giles
- Mathematical Institute, University of Oxford, Oxford, United Kingdom
- Stephen Smith
- Wellcome Centre for Integrative Neuroimaging - Centre for Functional Magnetic Resonance Imaging of the Brain (FMRIB), University of Oxford, Oxford, United Kingdom
- Stamatios N Sotiropoulos
- Wellcome Centre for Integrative Neuroimaging - Centre for Functional Magnetic Resonance Imaging of the Brain (FMRIB), University of Oxford, Oxford, United Kingdom; Sir Peter Mansfield Imaging Centre, School of Medicine, University of Nottingham, Nottingham, United Kingdom
46
47
Hoozemans J, de Jong R, van der Vlugt S, Van Straten J, Elango UK, Al-Ars Z. Frame-based Programming, Stream-Based Processing for Medical Image Processing Applications. Journal of Signal Processing Systems 2019; 91:47-59. [PMID: 30873259 PMCID: PMC6390719 DOI: 10.1007/s11265-018-1422-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/30/2018] [Revised: 10/20/2018] [Accepted: 11/01/2018] [Indexed: 06/09/2023]
Abstract
This paper presents and evaluates an approach to deploy image and video processing pipelines that are developed frame-oriented on a hardware platform that is stream-oriented, such as an FPGA. First, this calls for a specialized streaming memory hierarchy and accompanying software framework that transparently moves image segments between stages in the image processing pipeline. Second, we use softcore VLIW processors, that are targetable by a C compiler and have hardware debugging capabilities, to evaluate and debug the software before moving to a High-Level Synthesis flow. The algorithm development phase, including debugging and optimizing on the target platform, is often a very time consuming step in the development of a new product. Our proposed platform allows both software developers and hardware designers to test iterations in a matter of seconds (compilation time) instead of hours (synthesis or circuit simulation time).
Affiliation(s)
- Joost Hoozemans
- Computer Engineering Laboratory, Delft University of Technology, Delft, The Netherlands
- Jeroen Van Straten
- Computer Engineering Laboratory, Delft University of Technology, Delft, The Netherlands
- Uttam Kumar Elango
- Computer Engineering Laboratory, Delft University of Technology, Delft, The Netherlands
- Zaid Al-Ars
- Computer Engineering Laboratory, Delft University of Technology, Delft, The Netherlands
48
Hoozemans J, van Straten J, Viitanen T, Tervo A, Kadlec J, Al-Ars Z. ALMARVI Execution Platform: Heterogeneous Video Processing SoC Platform on FPGA. Journal of Signal Processing Systems 2019; 91:61-73. [PMID: 30873260 PMCID: PMC6390713 DOI: 10.1007/s11265-018-1424-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/03/2017] [Revised: 11/06/2018] [Accepted: 11/13/2018] [Indexed: 06/09/2023]
Abstract
The proliferation of processing hardware alternatives allows developers to use various customized computing platforms to run their applications in an optimal way. However, porting application code to custom hardware requires a lot of development and porting effort. This paper describes a heterogeneous computational platform (the ALMARVI execution platform) comprising multiple communicating processors that allow easy programmability through an interface to OpenCL. The ALMARVI platform uses processing elements based on both VLIW and Transport Triggered Architectures (ρ-VEX and TCE cores, respectively). It can be implemented on Zynq devices such as the ZedBoard, and supports OpenCL by means of the pocl (Portable OpenCL) project and our ALMAIF interface specification. This allows developers to execute kernels transparently on either type of processing element, thereby allowing execution time to be optimized with minimal design and development effort.
Affiliation(s)
- Aleksi Tervo
- Tampere University of Technology, Tampere, Finland
- Zaid Al-Ars
- Delft University of Technology, Delft, The Netherlands
49
Chen Y, Zhou L, Tang Y, Singh JP, Bouguila N, Wang C, Wang H, Du J. Fast neighbor search by using revised k-d tree. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2018.09.012] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
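The entry above carries no abstract, but the baseline it builds on is standard k-d tree nearest-neighbour search, which can be exercised via SciPy (our illustration; the paper's revised k-d tree is not reproduced here):

```python
import numpy as np
from scipy.spatial import cKDTree

# Build a k-d tree over random 3-D points and run nearest-neighbour queries;
# revised k-d tree variants target exactly this query pattern.
rng = np.random.default_rng(42)
points = rng.random((1000, 3))
tree = cKDTree(points)

# Querying tree members returns each point itself at distance zero.
dist, idx = tree.query(points[:5], k=1)
```

For points not in the tree, `tree.query(q, k=3)` returns the three closest stored points with their distances.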
50
Nawaf MM, Merad D, Royer JP, Boï JM, Saccone M, Ben Ellefi M, Drap P. Fast Visual Odometry for a Low-Cost Underwater Embedded Stereo System. Sensors 2018; 18:s18072313. [PMID: 30018215 PMCID: PMC6068653 DOI: 10.3390/s18072313] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/21/2018] [Revised: 07/10/2018] [Accepted: 07/14/2018] [Indexed: 11/21/2022]
Abstract
This paper provides details of the hardware and software conception and realization of a stereo embedded system for underwater imaging. The system provides several functions that facilitate underwater surveys and run smoothly in real time. A first post-image-acquisition module provides direct visual feedback on the quality of the captured images, which helps appropriate actions to be taken regarding movement speed and lighting conditions. Our main contribution is a light visual odometry method adapted to the underwater context. The proposed method uses the captured stereo image stream to provide real-time navigation and a site coverage map, which is necessary to conduct a complete underwater survey. The visual odometry uses a stochastic pose representation and a semi-global optimization approach to handle large sites and provide long-term autonomy, whereas a novel stereo matching approach adapted to underwater imaging and system-attached lighting allows fast processing and suits systems with low computational resources. The system is tested in a real context and shows its robustness and promising future potential.
Affiliation(s)
- Mohamad Motasem Nawaf
- Aix-Marseille Université, CNRS, ENSAM, Université De Toulon, LIS UMR 7020, Domaine Universitaire de Saint-Jérôme, Bâtiment Polytech, Avenue Escadrille Normandie-Niemen, 13397 Marseille, France.
- Djamal Merad
- Aix-Marseille Université, CNRS, ENSAM, Université De Toulon, LIS UMR 7020, Domaine Universitaire de Saint-Jérôme, Bâtiment Polytech, Avenue Escadrille Normandie-Niemen, 13397 Marseille, France.
- Jean-Philip Royer
- Aix-Marseille Université, CNRS, ENSAM, Université De Toulon, LIS UMR 7020, Domaine Universitaire de Saint-Jérôme, Bâtiment Polytech, Avenue Escadrille Normandie-Niemen, 13397 Marseille, France.
- Jean-Marc Boï
- Aix-Marseille Université, CNRS, ENSAM, Université De Toulon, LIS UMR 7020, Domaine Universitaire de Saint-Jérôme, Bâtiment Polytech, Avenue Escadrille Normandie-Niemen, 13397 Marseille, France.
- Mauro Saccone
- Aix-Marseille Université, CNRS, ENSAM, Université De Toulon, LIS UMR 7020, Domaine Universitaire de Saint-Jérôme, Bâtiment Polytech, Avenue Escadrille Normandie-Niemen, 13397 Marseille, France.
- Mohamed Ben Ellefi
- Aix-Marseille Université, CNRS, ENSAM, Université De Toulon, LIS UMR 7020, Domaine Universitaire de Saint-Jérôme, Bâtiment Polytech, Avenue Escadrille Normandie-Niemen, 13397 Marseille, France.
- Pierre Drap
- Aix-Marseille Université, CNRS, ENSAM, Université De Toulon, LIS UMR 7020, Domaine Universitaire de Saint-Jérôme, Bâtiment Polytech, Avenue Escadrille Normandie-Niemen, 13397 Marseille, France.