1. Maruyama Y, Yoshida N. RISMiCal: A software package to perform fast RISM/3D-RISM calculations. J Comput Chem 2024;45:1470-1482. PMID: 38472097. DOI: 10.1002/jcc.27340.
Abstract
Solvent plays an essential role in a variety of chemical, physical, and biological processes that occur in the solution phase. The reference interaction site model (RISM) and its three-dimensional extension (3D-RISM) serve as powerful computational tools for modeling solvation effects in chemical reactions, biological functions, and structure formation. We present the RISM integrated calculator (RISMiCal) program package, which is based on RISM and 3D-RISM theories with fast GPU code. RISMiCal has been developed as an integrated RISM/3D-RISM program with interfaces to external programs such as Gaussian16, GAMESS, and Tinker. Its fast single- and multi-GPU 3D-RISM implementations, written in CUDA, should enhance the availability of such hybrid methods, which require many computationally expensive 3D-RISM calculations. We expect that our package can be widely applied to chemical and biological processes in solvent. The RISMiCal package is available at https://rismical-dev.github.io.
Affiliation(s)
- Yutaka Maruyama
- Data Science Center for Creative Design and Manufacturing, The Institute of Statistical Mathematics, Tachikawa, Tokyo, Japan
- Department of Physics, School of Science and Technology, Meiji University, Kawasaki-shi, Kanagawa, Japan
- Norio Yoshida
- Graduate School of Informatics, Nagoya University, Chikusa, Nagoya, Japan
2. Lee S, Lee H, Kim Y, Kim J, Choi W. GPU-Accelerated PD-IPM for Real-Time Model Predictive Control in Integrated Missile Guidance and Control Systems. Sensors (Basel) 2022;22:4512. PMID: 35746292. PMCID: PMC9231268. DOI: 10.3390/s22124512.
Abstract
This paper addresses the problem of real-time model predictive control (MPC) in the integrated guidance and control (IGC) of missile systems. When the primal-dual interior point method (PD-IPM), which is a convex optimization method, is used as an optimization solution for the MPC, the real-time performance of PD-IPM degenerates due to the elevated computation time in checking the Karush-Kuhn-Tucker (KKT) conditions in PD-IPM. This paper proposes a graphics processing unit (GPU)-based method to parallelize and accelerate PD-IPM for real-time MPC. The real-time performance of the proposed method was tested and analyzed on a widely-used embedded system. The comparison results with the conventional PD-IPM and other methods showed that the proposed method improved the real-time performance by reducing the computation time significantly.
Affiliation(s)
- Sanghyeon Lee
- Research Institute of Manufacturing and Productivity, Kumoh National Institute of Technology, Gumi 39177, Gyeongbuk, Korea
- Heoncheol Lee
- Department of IT Convergence Engineering, Kumoh National Institute of Technology, Gumi 39177, Gyeongbuk, Korea
- Correspondence: ; Tel.: +82-54-478-7458
- Yunyoung Kim
- Precision Guided Munition R&D Laboratory, LIGNEX1, Seongnam 13488, Gyeonggi, Korea
- Jaehyun Kim
- Precision Guided Munition R&D Laboratory, LIGNEX1, Seongnam 13488, Gyeonggi, Korea
- Wonseok Choi
- Precision Guided Munition R&D Laboratory, LIGNEX1, Seongnam 13488, Gyeonggi, Korea
3
|
Soubervielle-Montalvo C, Perez-Cham OE, Puente C, Gonzalez-Galvan EJ, Olague G, Aguirre-Salado CA, Cuevas-Tello JC, Ontanon-Garcia LJ. Design of a Low-Power Embedded System Based on a SoC-FPGA and the Honeybee Search Algorithm for Real-Time Video Tracking. Sensors (Basel) 2022; 22:1280. [PMID: 35162025 DOI: 10.3390/s22031280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/31/2021] [Revised: 01/28/2022] [Accepted: 02/02/2022] [Indexed: 02/01/2023]
Abstract
Video tracking involves detecting previously designated objects of interest within a sequence of image frames. It can be applied in robotics, unmanned vehicles, and automation, among other fields of interest. Video tracking is still regarded as an open problem due to a number of obstacles that still need to be overcome, including the need for high precision and real-time results, as well as portability and low-power demands. This work presents the design, implementation and assessment of a low-power embedded system based on an SoC-FPGA platform and the honeybee search algorithm (HSA) for real-time video tracking. HSA is a meta-heuristic that combines evolutionary computing and swarm intelligence techniques. Our findings demonstrated that the combination of SoC-FPGA and HSA reduced the consumption of computational resources, allowing real-time multiprocessing without a reduction in precision, and with the advantage of lower power consumption, which enabled portability. A starker difference was observed when measuring the power consumption. The proposed SoC-FPGA system consumed about 5 Watts, whereas the CPU-GPU system required more than 200 Watts. A general recommendation obtained from this research is to use SoC-FPGA over CPU-GPU to work with meta-heuristics in computer vision applications when an embedded solution is required.
4. Gangopadhyay A, Winberg S, Naidoo KJ. Anisotropic numerical potentials for coarse-grained modeling from high-speed multidimensional lookup table and interpolation algorithms. J Comput Chem 2021;42:666-675. PMID: 33547644. DOI: 10.1002/jcc.26487.
Abstract
A high-speed numerical potential delivering computational performance comparable with complex coarse-grained analytic potentials makes available models with greater degrees of physical and chemical accuracy. This opens the possibility of increased accuracy in classical molecular dynamics simulations of anisotropic systems. In this work, we report the development of a high-speed lookup table (LUT) of four-dimensional gridded data that uses cubic B-spline interpolation to derive off-grid values, and their associated partial derivatives, located between the known grid data points. The accuracy of the coarse-grained numerical potential using a LUT built from an array of values produced by the uniaxial Gay-Berne (GB) potential is within a 3% and a 5% margin of error for the interpolation of the uniaxial GB potential and its partial derivatives, respectively. The speedup of the numerical potential model and its partial derivatives is made competitive with the analytical potential by exploiting on-board graphics processing unit functionality. The capability of the numerical potential is demonstrated by comparing minimizations of a box of 500 naphthalene molecules, performed using a fully atomistic model (NAMD/CHARMM force field), a biaxial GB potential, and a numerical potential from a LUT built from CHARMM pair-potential data. The numerical potential model is significantly more accurate in its approximation of the atomistic local-minimum configuration than the biaxial GB analytical potential function. This demonstrates that a numerical potential founded on a direct lookup of the atomistic potential landscape significantly improves coarse-grained (CG) modeling of complex molecules, possibly paving the way for accurate CG modeling of anisotropic systems.
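The core mechanism described above, tabulating an expensive potential on a grid and interpolating off-grid queries, can be sketched in miniature. The sketch below uses a 1-D table with linear interpolation for brevity (the paper's LUT is four-dimensional and uses cubic B-splines on the GPU); the Lennard-Jones-style pair potential and all parameter values are illustrative assumptions, not taken from the paper.

```python
def build_lut(f, lo, hi, n):
    """Tabulate f on n evenly spaced grid points over [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [f(lo + i * step) for i in range(n)], lo, step

def lut_eval(table, lo, step, x):
    """Linearly interpolate the tabulated values at an off-grid point x."""
    t = (x - lo) / step
    i = min(int(t), len(table) - 2)   # clamp to the last interval
    frac = t - i
    return (1.0 - frac) * table[i] + frac * table[i + 1]

# Illustrative use: tabulate a Lennard-Jones-like pair potential,
# then query it at an off-grid separation.
lj = lambda r: 4.0 * ((1.0 / r) ** 12 - (1.0 / r) ** 6)
table, lo, step = build_lut(lj, 0.9, 3.0, 2048)
approx = lut_eval(table, lo, step, 1.2345)
exact = lj(1.2345)
```

With a dense enough grid the interpolation error is tiny while each query costs only one index computation and two table reads, which is what makes a LUT competitive with evaluating the analytic form directly.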
Affiliation(s)
- Ananya Gangopadhyay
- Scientific Computing Research Unit and Department of Chemistry, University of Cape Town, Cape Town, South Africa
- Simon Winberg
- Scientific Computing Research Unit and Department of Electrical Engineering, University of Cape Town, Cape Town, South Africa
- Kevin J Naidoo
- Scientific Computing Research Unit and Department of Chemistry, University of Cape Town, Cape Town, South Africa
5. Kuriyama R, Casellato C, D'Angelo E, Yamazaki T. Real-Time Simulation of a Cerebellar Scaffold Model on Graphics Processing Units. Front Cell Neurosci 2021;15:623552. PMID: 33897369. PMCID: PMC8058369. DOI: 10.3389/fncel.2021.623552.
Abstract
Large-scale simulation of detailed computational models of neuronal microcircuits plays a prominent role in reproducing and predicting their dynamics. To reconstruct a microcircuit, one must choose neuron and synapse models, placements, connectivity, and numerical simulation methods according to anatomical and physiological constraints. For reconstruction and refinement, it is useful to be able to replace one module easily while leaving the others as they are. One way to achieve this is a scaffolding approach, in which a simulation code is built on independent modules for placements, connections, and network simulations. Owing to this modularity, researchers can improve the performance of the entire simulation by simply replacing a problematic module with an improved one. Casali et al. (2019) developed a spiking network model of the cerebellar microcircuit using this approach; while it reproduces the electrophysiological properties of cerebellar neurons, it requires substantial computational time. Here, we followed the scaffolding approach and replaced the simulation module with an accelerated version on graphics processing units (GPUs). Our cerebellar scaffold model ran roughly 100 times faster than the original version; in fact, it runs faster than real time, with good weak and strong scaling properties. To demonstrate an application of real-time simulation, we implemented synaptic plasticity mechanisms at parallel fiber-Purkinje cell synapses and carried out a simulation of a behavioral experiment known as gain adaptation of the optokinetic response. The simulation reproduced the experimental findings while completing in real time: 2 s of biological time were simulated within 750 ms. These results suggest that the scaffolding approach is a promising concept for the gradual development and refactoring of simulation codes for large-scale, elaborate microcircuits. Moreover, the real-time version of the cerebellar scaffold model, enabled by GPU parallel computing, may be useful for large-scale simulations and engineering applications that require real-time signal processing and motor control.
Affiliation(s)
- Rin Kuriyama
- Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo, Japan
- Claudia Casellato
- Neurophysiology Unit, Neurocomputational Laboratory, Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
- Egidio D'Angelo
- Neurophysiology Unit, Neurocomputational Laboratory, Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
- IRCCS Mondino Foundation, Pavia, Italy
- Tadashi Yamazaki
- Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo, Japan
6. Florimbi G, Torti E, Masoli S, D'Angelo E, Leporati F. Granular layEr Simulator: Design and Multi-GPU Simulation of the Cerebellar Granular Layer. Front Comput Neurosci 2021;15:630795. PMID: 33833674. PMCID: PMC8023391. DOI: 10.3389/fncom.2021.630795.
Abstract
In modern computational modeling, neuroscientists need to reproduce the long-lasting activity of large-scale networks in which neurons are described by highly complex mathematical models. These aspects strongly increase the computational load of the simulations, which can be performed efficiently by exploiting parallel systems to reduce processing times. Graphics processing unit (GPU) devices meet this need by providing high-performance computing on the desktop. In this work, the authors describe a novel Granular layEr Simulator implemented on a multi-GPU system, capable of reconstructing the cerebellar granular layer in 3D space and reproducing its neuronal activity. The reconstruction is characterized by a high level of novelty and realism, considering axonal/dendritic field geometries oriented in 3D space and following convergence/divergence rates reported in the literature. Neurons are modeled using Hodgkin-Huxley representations. The network is validated by reproducing typical, well-documented behaviors such as the center-surround organization. The reconstruction of a network whose volume is 600 × 150 × 1,200 μm³, with 432,000 granules, 972 Golgi cells, 32,399 glomeruli, and 4,051 mossy fibers, takes 235 s on an Intel i9 processor. Reproducing 10 s of activity takes only 4.34 h and 3.37 h on a single- and multi-GPU desktop system (with one or two NVIDIA RTX 2080 GPUs, respectively), and only 3.52 h and 2.44 h on one or two NVIDIA V100 GPUs. The speedups reached (up to ~38× in the single-GPU version and ~55× in the multi-GPU version) clearly demonstrate that GPU technology is highly suitable for realistic simulations of large networks.
Affiliation(s)
- Giordana Florimbi
- Custom Computing and Programmable Systems Laboratory, Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Emanuele Torti
- Custom Computing and Programmable Systems Laboratory, Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Stefano Masoli
- Neurocomputational Laboratory, Neurophysiology Unit, Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
- Egidio D'Angelo
- Neurocomputational Laboratory, Neurophysiology Unit, Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
- Istituti di Ricovero e Cura a Carattere Scientifico (IRCCS) Mondino Foundation, Pavia, Italy
- Francesco Leporati
- Custom Computing and Programmable Systems Laboratory, Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
7. Ekström S, Pilia M, Kullberg J, Ahlström H, Strand R, Malmberg F. Faster dense deformable image registration by utilizing both CPU and GPU. J Med Imaging (Bellingham) 2021;8:014002. PMID: 33542943. PMCID: PMC7849043. DOI: 10.1117/1.jmi.8.1.014002.
Abstract
Purpose: Image registration is an important aspect of medical image analysis and a key component in many analysis concepts. Applications include fusion of multimodal images, multi-atlas segmentation, and whole-body analysis. Deformable image registration is often computationally expensive, and the need for efficient registration methods is highlighted by the emergence of large-scale image databases, e.g., the UK Biobank, providing imaging from 100,000 participants. Approach: We present a heterogeneous computing approach, utilizing both the CPU and the graphics processing unit (GPU), to accelerate a previously proposed image registration method. The parallelizable task of computing the matching criterion is offloaded to the GPU, where it can be computed efficiently, while the more complex optimization task is performed on the CPU. To lessen the impact of data synchronization between the CPU and GPU, we propose a pipeline model, effectively overlapping computational tasks with data synchronization. The performance is evaluated on a brain labeling task and compared with a CPU implementation of the same method and the popular advanced normalization tools (ANTs) software. Results: The proposed method presents a speed-up by factors of 4 and 8 against the CPU implementation and the ANTs software, respectively. A significant improvement in labeling quality was also observed, with measured mean Dice overlaps of 0.712 and 0.701 for our method and ANTs, respectively. Conclusions: We showed that the proposed method compares favorably to the ANTs software yielding both a significant speed-up and an improvement in labeling quality. The registration method together with the proposed parallelization strategy is implemented as an open-source software package, deform.
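The pipelining idea above, overlapping the offloaded matching-criterion computation with host-side optimization so that data transfer and synchronization hide behind useful work, can be sketched with ordinary threads. This is an illustrative stand-in, not the authors' implementation: `compute_criterion` and `optimize_step` are hypothetical placeholders for the GPU and CPU stages, and a bounded queue plays the role of double-buffered CPU-GPU synchronization.

```python
import threading
import queue

def pipeline(batches, compute_criterion, optimize_step):
    """Overlap the 'device' stage with the 'host' stage: while the host
    consumes result k, the worker thread is already producing k+1."""
    q = queue.Queue(maxsize=2)          # bounded: acts like double buffering

    def worker():
        for b in batches:
            q.put(compute_criterion(b)) # stand-in for the GPU stage
        q.put(None)                     # sentinel: no more batches

    threading.Thread(target=worker, daemon=True).start()
    results = []
    while (r := q.get()) is not None:
        results.append(optimize_step(r))  # stand-in for the CPU stage
    return results

# Toy usage: the 'criterion' doubles each batch, the 'optimizer' adds one.
out = pipeline([1, 2, 3], lambda b: 2 * b, lambda r: r + 1)
# out == [3, 5, 7]
```

The bounded queue is the key design choice: it keeps at most two results in flight, so neither stage races far ahead and memory stays constant, mirroring how a real CPU-GPU pipeline bounds its staging buffers.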
Affiliation(s)
- Simon Ekström
- Uppsala University, Section of Radiology, Department of Surgical Sciences, Uppsala, Sweden
- Antaros Medical, Mölndal, Sweden
- Martino Pilia
- Uppsala University, Section of Radiology, Department of Surgical Sciences, Uppsala, Sweden
- Joel Kullberg
- Uppsala University, Section of Radiology, Department of Surgical Sciences, Uppsala, Sweden
- Antaros Medical, Mölndal, Sweden
- Håkan Ahlström
- Uppsala University, Section of Radiology, Department of Surgical Sciences, Uppsala, Sweden
- Antaros Medical, Mölndal, Sweden
- Robin Strand
- Uppsala University, Section of Radiology, Department of Surgical Sciences, Uppsala, Sweden
- Uppsala University, Centre for Image Analysis, Division of Visual Information and Interaction, Department of Information Technology, Uppsala, Sweden
- Filip Malmberg
- Uppsala University, Section of Radiology, Department of Surgical Sciences, Uppsala, Sweden
- Uppsala University, Centre for Image Analysis, Division of Visual Information and Interaction, Department of Information Technology, Uppsala, Sweden
8. Williams-Young DB, de Jong WA, van Dam HJJ, Yang C. On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters. Front Chem 2020;8:581058. PMID: 33363105. PMCID: PMC7758429. DOI: 10.3389/fchem.2020.581058.
Abstract
The predominance of Kohn–Sham density functional theory (KS-DFT) for the theoretical treatment of large, experimentally relevant systems in molecular chemistry and materials science relies primarily on the existence of efficient software implementations capable of leveraging the latest advances in modern high-performance computing (HPC). With recent trends in HPC leading toward increasing reliance on heterogeneous, accelerator-based architectures such as graphics processing units (GPUs), existing code bases must embrace these architectural advances to maintain the high levels of performance that have come to be expected for these methods. In this work, we propose a three-level parallelism scheme for the distributed numerical integration of the exchange-correlation (XC) potential in the Gaussian basis set discretization of the Kohn–Sham equations on large computing clusters consisting of multiple GPUs per compute node. In addition, we propose and demonstrate the efficacy of batched kernels, including batched level-3 BLAS operations, in achieving high levels of performance on the GPU. We demonstrate the performance and scalability of the proposed method as implemented in the NWChemEx software package by comparing it to the existing scalable CPU XC integration in NWChem.
Affiliation(s)
- David B Williams-Young
- Lawrence Berkeley National Laboratory, Computational Research Division, Berkeley, CA, United States
- Wibe A de Jong
- Lawrence Berkeley National Laboratory, Computational Research Division, Berkeley, CA, United States
- Hubertus J J van Dam
- Brookhaven National Laboratory, Computational Science Initiative, Upton, NY, United States
- Chao Yang
- Lawrence Berkeley National Laboratory, Computational Research Division, Berkeley, CA, United States
9. Rovere M, Chen Z, Di Pilato A, Pantaleo F, Seez C. CLUE: A Fast Parallel Clustering Algorithm for High Granularity Calorimeters in High-Energy Physics. Front Big Data 2020;3:591315. PMID: 33937749. PMCID: PMC8080903. DOI: 10.3389/fdata.2020.591315.
Abstract
One of the challenges of high granularity calorimeters, such as that to be built to cover the endcap region in the CMS Phase-2 Upgrade for HL-LHC, is that the large number of channels causes a surge in the computing load when clustering numerous digitized energy deposits (hits) in the reconstruction stage. In this article, we propose a fast and fully parallelizable density-based clustering algorithm, optimized for high-occupancy scenarios, where the number of clusters is much larger than the average number of hits in a cluster. The algorithm uses a grid spatial index for fast querying of neighbors and its timing scales linearly with the number of hits within the range considered. We also show a comparison of the performance on CPU and GPU implementations, demonstrating the power of algorithmic parallelization in the coming era of heterogeneous computing in high-energy physics.
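The grid spatial index described in the abstract, binning hits into uniform cells so that a neighbor query scans only a constant-size block of cells rather than every hit, can be sketched as follows. This is a minimal 2-D CPU illustration of the general technique, not the CLUE implementation; the point data and parameters are made up.

```python
from collections import defaultdict

def build_grid(points, cell):
    """Bin 2-D points into a uniform grid keyed by integer cell coordinates."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(int(x // cell), int(y // cell))].append(idx)
    return grid

def neighbors(points, grid, cell, i, radius):
    """Return indices of points within `radius` of point i by scanning only
    the 3x3 block of cells around it (valid when radius <= cell)."""
    x, y = points[i]
    cx, cy = int(x // cell), int(y // cell)
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), []):
                px, py = points[j]
                if j != i and (px - x) ** 2 + (py - y) ** 2 <= radius ** 2:
                    out.append(j)
    return out

# Toy usage: only the nearby hit is returned; the distant ones are never scanned.
pts = [(0.1, 0.1), (0.2, 0.15), (0.9, 0.9), (5.0, 5.0)]
grid = build_grid(pts, cell=1.0)
close = neighbors(pts, grid, 1.0, 0, radius=0.5)
```

Because each query touches a bounded number of cells, total query time scales linearly with the number of hits in range, which is the scaling property the abstract highlights.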
Affiliation(s)
- Marco Rovere
- European Organization for Nuclear Research (CERN), Meyrin, Switzerland
- Ziheng Chen
- Northwestern University, Evanston, IL, United States
- Antonio Di Pilato
- University of Bari, Bari, Italy
- National Institute for Nuclear Physics (INFN)-Sezione di Bari, Bari, Italy
- Felice Pantaleo
- European Organization for Nuclear Research (CERN), Meyrin, Switzerland
- Chris Seez
- Imperial College London, South Kensington Campus, London, United Kingdom
10.
Abstract
This study proposes an accurate method for creating a dictionary for magnetic resonance fingerprinting (MRF) using a fast Bloch image simulator. An MRF sequence based on a fast imaging with steady precession sequence and a numerical phantom were used for dictionary generation. Cartesian and spiral readout gradients were used for the Bloch image simulation. The validity and usefulness of the method for accurate dictionary creation were demonstrated by MRF parameter maps obtained by pattern matching with the dictionaries generated by the proposed method.
11. Johnson TS, Li S, Franz E, Huang Z, Dan Li S, Campbell MJ, Huang K, Zhang Y. PseudoFuN: Deriving functional potentials of pseudogenes from integrative relationships with genes and microRNAs across 32 cancers. Gigascience 2019;8:giz046. PMID: 31029062. PMCID: PMC6486473. DOI: 10.1093/gigascience/giz046.
Abstract
Background: Long thought to be "relics" of evolution, pseudogenes have only recently become of medical interest for their regulatory roles in cancer. Often, these roles are a direct by-product of their close sequence homology to protein-coding genes. Novel pseudogene-gene (PGG) functional associations can be identified through the integration of biomedical data such as sequence homology, functional pathways, gene expression, pseudogene expression, and microRNA expression. However, not all of this information has been integrated, and almost all previous pseudogene studies relied on 1:1 pseudogene-parent gene relationships without leveraging other homologous genes and pseudogenes. Results: We produce PGG families that expand beyond the current 1:1 paradigm. First, we construct expansive PGG databases by (i) CUDAlign graphics processing unit (GPU)-accelerated local alignment of all pseudogenes to gene families (totaling 1.6 billion individual local alignments and >40,000 GPU hours) and (ii) BLAST-based assignment of pseudogenes to gene families. Second, we create an open-source web application (PseudoFuN, Pseudogene Functional Networks) to search for integrative functional relationships across sequence homology, microRNA expression, gene expression, pseudogene expression, and gene ontology. We produce four "flavors" of CUDAlign-based databases (>462,000,000 PGG pairwise alignments and 133,770 PGG families) that can be queried and downloaded using PseudoFuN. These databases are consistent with previous 1:1 PGG annotation and are also much more powerful, including millions of de novo PGG associations. For example, we find multiple known (e.g., miR-20a-PTEN-PTENP1) and novel (e.g., miR-375-SOX15-PPP4R1L) microRNA-gene-pseudogene associations in prostate cancer. PseudoFuN provides a "one stop shop" for identifying and visualizing thousands of potential regulatory relationships related to pseudogenes in The Cancer Genome Atlas cancers. Conclusions: Thousands of new PGG associations can be explored in the context of microRNA-gene-pseudogene co-expression and differential expression with a simple-to-use online tool by bioinformaticians and oncologists alike.
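The GPU-accelerated local alignment step above is, per CUDAlign, the Smith-Waterman dynamic program. A minimal score-only CPU version of that algorithm (illustrative, with arbitrary match/mismatch/gap scores, not CUDAlign's massively parallel implementation) looks like this:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score between sequences a and b, computed
    row by row so only two rows of the DP matrix are kept in memory."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            s = max(0,                                            # restart
                    prev[j - 1] + (match if ca == cb else mismatch),  # diag
                    prev[j] + gap,                                # gap in b
                    cur[j - 1] + gap)                             # gap in a
            cur.append(s)
            best = max(best, s)
        prev = cur
    return best

# Toy usage: short nucleotide-like strings sharing a local region.
score = smith_waterman("ACACACTA", "AGCACACA")
```

Each cell depends only on its left, upper, and upper-left neighbors, so cells along an anti-diagonal are independent; that independence is exactly what GPU implementations such as CUDAlign exploit to run billions of alignments in parallel.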
Affiliation(s)
- Travis S Johnson
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, USA
- Department of Medicine, Indiana University School of Medicine, 545 Barnhill Drive, Indianapolis, IN 46202, USA
- Sihong Li
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, USA
- Eric Franz
- Ohio Supercomputer Center, 1224 Kinnear Road, Columbus, OH 43212, USA
- Zhi Huang
- School of Electrical and Computer Engineering, Purdue University, 465 Northwestern Avenue, West Lafayette, IN 47907, USA
- Department of Medicine, Indiana University School of Medicine, 545 Barnhill Drive, Indianapolis, IN 46202, USA
- Shuyu Dan Li
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Moray J Campbell
- Division of Pharmaceutics and Pharmaceutical Chemistry, College of Pharmacy, The Ohio State University, 500 West 12th Avenue, Columbus, OH 43210, USA
- Kun Huang
- Department of Medicine, Indiana University School of Medicine, 545 Barnhill Drive, Indianapolis, IN 46202, USA
- Regenstrief Institute, Indiana University, 1101 West 10th Street, Indianapolis, IN 46262, USA
- Yan Zhang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, USA
- The Ohio State University Comprehensive Cancer Center (OSUCCC - James), 460 West 10th Avenue, Columbus, OH 43210, USA
12. Bittremieux W, Laukens K, Noble WS. Extremely Fast and Accurate Open Modification Spectral Library Searching of High-Resolution Mass Spectra Using Feature Hashing and Graphics Processing Units. J Proteome Res 2019;18:3792-3799. PMID: 31448616. PMCID: PMC6886738. DOI: 10.1021/acs.jproteome.9b00291.
Abstract
Open modification searching (OMS) is a powerful search strategy to identify peptides with any type of modification. OMS works by using a very wide precursor mass window to allow modified spectra to match against their unmodified variants, after which the modification types can be inferred from the corresponding precursor mass differences. A disadvantage of this strategy, however, is the large computational cost, because each query spectrum has to be compared against a multitude of candidate peptides. We have previously introduced the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. Here we demonstrate how this candidate selection procedure can be further optimized using graphics processing units. Additionally, we introduce a feature hashing scheme to convert high-resolution spectra to low-dimensional vectors. On the basis of these algorithmic advances, along with low-level code optimizations, the new version of ANN-SoLo is up to an order of magnitude faster than its initial version. This makes it possible to efficiently perform open searches on a large scale to gain a deeper understanding about the protein modification landscape. We demonstrate the computational efficiency and identification performance of ANN-SoLo based on a large data set of the draft human proteome. ANN-SoLo is implemented in Python and C++. It is freely available under the Apache 2.0 license at https://github.com/bittremieux/ANN-SoLo.
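The feature-hashing scheme mentioned above, converting a high-resolution spectrum into a low-dimensional vector by hashing fragment bins into a fixed number of buckets, can be sketched as follows. The bin width, dimensionality, and hash function here are illustrative assumptions, not the values used in ANN-SoLo.

```python
import hashlib
import math

def hash_spectrum(peaks, dim=512, bin_width=0.05):
    """Map (m/z, intensity) peaks to a fixed-length unit vector: each
    fragment bin is hashed to one of `dim` buckets, so arbitrarily
    high-resolution spectra become low-dimensional vectors whose dot
    product approximates the binned cosine similarity."""
    vec = [0.0] * dim
    for mz, intensity in peaks:
        b = int(mz / bin_width)                    # fragment bin index
        h = hashlib.md5(str(b).encode()).digest()  # deterministic hash
        vec[int.from_bytes(h[:4], "little") % dim] += intensity
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Toy usage: two identical spectra hash to identical unit vectors,
# so their dot product (cosine similarity) is 1.
spec_a = hash_spectrum([(100.02, 1.0), (250.11, 0.5)])
spec_b = hash_spectrum([(100.02, 1.0), (250.11, 0.5)])
sim = sum(x * y for x, y in zip(spec_a, spec_b))
```

The fixed, small dimensionality is what makes the vectors cheap to index and compare on the GPU; occasional hash collisions trade a little similarity accuracy for that compactness.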
Affiliation(s)
- Wout Bittremieux
- Department of Mathematics and Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Kris Laukens
- Department of Mathematics and Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
- William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
13. Peng B, Luo S, Xu Z, Jiang J. Accelerating 3-D GPU-based Motion Tracking for Ultrasound Strain Elastography Using Sum-Tables: Analysis and Initial Results. Appl Sci (Basel) 2019;9:1991. PMID: 31372306. DOI: 10.3390/app9101991.
Abstract
With the availability of 3-D ultrasound data, considerable research effort is being devoted to developing 3-D ultrasound strain elastography (USE) systems. Because 3-D motion tracking, a core component of any 3-D USE system, is computationally intensive, much of this effort aims to accelerate it. In the literature, the sum-table concept has been used in serial computing environments to reduce the burden of computing signal correlation, which is the single most computationally intensive component of 3-D motion tracking. In this study, parallel programming using graphics processing units (GPUs) is used in conjunction with sum-tables to improve the computational efficiency of 3-D motion tracking. To our knowledge, sum-tables have not previously been used in a GPU environment for 3-D motion tracking. Our main objective is to investigate the feasibility of the sum-table-based normalized correlation coefficient (ST-NCC) method for GPU-accelerated 3-D USE. More specifically, two different implementations of the ST-NCC method, proposed by Lewis et al. and by Luo and Konofagou, are compared against each other, with the conventional method for calculating the normalized correlation coefficient (NCC) as the baseline. All three methods were implemented using the compute unified device architecture (CUDA; Version 9.0, Nvidia Inc., CA, USA) and tested on a professional GeForce GTX TITAN X card (Nvidia Inc., CA, USA). Using 3-D ultrasound data acquired during a tissue-mimicking phantom experiment, both displacement tracking accuracy and computational efficiency were evaluated for the three methods. Based on the data investigated, we found that under the GPU platform the Luo-Konofagou method still improves computational efficiency (17–46%) compared to the classic NCC method implemented on the same platform. However, the Lewis method either does not improve computational efficiency in some configurations or improves it at a lower rate (7–23%) under the GPU parallel computing environment. Comparable displacement tracking accuracy was obtained by both methods.
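The sum-table trick can be illustrated with a small 2-D example: once the inclusive prefix sums are built in a single pass, the sum over any rectangular window (the building block of NCC means, variances, and cross terms) costs four lookups regardless of window size. This is a generic integral-image sketch, not the paper's CUDA implementation.

```python
def sum_table(a):
    """Inclusive 2-D prefix sum (integral image) of a list-of-lists."""
    rows, cols = len(a), len(a[0])
    st = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        running = 0.0
        for c in range(cols):
            running += a[r][c]
            st[r][c] = running + (st[r - 1][c] if r > 0 else 0.0)
    return st

def window_sum(st, r0, c0, r1, c1):
    """Sum of a[r0:r1+1, c0:c1+1] in O(1), independent of window size."""
    s = st[r1][c1]
    if r0 > 0:
        s -= st[r0 - 1][c1]
    if c0 > 0:
        s -= st[r1][c0 - 1]
    if r0 > 0 and c0 > 0:
        s += st[r0 - 1][c0 - 1]
    return s

a = [[float(r * 7 + c) for c in range(8)] for r in range(6)]
st = sum_table(a)
fast = window_sum(st, 1, 2, 4, 5)  # rows 1-4, cols 2-5
direct = sum(a[r][c] for r in range(1, 5) for c in range(2, 6))
```

The same four-lookup pattern extends to 3-D with an eight-corner formula, which is what makes correlation windows over volumetric data affordable.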
Collapse
|
14
|
van Elteren A, Bédorf J, Portegies Zwart S. Multi-scale high-performance computing in astrophysics: simulating clusters with stars, binaries and planets. Philos Trans A Math Phys Eng Sci 2019; 377:20180153. [PMID: 30967037 PMCID: PMC6388014 DOI: 10.1098/rsta.2018.0153] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/09/2023]
Abstract
The demand on simulation software in astrophysics has dramatically increased over the last decades. This increase is driven by improvements in observational data and computer hardware. At the same time, computers have become more complicated to program due to the introduction of more parallelism and hybrid hardware. To keep up with these developments, much of the software has to be redesigned. In order to prevent the future need to rewrite again when new developments present themselves, the main effort should go into making the software maintainable, flexible and scalable. In this paper, we explain our strategy for coupling elementary solvers and how to combine them into a high-performance multi-scale environment in which complex simulations can be performed. The elementary parts can remain succinct while supporting the aggregation to more satisfactory functionality by coupling them on a higher level. The advanced code-coupling strategies we present here allow such a hierarchy and support the development of complex codes. A library of simple elementary solvers subsequently stimulates the rapid development of more complex code that can co-evolve with the latest advances in computer hardware. We demonstrate how to combine several of these elementary solvers in a hierarchical and generic system, and how the resulting complex codes can be applied to multi-scale problems in astrophysics. Our aim is to achieve the best of several worlds with respect to performance, flexibility and maintainability while reducing development time. We succeeded in the development of the hierarchical coupling strategy and the general framework, but a comprehensive library of minimal fundamental-physics solvers is still unavailable. This article is part of the theme issue 'Multiscale modelling, simulation and computing: from the desktop to the exascale'.
Collapse
|
15
|
Landau W, Niemi J, Nettleton D. Fully Bayesian analysis of RNA-seq counts for the detection of gene expression heterosis. J Am Stat Assoc 2018; 114:610-621. [PMID: 31354180 PMCID: PMC6660196 DOI: 10.1080/01621459.2018.1497496] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2017] [Revised: 01/01/2018] [Indexed: 01/17/2023]
Abstract
Heterosis, or hybrid vigor, is the enhancement of the phenotype of hybrid progeny relative to their inbred parents. Heterosis is extensively used in agriculture, and the underlying mechanisms are unclear. To investigate the molecular basis of phenotypic heterosis, researchers search tens of thousands of genes for heterosis with respect to expression in the transcriptome. Difficulty arises in the assessment of heterosis due to composite null hypotheses and non-uniform distributions for p-values under these null hypotheses. Thus, we develop a general hierarchical model for count data and a fully Bayesian analysis in which an efficient parallelized Markov chain Monte Carlo algorithm ameliorates the computational burden. We use our method to detect gene expression heterosis in a two-hybrid plant-breeding scenario, both in a real RNA-seq maize dataset and in simulation studies. In the simulation studies, we show our method has well-calibrated posterior probabilities and credible intervals when the model assumed in analysis matches the model used to simulate the data. Although model misspecification can adversely affect calibration, the methodology is still able to accurately rank genes. Finally, we show that hyperparameter posteriors are extremely narrow and an empirical Bayes (eBayes) approach based on posterior means from the fully Bayesian analysis provides virtually equivalent posterior probabilities, credible intervals, and gene rankings relative to the fully Bayesian solution. This evidence of equivalence provides support for the use of eBayes procedures in RNA-seq data analysis if accurate hyperparameter estimates can be obtained.
Collapse
Affiliation(s)
- Will Landau
- Department of Statistics, Iowa State University
| | - Jarad Niemi
- Department of Statistics, Iowa State University
| | | |
Collapse
|
16
|
van Vreumingen D, Tewari S, Verbeek F, van Ruitenbeek JM. Towards Controlled Single-Molecule Manipulation Using "Real-Time" Molecular Dynamics Simulation: A GPU Implementation. Micromachines (Basel) 2018; 9:E270. [PMID: 30424203 PMCID: PMC6187332 DOI: 10.3390/mi9060270] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/28/2018] [Revised: 05/24/2018] [Accepted: 05/25/2018] [Indexed: 02/04/2023]
Abstract
Molecular electronics saw its birth with the idea to build electronic circuitry with single molecules as individual components. Even though commercial applications are still modest, it has played an important part in the study of fundamental physics at the scale of single atoms and molecules. It is now a routine procedure in many research groups around the world to connect a single molecule between two metallic leads. What is unknown is the nature of this coupling between the molecule and the leads. We have recently demonstrated (Tewari, 2018, Ph.D. Thesis) our new setup based on a scanning tunneling microscope, which can be used to controllably manipulate single molecules and atomic chains. In this article, we will present the extension of our molecular dynamics simulator attached to this system for the manipulation of single molecules in real time using a graphics processing unit (GPU). This will not only aid in controlled lift-off of single molecules, but will also provide details about changes in the molecular conformations during the manipulation. This information could serve as important input for theoretical models and for bridging the gap between theory and experiments.
Collapse
Affiliation(s)
- Dyon van Vreumingen
- Huygens-Kamerlingh Onnes Laboratorium, Universiteit Leiden, 2333CA Leiden, The Netherlands.
- Leiden Institute of Advanced Computer Science, Universiteit Leiden, 2333CA Leiden, The Netherlands.
| | - Sumit Tewari
- Huygens-Kamerlingh Onnes Laboratorium, Universiteit Leiden, 2333CA Leiden, The Netherlands.
| | - Fons Verbeek
- Leiden Institute of Advanced Computer Science, Universiteit Leiden, 2333CA Leiden, The Netherlands.
| | - Jan M van Ruitenbeek
- Huygens-Kamerlingh Onnes Laboratorium, Universiteit Leiden, 2333CA Leiden, The Netherlands.
| |
Collapse
|
17
|
Hu S, Zhang Q, Wang J, Chen Z. Real-time particle filtering and smoothing algorithms for detecting abrupt changes in neural ensemble spike activity. J Neurophysiol 2017; 119:1394-1410. [PMID: 29357468 DOI: 10.1152/jn.00684.2017] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/29/2023] Open
Abstract
Sequential change-point detection from time series data is a common problem in many neuroscience applications, such as seizure detection, anomaly detection, and pain detection. In our previous work (Chen Z, Zhang Q, Tong AP, Manders TR, Wang J. J Neural Eng 14: 036023, 2017), we developed a latent state-space model, known as the Poisson linear dynamical system, for detecting abrupt changes in neuronal ensemble spike activity. In online brain-machine interface (BMI) applications, a recursive filtering algorithm is used to track the changes in the latent variable. However, previous methods have been restricted to Gaussian dynamical noise and have used Gaussian approximation for the Poisson likelihood. To improve the detection speed, we introduce non-Gaussian dynamical noise for modeling a stochastic jump process in the latent state space. To efficiently estimate the state posterior that accommodates non-Gaussian noise and non-Gaussian likelihood, we propose particle filtering and smoothing algorithms for the change-point detection problem. To speed up the computation, we implement the proposed particle filtering algorithms using advanced graphics processing unit computing technology. We validate our algorithms, using both computer simulations and experimental data for acute pain detection. Finally, we discuss several important practical issues in the context of real-time closed-loop BMI applications. NEW & NOTEWORTHY Sequential change-point detection is an important problem in closed-loop neuroscience experiments. This study proposes novel sequential Monte Carlo methods to quickly detect the onset and offset of a stochastic jump process that drives the population spike activity. This new approach is robust with respect to spike sorting noise and varying levels of signal-to-noise ratio. The GPU implementation of the computational algorithm allows for parallel processing in real time.
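A minimal bootstrap-filter sketch of the kind of sequential Monte Carlo described here: propagate particles through a jump-diffusion prior, weight by a Poisson likelihood of the observed spike count, and resample. The exponential rate link, the jump-diffusion parameters, and the toy count sequence are all illustrative assumptions; the paper's actual model and its GPU implementation are not reproduced.

```python
import math
import random

def pf_step(particles, count, rate_fn, jump_prob=0.05,
            jump_scale=2.0, noise_std=0.1):
    """One bootstrap particle-filter step for spike-count observations."""
    # 1. Propagate: Gaussian diffusion plus occasional large jumps.
    moved = []
    for x in particles:
        step = random.gauss(0.0, noise_std)
        if random.random() < jump_prob:      # stochastic jump in latent state
            step += random.gauss(0.0, jump_scale)
        moved.append(x + step)
    # 2. Weight each particle by the Poisson likelihood p(count | rate(x)).
    weights = []
    for x in moved:
        lam = rate_fn(x)
        weights.append(math.exp(-lam) * lam ** count / math.factorial(count))
    total = sum(weights)
    if total <= 0.0:                         # degenerate weights: keep all
        return moved
    weights = [w / total for w in weights]
    # 3. Multinomial resampling.
    return random.choices(moved, weights=weights, k=len(moved))

random.seed(7)
particles = [0.0] * 500
for count in [2, 1, 15, 16, 14]:             # abrupt rise in spike counts
    particles = pf_step(particles, count, rate_fn=lambda x: math.exp(x))
estimate = sum(particles) / len(particles)   # posterior mean of the log-rate
```

After the counts jump from ~1-2 to ~15, the resampled particles concentrate near a log-rate of about ln(15) ≈ 2.7, which is how the filter detects the change-point within a few observations.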
Collapse
Affiliation(s)
- Sile Hu
- Department of Instrument Science and Technology, Zhejiang University, Hangzhou, Zhejiang, People's Republic of China; Department of Psychiatry, New York University School of Medicine, New York, New York
| | - Qiaosheng Zhang
- Department of Anesthesiology, Perioperative Care, and Pain Medicine, New York University School of Medicine, New York, New York
| | - Jing Wang
- Department of Anesthesiology, Perioperative Care, and Pain Medicine, New York University School of Medicine, New York, New York; Department of Neuroscience and Physiology, New York University School of Medicine, New York, New York
| | - Zhe Chen
- Department of Psychiatry, New York University School of Medicine, New York, New York; Department of Neuroscience and Physiology, New York University School of Medicine, New York, New York
| |
Collapse
|
18
|
Mei G, Xu L, Xu N. Accelerating adaptive inverse distance weighting interpolation algorithm on a graphics processing unit. R Soc Open Sci 2017; 4:170436. [PMID: 28989754 PMCID: PMC5627094 DOI: 10.1098/rsos.170436] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/30/2017] [Accepted: 08/21/2017] [Indexed: 06/07/2023]
Abstract
This paper focuses on designing and implementing parallel adaptive inverse distance weighting (AIDW) interpolation algorithms by using the graphics processing unit (GPU). The AIDW is an improved version of the standard IDW, which can adaptively determine the power parameter according to the data points' spatial distribution pattern and achieve more accurate predictions than those predicted by IDW. In this paper, we first present two versions of the GPU-accelerated AIDW, i.e. the naive version without profiting from the shared memory and the tiled version taking advantage of the shared memory. We also implement the naive version and the tiled version using two data layouts, structure of arrays and array of aligned structures, on both single and double precision. We then evaluate the performance of parallel AIDW by comparing it with its corresponding serial algorithm on three different machines equipped with the GPUs GT730M, M5000 and K40c. The experimental results indicate that: (i) there is no significant difference in the computational efficiency when different data layouts are employed; (ii) the tiled version is always slightly faster than the naive version; and (iii) on single precision the achieved speed-up can be up to 763 (on the GPU M5000), while on double precision the obtained highest speed-up is 197 (on the GPU K40c). To benefit the community, all source code and testing data related to the presented parallel AIDW algorithm are publicly available.
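The difference between IDW and AIDW lies in how the power parameter is chosen; the interpolation itself is a short computation. The sketch below is plain fixed-power IDW for illustration only, with AIDW's density-dependent power selection omitted; the sample points and values are toy data.

```python
def idw(sample_pts, sample_vals, query, power=2.0):
    """Inverse distance weighting: the prediction at `query` is a
    distance-weighted average of the sample values. AIDW would choose
    `power` adaptively from the local spatial distribution of points."""
    num = den = 0.0
    for (x, y), v in zip(sample_pts, sample_vals):
        d2 = (x - query[0]) ** 2 + (y - query[1]) ** 2
        if d2 == 0.0:
            return v                      # query coincides with a sample
        w = d2 ** (-power / 2.0)          # weight = 1 / distance**power
        num += w * v
        den += w
    return num / den

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
vals = [0.0, 1.0, 1.0, 2.0]
z = idw(pts, vals, (0.5, 0.5))
```

Each prediction is independent of every other, which is why the method maps so naturally onto one GPU thread per interpolated point.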
Collapse
Affiliation(s)
- Gang Mei
- Author for correspondence: Gang Mei e-mail:
| | | | | |
Collapse
|
19
|
Kobayashi C, Jung J, Matsunaga Y, Mori T, Ando T, Tamura K, Kamiya M, Sugita Y. GENESIS 1.1: A hybrid-parallel molecular dynamics simulator with enhanced sampling algorithms on multiple computational platforms. J Comput Chem 2017; 38:2193-2206. [PMID: 28718930 DOI: 10.1002/jcc.24874] [Citation(s) in RCA: 99] [Impact Index Per Article: 14.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2017] [Revised: 06/08/2017] [Accepted: 06/09/2017] [Indexed: 01/09/2023]
Abstract
GENeralized-Ensemble SImulation System (GENESIS) is a software package for molecular dynamics (MD) simulation of biological systems. It is designed to push the limits of system size and accessible time scale by adopting highly parallelized schemes and enhanced conformational sampling algorithms. In this new version, GENESIS 1.1, new functions and advanced algorithms have been added. The all-atom and coarse-grained potential energy functions used in the AMBER and GROMACS packages are now available in addition to the CHARMM energy functions. The performance of MD simulations has been greatly improved by further optimization, multiple time-step integration, and hybrid (CPU + GPU) computing. The string method and replica-exchange umbrella sampling with flexible collective-variable choice are used for finding the minimum free-energy pathway and obtaining free-energy profiles for conformational changes of a macromolecule. These new features increase the usefulness and power of GENESIS for modeling and simulation in biological research. © 2017 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- Chigusa Kobayashi
- Computational Biophysics Research Team, RIKEN Advanced Institute for Computational Science, 7-1-26 Minatojima-minamachi, Chuo-ku, Kobe, 650-0047, Japan
| | - Jaewoon Jung
- Computational Biophysics Research Team, RIKEN Advanced Institute for Computational Science, 7-1-26 Minatojima-minamachi, Chuo-ku, Kobe, 650-0047, Japan; Theoretical Molecular Science Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama, 351-0198, Japan
| | - Yasuhiro Matsunaga
- Computational Biophysics Research Team, RIKEN Advanced Institute for Computational Science, 7-1-26 Minatojima-minamachi, Chuo-ku, Kobe, 650-0047, Japan; JST PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama, 332-0012, Japan
| | - Takaharu Mori
- Theoretical Molecular Science Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama, 351-0198, Japan
| | - Tadashi Ando
- Laboratory for Biomolecular Function Simulation, RIKEN Quantitative Biology Center Computational Biology Research Core, 1-6-5 Minatojima-minamachi, Chuo-ku, Kobe, 650-0047, Japan; Department of Applied Electronics, Faculty of Industrial Science and Technology, Tokyo University of Science, 6-3-1 Niijuku, Katsushika-ku, Tokyo, 125-8585, Japan; Water Frontier Science and Technology Research Center, Research Institute for Science and Technology, Tokyo University of Science, 6-3-1 Niijuku, Katsushika-ku, Tokyo, 125-8585, Japan; Research Division of Multiscale Interfacial Thermofluid Dynamics, Research Institute for Science and Technology, Tokyo University of Science, 6-3-1 Niijuku, Katsushika-ku, Tokyo, 125-8585, Japan
| | - Koichi Tamura
- Computational Biophysics Research Team, RIKEN Advanced Institute for Computational Science, 7-1-26 Minatojima-minamachi, Chuo-ku, Kobe, 650-0047, Japan
| | - Motoshi Kamiya
- Computational Biophysics Research Team, RIKEN Advanced Institute for Computational Science, 7-1-26 Minatojima-minamachi, Chuo-ku, Kobe, 650-0047, Japan
| | - Yuji Sugita
- Computational Biophysics Research Team, RIKEN Advanced Institute for Computational Science, 7-1-26 Minatojima-minamachi, Chuo-ku, Kobe, 650-0047, Japan; Theoretical Molecular Science Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama, 351-0198, Japan; Laboratory for Biomolecular Function Simulation, RIKEN Quantitative Biology Center Computational Biology Research Core, 1-6-5 Minatojima-minamachi, Chuo-ku, Kobe, 650-0047, Japan
| |
Collapse
|
20
|
Chang CH, Yu X, Ji JX. Compressed sensing MRI reconstruction from 3D multichannel data using GPUs. Magn Reson Med 2017; 78:2265-2274. [PMID: 28198568 DOI: 10.1002/mrm.26636] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2016] [Revised: 01/01/2017] [Accepted: 01/18/2017] [Indexed: 11/08/2022]
Abstract
PURPOSE To accelerate iterative reconstructions of compressed sensing (CS) MRI from 3D multichannel data using graphics processing units (GPUs). METHODS The sparsity of MRI signals and parallel array receivers can reduce the data acquisition requirements. However, iterative CS reconstructions from data acquired using an array system may take a significantly long time, especially for a large number of parallel channels. This paper presents an efficient method for CS-MRI reconstruction from 3D multichannel data using GPUs. In this method, CS reconstructions were simultaneously processed in a channel-by-channel fashion on the GPU, in which the computations of multiple-channel 3D-CS reconstructions are highly parallelized. The final image was then produced by a sum-of-squares method on the central processing unit. Implementation details including algorithm, data/memory management, and parallelization schemes are reported in the paper. RESULTS Both simulated data and in vivo MRI array data were tested. The results showed that the proposed method can significantly improve the image reconstruction efficiency, typically shortening the runtime by a factor of 30. CONCLUSIONS Using low-cost GPUs and an efficient algorithm allowed the 3D multislice compressive-sensing reconstruction to be performed in less than 1 s. The rapid reconstructions are expected to help bring high-dimensional, multichannel parallel CS MRI closer to clinical applications. Magn Reson Med 78:2265-2274, 2017. © 2017 International Society for Magnetic Resonance in Medicine.
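In this channel-by-channel scheme, each coil image is reconstructed independently (the part that parallelizes well on the GPU), and only the final combination is serial. The root-sum-of-squares combination, the step assigned to the CPU in this workflow, can be sketched as follows; the channel magnitudes below are toy values, not MRI data.

```python
import math

def sum_of_squares(channel_images):
    """Combine per-channel magnitude images into a single image by
    root-sum-of-squares, pixel by pixel."""
    n_pix = len(channel_images[0])
    return [math.sqrt(sum(ch[i] ** 2 for ch in channel_images))
            for i in range(n_pix)]

# Three receive channels, four pixels each (flattened for simplicity).
chans = [[3.0, 0.0, 1.0, 2.0],
         [4.0, 1.0, 2.0, 2.0],
         [0.0, 0.0, 2.0, 1.0]]
img = sum_of_squares(chans)
```

Because the per-channel reconstructions never exchange data, they fit the GPU's model of many independent work items, which is where the reported ~30x speedup comes from.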
Collapse
Affiliation(s)
- Ching-Hua Chang
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
| | - Xiangdong Yu
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
| | - Jim X Ji
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
| |
Collapse
|
21
|
Techavipoo U, Worasawate D, Boonleelakul W, Keinprasit R, Sunpetchniyom T, Sugino N, Thajchayapong P. Toward Optimal Computation of Ultrasound Image Reconstruction Using CPU and GPU. Sensors (Basel) 2016; 16:E1986. [PMID: 27886149 DOI: 10.3390/s16121986] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/13/2016] [Revised: 10/31/2016] [Accepted: 11/10/2016] [Indexed: 12/03/2022]
Abstract
An ultrasound image is reconstructed from echo signals received by the array elements of a transducer. The time of flight of the echo depends on the distance from the focus to the array elements. The received echo signals have to be delayed to make their wave fronts and phases coherent before summing the signals. In digital beamforming, the delays do not always fall on the sampled points. Generally, the values of the delayed signals are estimated by the values of the nearest samples. This method is fast and easy, but inaccurate. Other methods are available for increasing the accuracy of the delayed signals and, consequently, the quality of the beamformed signals; for example, in-phase (I)/quadrature (Q) interpolation, which is more time consuming but provides more accurate values than the nearest samples. This paper compares the signals after dynamic receive beamforming, in which the echo signals are delayed using two methods: the nearest-sample method and the I/Q interpolation method. The comparisons of the visual qualities of the reconstructed images and the qualities of the beamformed signals are reported. Moreover, the computational speeds of these methods are optimized by reorganizing the data-processing flow and by applying the graphics processing unit (GPU). The use of single- and double-precision floating-point formats for the intermediate data is also considered. The speeds with and without these optimizations are compared.
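The two delay strategies compared here can be sketched for a single channel. Linear interpolation stands in for the more elaborate I/Q interpolation (which operates on the analytic signal rather than the raw samples), and the signals, delays, and sampling rate below are toy values.

```python
def delayed_sample(signal, delay, fs, mode="nearest"):
    """Value of `signal` at time `delay` seconds, given sampling rate `fs` Hz.
    'nearest' rounds to the closest sample; 'linear' interpolates between the
    two neighbouring samples (a simple stand-in for I/Q interpolation)."""
    t = delay * fs                       # delay in (fractional) samples
    if mode == "nearest":
        i = min(int(round(t)), len(signal) - 1)
        return signal[i]
    i = int(t)
    if i + 1 >= len(signal):
        return signal[-1]
    frac = t - i
    return (1 - frac) * signal[i] + frac * signal[i + 1]

def delay_and_sum(channels, delays, fs, mode="nearest"):
    """Dynamic-receive beamforming: delay each element's echo so the
    wavefronts align, then sum across the array."""
    return sum(delayed_sample(ch, d, fs, mode)
               for ch, d in zip(channels, delays))
```

Since every output sample of every scan line repeats this computation independently, the delay estimation is a natural per-thread work item on the GPU.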
Collapse
|
22
|
Park S, McNutt T, Plishker W, Quon H, Wong J, Shekhar R, Lee J. Technical Note: scuda: A software platform for cumulative dose assessment. Med Phys 2016; 43:5339. [PMID: 27782691 PMCID: PMC5018004 DOI: 10.1118/1.4961985] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2016] [Revised: 07/10/2016] [Accepted: 08/19/2016] [Indexed: 11/07/2022] Open
Abstract
PURPOSE Accurate tracking of anatomical changes and computation of actually delivered dose to the patient are critical for successful adaptive radiation therapy (ART). Additionally, efficient data management and fast processing are practically important for the adoption in clinic as ART involves a large amount of image and treatment data. The purpose of this study was to develop an accurate and efficient Software platform for CUmulative Dose Assessment (scuda) that can be seamlessly integrated into the clinical workflow. METHODS scuda consists of deformable image registration (DIR), segmentation, dose computation modules, and a graphical user interface. It is connected to our image PACS and radiotherapy informatics databases from which it automatically queries/retrieves patient images, radiotherapy plan, beam data, and daily treatment information, thus providing an efficient and unified workflow. For accurate registration of the planning CT and daily CBCTs, the authors iteratively correct CBCT intensities by matching local intensity histograms during the DIR process. Contours of the target tumor and critical structures are then propagated from the planning CT to daily CBCTs using the computed deformations. The actual delivered daily dose is computed using the registered CT and patient setup information by a superposition/convolution algorithm, and accumulated using the computed deformation fields. Both DIR and dose computation modules are accelerated by a graphics processing unit. RESULTS The cumulative dose computation process has been validated on 30 head and neck (HN) cancer cases, showing 3.5 ± 5.0 Gy (mean±STD) absolute mean dose differences between the planned and the actually delivered doses in the parotid glands. On average, DIR, dose computation, and segmentation take 20 s/fraction and 17 min for a 35-fraction treatment including additional computation for dose accumulation. 
CONCLUSIONS The authors developed a unified software platform that provides accurate and efficient monitoring of anatomical changes and computation of the dose actually delivered to the patient, thus realizing an efficient cumulative dose computation workflow. Evaluation on HN cases demonstrated the utility of our platform for monitoring treatment quality and detecting significant dosimetric variations that are key to successful ART.
Collapse
Affiliation(s)
- Seyoun Park
- Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University, Baltimore, Maryland 21231
| | - Todd McNutt
- Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University, Baltimore, Maryland 21231
| | | | - Harry Quon
- Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University, Baltimore, Maryland 21231
| | - John Wong
- Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University, Baltimore, Maryland 21231
| | - Raj Shekhar
- IGI Technologies, Inc., College Park, Maryland 20742 and Sheikh Zayed Institute for Pediatric Surgical Innovation, Children's National Health System, Washington, DC 20010
| | - Junghoon Lee
- Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University, Baltimore, Maryland 21231
| |
Collapse
|
23
|
Choi S, Kwon OK, Kim J, Kim WY. Performance of heterogeneous computing with graphics processing unit and many integrated core for hartree potential calculations on a numerical grid. J Comput Chem 2016; 37:2193-201. [PMID: 27431905 DOI: 10.1002/jcc.24443] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2016] [Revised: 05/19/2016] [Accepted: 06/13/2016] [Indexed: 12/17/2022]
Abstract
We investigated the performance of heterogeneous computing with graphics processing units (GPUs) and many integrated core (MIC) with 20 CPU cores (20×CPU). As a practical example toward large-scale electronic structure calculations using grid-based methods, we evaluated the Hartree potentials of silver nanoparticles with various sizes (3.1, 3.7, 4.9, 6.1, and 6.9 nm) via a direct integral method supported by the sinc basis set. The so-called work stealing scheduler was used for efficient heterogeneous computing via the balanced dynamic distribution of workloads between all processors on a given architecture without any prior information on their individual performances. 20×CPU + 1GPU was up to ∼1.5 and ∼3.1 times faster than 1GPU and 20×CPU, respectively. 20×CPU + 2GPU was ∼4.3 times faster than 20×CPU. The performance enhancement by CPU + MIC was considerably lower than expected because of the large initialization overhead of MIC, although its theoretical performance is similar to that of CPU + GPU. © 2016 Wiley Periodicals, Inc.
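The scheduling idea, with no prior performance model, workers simply keep taking tasks until none remain, can be approximated with a single shared queue. A true work stealer gives each worker its own deque and steals from the tails of others when idle; that refinement is omitted in this simplified sketch, and the thread "workers" here merely stand in for heterogeneous devices.

```python
import queue
import threading

def dynamic_schedule(tasks, workers):
    """Balanced dynamic distribution: every worker repeatedly pulls the next
    task from a shared queue, so faster devices automatically end up doing
    more of the work without any prior knowledge of their speeds."""
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    results = []
    lock = threading.Lock()

    def run(worker_fn):
        while True:
            try:
                t = q.get_nowait()       # grab the next unit of work
            except queue.Empty:
                return                   # no work left: this worker is done
            r = worker_fn(t)
            with lock:
                results.append(r)

    threads = [threading.Thread(target=run, args=(w,)) for w in workers]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

# Two identical "devices" squaring ten workloads.
out = dynamic_schedule(range(10), [lambda x: x * x, lambda x: x * x])
```

The key property is that load balancing emerges from the pull model itself: a device that finishes a task early simply pulls the next one, which is exactly what makes the scheme robust to the CPU/GPU/MIC speed differences reported above.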
Collapse
Affiliation(s)
- Sunghwan Choi
- Department of Chemistry, KAIST, 291 Daehak-Ro, Yuseong-Gu, Daejeon, 34141, Republic of Korea; Supercomputing Service Center, Korea Institute of Science and Technology Information, 245 Daehak-Ro, Yuseong-Gu, Daejeon, 34141, Republic of Korea
| | - Oh-Kyoung Kwon
- Supercomputing Service Center, Korea Institute of Science and Technology Information, 245 Daehak-Ro, Yuseong-Gu, Daejeon, 34141, Republic of Korea; School of Computing, KAIST, 291 Daehak-Ro, Yuseong-Gu, Daejeon, 34141, Republic of Korea
| | - Jaewook Kim
- Department of Chemistry, KAIST, 291 Daehak-Ro, Yuseong-Gu, Daejeon, 34141, Republic of Korea
| | - Woo Youn Kim
- Department of Chemistry, KAIST, 291 Daehak-Ro, Yuseong-Gu, Daejeon, 34141, Republic of Korea
| |
Collapse
|
24
|
Abstract
We report development of a large-scale spiking network model of the cerebellum composed of more than 1 million neurons. The model is implemented on graphics processing units (GPUs), which are dedicated hardware for parallel computing. Using 4 GPUs simultaneously, we achieve realtime simulation, in which computer simulation of cerebellar activity for 1 s completes within 1 s in the real-world time, with temporal resolution of 1 ms. This allows us to carry out a very long-term computer simulation of cerebellar activity in a practical time with millisecond temporal resolution. Using the model, we carry out computer simulation of long-term gain adaptation of optokinetic response (OKR) eye movements for 5 days aimed to study the neural mechanisms of posttraining memory consolidation. The simulation results are consistent with animal experiments and our theory of posttraining memory consolidation. These results suggest that realtime computing provides a useful means to study a very slow neural process such as memory consolidation in the brain.
Collapse
Affiliation(s)
- Masato Gosui
- Department of Communication Engineering and Informatics, Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo, Japan
| | - Tadashi Yamazaki
- Department of Communication Engineering and Informatics, Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo, Japan
- Neuroinformatics Japan Center, RIKEN Brain Science Institute, Saitama, Japan
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Ibaraki, Japan
| |
Collapse
|
25
|
Liquet B, Bottolo L, Campanella G, Richardson S, Chadeau-Hyam M. R2GUESS: A Graphics Processing Unit-Based R Package for Bayesian Variable Selection Regression of Multivariate Responses. J Stat Softw 2016; 69. [PMID: 29568242 DOI: 10.18637/jss.v069.i02] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022] Open
Abstract
Technological advances in molecular biology over the past decade have given rise to high-dimensional and complex datasets, offering the possibility to investigate biological associations between a range of genomic features and complex phenotypes. The analysis of this novel type of data generated unprecedented computational challenges, which ultimately led to the definition and implementation of computationally efficient statistical models able to scale to genome-wide data, including Bayesian variable selection approaches. While extensive methodological work has been carried out in this area, only a few methods capable of handling hundreds of thousands of predictors have been implemented and distributed. Among these we recently proposed GUESS, a computationally optimised algorithm making use of graphics processing unit capabilities, which can accommodate multiple outcomes. In this paper we propose R2GUESS, an R package wrapping the original C++ source code. In addition to providing a user-friendly interface to the original code that automates its parametrisation and data handling, R2GUESS incorporates many features to explore the data, to extend statistical inferences from the native algorithm (e.g., effect size estimation, significance assessment), and to visualize outputs from the algorithm. We first detail the model and its parametrisation, and describe its optimised implementation in detail. Based on two examples, we finally illustrate its statistical performance and flexibility.
Affiliation(s)
- Benoît Liquet
- Laboratoire de Mathématiques et de leurs Applications, Université de Pau et des Pays de l'Adour, UMR CNRS 5142, Pau, France; ARC Centre of Excellence for Mathematical and Statistical Frontiers, Queensland University of Technology (QUT), Brisbane, Australia
- Marc Chadeau-Hyam
- Department of Epidemiology and Biostatistics, Imperial College London, St Mary's Hospital, Norfolk Place, London, W2 1PG, United Kingdom

26
da Silva J, Ansorge R, Jena R. Fast Pencil Beam Dose Calculation for Proton Therapy Using a Double-Gaussian Beam Model. Front Oncol 2015; 5:281. [PMID: 26734567 PMCID: PMC4683172 DOI: 10.3389/fonc.2015.00281] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2015] [Accepted: 11/30/2015] [Indexed: 11/15/2022] Open
Abstract
The highly conformal dose distributions produced by scanned proton pencil beams (PBs) are more sensitive to motion and anatomical changes than those produced by conventional radiotherapy. The ability to calculate the dose in real-time as it is being delivered would enable, for example, online dose monitoring, and is therefore highly desirable. We have previously described an implementation of a PB algorithm running on graphics processing units (GPUs) intended specifically for online dose calculation. Here, we present an extension to the dose calculation engine employing a double-Gaussian beam model to better account for the low-dose halo. To the best of our knowledge, it is the first such PB algorithm for proton therapy running on a GPU. We employ two different parameterizations for the halo dose, one describing the distribution of secondary particles from nuclear interactions found in the literature and one relying on directly fitting the model to Monte Carlo simulations of PBs in water. Despite the large width of the halo contribution, we show how in either case the second Gaussian can be included while prolonging the calculation of the investigated plans by no more than 16%, or the calculation of the most time-consuming energy layers by about 25%. Furthermore, the calculation time is relatively unaffected by the parameterization used, which suggests that these results should hold also for different systems. Finally, since the implementation is based on an algorithm employed by a commercial treatment planning system, it is expected that with adequate tuning, it should be able to reproduce the halo dose from a general beam line with sufficient accuracy.
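The double-Gaussian model referred to above represents the lateral dose profile of a pencil beam as a weighted sum of a narrow primary Gaussian and a much wider halo Gaussian. A minimal sketch of that superposition (the widths and halo weight below are illustrative placeholders, not fitted values from the paper):

```python
import numpy as np

def lateral_dose(r, sigma1=0.5, sigma2=2.0, w=0.1):
    """Lateral pencil-beam dose at radius r (cm) as a weighted sum of a
    narrow core Gaussian and a wide halo Gaussian, each normalized in 2D
    so the total profile integrates to 1 over the plane."""
    g1 = np.exp(-r**2 / (2.0 * sigma1**2)) / (2.0 * np.pi * sigma1**2)
    g2 = np.exp(-r**2 / (2.0 * sigma2**2)) / (2.0 * np.pi * sigma2**2)
    return (1.0 - w) * g1 + w * g2

r = np.linspace(0.0, 5.0, 51)
profile = lateral_dose(r)
```

Setting `w=0` recovers a single-Gaussian beam; the second term mainly lifts the far tail, which is why including it adds relatively little to the dose-calculation cost despite its large width.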
Affiliation(s)
- Joakim da Silva
- Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge, UK; Department of Oncology, University of Cambridge, Cambridge, UK
- Richard Ansorge
- Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge, UK
- Rajesh Jena
- Department of Oncology, University of Cambridge, Cambridge, UK

27
|
Ou SC, Cui D, Wezowicz M, Taufer M, Patel S. Free energetics of carbon nanotube association in aqueous inorganic NaI salt solutions: Temperature effects using all-atom molecular dynamics simulations. J Comput Chem 2015; 36:1196-212. [PMID: 25868455 PMCID: PMC4445429 DOI: 10.1002/jcc.23906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2014] [Revised: 01/22/2015] [Accepted: 02/21/2015] [Indexed: 11/06/2022]
Abstract
In this study, we examine the temperature dependence of the free energetics of nanotube association using graphics processing unit-enabled all-atom molecular dynamics simulations (FEN ZI) with two (10,10) single-walled carbon nanotubes in 3 m NaI aqueous salt solution. Results suggest that the free energy, enthalpy, and entropy changes for the association process are all reduced at high temperature, in agreement with previous investigations using other hydrophobes. Via decomposition of the free energy into individual components, we found that the solvent contribution (including water, anion, and cation contributions) is correlated with the spatial distribution of the corresponding species and is influenced distinctly by temperature. We studied the spatial distribution and structure of the solvent in different regions: intertube, intratube, and the bulk solvent. By calculating the fluctuation of coarse-grained tube-solvent surfaces, we found that tube-water interfacial fluctuation exhibits the strongest temperature dependence. Taking ions to be a solvent-like medium in the absence of water, tube-anion interfacial fluctuation shows a similar but weaker dependence on temperature, while tube-cation interfacial fluctuation shows no dependence in general. These characteristics are discussed via the malleability of the corresponding solvation shells relative to the nanotube surface. Hydrogen-bonding profiles and the tetrahedrality of water arrangement are also computed to compare the structure of the solvent in the bulk and in the intertube region. The hydrophobic confinement induces a relatively lower-concentration environment in the intertube region, causing different intertube solvent structures that depend on the tube separation.
This study is relevant to the continuing discourse on hydrophobic interactions (as they impact a broad class of phenomena in biology, biochemistry, materials science, and soft condensed matter research) and to interpretations of hydrophobicity in terms of alternative but parallel signatures such as interfacial fluctuations, dewetting transitions, and enhanced fluctuation probabilities at interfaces.
Affiliation(s)
- Shu-Ching Ou
- Department of Chemistry and Biochemistry, University of Delaware, Newark, Delaware 19716, USA
- Di Cui
- Department of Chemistry and Biochemistry, University of Delaware, Newark, Delaware 19716, USA
- Matthew Wezowicz
- Department of Computer and Information Sciences, University of Delaware, Newark, Delaware 19716, USA
- Michela Taufer
- Department of Computer and Information Sciences, University of Delaware, Newark, Delaware 19716, USA
- Sandeep Patel
- Department of Chemistry and Biochemistry, University of Delaware, Newark, Delaware 19716, USA

28
28
|
Sawaya NPD, Huh J, Fujita T, Saikin SK, Aspuru-Guzik A. Fast delocalization leads to robust long-range excitonic transfer in a large quantum chlorosome model. Nano Lett 2015; 15:1722-1729. [PMID: 25694170 DOI: 10.1021/nl504399d] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Chlorosomes are efficient light-harvesting antennas containing up to hundreds of thousands of bacteriochlorophyll molecules. Using massively parallel computer hardware, we solve a nonperturbative stochastic Schrödinger equation, including an atomistically derived spectral density, to study excitonic energy transfer in a realistically sized chlorosome model. We find that fast short-range delocalization leads to robust long-range transfer due to the antenna's concentric-roll structure. Additionally, we discover anomalous behavior arising from different initial conditions, and outline general considerations for simulating excitonic systems on the nanometer to micrometer scale.
Affiliation(s)
- Nicolas P D Sawaya
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, Massachusetts 02138, United States

29
29
|
Lee S, Kwon MS, Park T. CARAT-GxG: CUDA-Accelerated Regression Analysis Toolkit for Large-Scale Gene-Gene Interaction with GPU Computing System. Cancer Inform 2015; 13:27-33. [PMID: 25574130 PMCID: PMC4263399 DOI: 10.4137/cin.s16349] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2014] [Revised: 10/13/2014] [Accepted: 10/14/2014] [Indexed: 11/25/2022] Open
Abstract
In genome-wide association studies (GWAS), regression analysis has been the most commonly used approach to establish an association between a phenotype and genetic variants such as single nucleotide polymorphisms (SNPs). However, most applications of regression analysis have been restricted to the investigation of single markers because of the large computational burden. Thus, there have been limited applications of regression analysis to multiple SNPs, including gene–gene interaction (GGI), in large-scale GWAS data. In order to overcome this limitation, we propose CARAT-GxG, a GPU computing system-oriented toolkit for performing regression analysis with GGI using CUDA (compute unified device architecture). Compared to other methods, CARAT-GxG achieved an almost 700-fold speed-up in execution and delivered highly reliable results through our GPU-specific optimization techniques. In addition, it achieved almost-linear speed-up on a GPU computing cluster managed by the TORQUE Resource Manager. We expect that CARAT-GxG will enable large-scale regression analysis with GGI for GWAS data.
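The per-pair computation being accelerated is an ordinary regression augmented with a product (interaction) term. A single-pair sketch in plain NumPy (CARAT-GxG runs this kind of fit in CUDA over all SNP pairs; the function name, simulated genotypes, and effect sizes below are illustrative):

```python
import numpy as np

def fit_pair_interaction(y, g1, g2):
    """OLS fit of y ~ 1 + g1 + g2 + g1*g2 for one SNP pair; returns the
    coefficient vector [intercept, b1, b2, b_interaction]."""
    X = np.column_stack([np.ones_like(g1), g1, g2, g1 * g2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# simulated data: genotypes coded 0/1/2, true interaction effect 0.5
rng = np.random.default_rng(0)
g1 = rng.integers(0, 3, size=500).astype(float)
g2 = rng.integers(0, 3, size=500).astype(float)
y = 1.0 + 0.5 * g1 * g2 + rng.normal(0.0, 0.1, size=500)
beta = fit_pair_interaction(y, g1, g2)
```

The GPU advantage comes from batching: with p SNPs there are p(p-1)/2 such independent fits, an embarrassingly parallel workload.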
Affiliation(s)
- Sungyoung Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Min-Seok Kwon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Taesung Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea; Department of Statistics, Seoul National University, Seoul, South Korea

30
30
|
Chen TW, Henke M, de Visser PHB, Buck-Sorlin G, Wiechers D, Kahlen K, Stützel H. What is the most prominent factor limiting photosynthesis in different layers of a greenhouse cucumber canopy? Ann Bot 2014; 114:677-88. [PMID: 24907313 PMCID: PMC4217677 DOI: 10.1093/aob/mcu100] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/03/2014] [Accepted: 04/10/2014] [Indexed: 05/06/2023]
Abstract
BACKGROUND AND AIMS Maximizing photosynthesis at the canopy level is important for enhancing crop yield, and this requires insights into the limiting factors of photosynthesis. Using greenhouse cucumber (Cucumis sativus) as an example, this study provides a novel approach to quantify different components of photosynthetic limitations at the leaf level and to upscale these limitations to different canopy layers and the whole plant. METHODS A static virtual three-dimensional canopy structure was constructed using digitized plant data in GroIMP. Light interception of the leaves was simulated by a ray-tracer and used to compute leaf photosynthesis. Different components of photosynthetic limitations, namely stomatal (S(L)), mesophyll (M(L)), biochemical (B(L)) and light (L(L)) limitations, were calculated by a quantitative limitation analysis of photosynthesis under different light regimes. KEY RESULTS In the virtual cucumber canopy, B(L) and L(L) were the most prominent factors limiting whole-plant photosynthesis. Diffusional limitations (S(L) + M(L)) contributed <15% to total limitation. Photosynthesis in the lower canopy was more limited by the biochemical capacity, and the upper canopy was more sensitive to light than other canopy parts. Although leaves in the upper canopy received more light, their photosynthesis was more light restricted than in the leaves of the lower canopy, especially when the light condition above the canopy was poor. An increase in whole-plant photosynthesis under diffuse light did not result from an improvement of light use efficiency but from an increase in light interception. Diffuse light increased the photosynthesis of leaves that were directly shaded by other leaves in the canopy by up to 55%. CONCLUSIONS Based on the results, maintaining biochemical capacity of the middle-lower canopy and increasing the leaf area of the upper canopy would be promising strategies to improve canopy photosynthesis in a high-wire cucumber cropping system. 
Further analyses using the approach described in this study can be expected to provide insights into the influences of horticultural practices on canopy photosynthesis and the design of optimal crop canopies.
Affiliation(s)
- Tsu-Wei Chen
- Institute of Horticultural Production Systems, Leibniz Universität Hannover, Herrenhäuser Straße 2, D-30419 Hannover, Germany
- Michael Henke
- Department of Ecoinformatics, Biometrics and Forest Growth, Georg-August University of Göttingen, Göttingen, Germany
- Pieter H. B. de Visser
- Greenhouse Horticulture, Wageningen UR, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
- Gerhard Buck-Sorlin
- UMR1345 Institut de Recherche en Horticulture et Semences (IRHS), AGROCAMPUS OUEST, Centre d'Angers, 2 rue André le Nôtre, 49045 Angers Cedex 01, France
- Katrin Kahlen
- Geisenheim University, Von-Lade-Straße 1, D-65366 Geisenheim, Germany
- Hartmut Stützel
- Institute of Horticultural Production Systems, Leibniz Universität Hannover, Herrenhäuser Straße 2, D-30419 Hannover, Germany

31
31
|
Abstract
Generating numerical solutions to the eikonal equation and its many variations has a broad range of applications in both the natural and computational sciences. Efficient solvers on cutting-edge, parallel architectures require new algorithms that may not be theoretically optimal, but that are designed to allow asynchronous solution updates and have limited memory access patterns. This paper presents a parallel algorithm for solving the eikonal equation on fully unstructured tetrahedral meshes. The method is appropriate for the type of fine-grained parallelism found on modern massively-SIMD architectures such as graphics processors and takes into account the particular constraints and capabilities of these computing platforms. This work builds on previous work for solving these equations on triangle meshes; in this paper we adapt and extend previous two-dimensional strategies to accommodate three-dimensional, unstructured, tetrahedralized domains. These new developments include a local update strategy with data compaction for tetrahedral meshes that provides solutions on both serial and parallel architectures, with a generalization to inhomogeneous, anisotropic speed functions. We also propose two new update schemes, specialized to mitigate the natural data increase observed when moving to three dimensions, and the data structures necessary for efficiently mapping data to parallel SIMD processors in a way that maintains computational density. Finally, we present descriptions of the implementations for a single CPU, as well as multicore CPUs with shared memory and SIMD architectures, with comparative results against state-of-the-art eikonal solvers.
Affiliation(s)
- Zhisong Fu
- The Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT 84112
- Robert M. Kirby
- The Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT 84112
- Ross T. Whitaker
- The Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT 84112

32
32
|
Gao H, Phan L, Lin Y. Parallel multigrid solver of radiative transfer equation for photon transport via graphics processing unit. J Biomed Opt 2012; 17:96004-1. [PMID: 23085905 PMCID: PMC3497889 DOI: 10.1117/1.jbo.17.9.096004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/16/2012] [Revised: 08/01/2012] [Accepted: 08/03/2012] [Indexed: 05/21/2023]
Abstract
A graphics processing unit-based parallel multigrid solver for the radiative transfer equation with a vacuum or reflection boundary condition is presented for heterogeneous media with complex geometry, based on two-dimensional triangular meshes or three-dimensional tetrahedral meshes. The computational complexity of this parallel solver is linearly proportional to the degrees of freedom in both the angular and spatial variables, while the full multigrid method is utilized to minimize the number of iterations. The overall speed-up is roughly 30- to 300-fold with respect to our prior multigrid solver, depending on the underlying regime and the parallelization. Numerical validations are presented with the MATLAB codes at https://sites.google.com/site/rtefastsolver/.
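The multigrid idea behind the linear complexity claim, recursively coarsening the problem so each level removes the error components it resolves best, can be shown on a toy problem. A self-contained sketch of one V-cycle for a 1D Poisson model problem (deliberately simplified; the paper's solver applies the full multigrid method to the radiative transfer equation on unstructured meshes, not to this equation):

```python
import numpy as np

def v_cycle(u, f, h, n_smooth=3):
    """One multigrid V-cycle for -u'' = f on [0,1], zero Dirichlet BCs.

    Grid has n = 2^k + 1 points; endpoints are held at 0.
    Smooth -> restrict residual -> recurse -> prolong correction -> smooth.
    """
    n = len(u)

    def smooth(sweeps):
        for _ in range(sweeps):                 # Gauss-Seidel relaxation
            for i in range(1, n - 1):
                u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])

    smooth(n_smooth)
    if n <= 3:
        return u
    # residual of the current approximation: r = f + u''
    r = np.zeros(n)
    r[1:-1] = f[1:-1] + (u[:-2] - 2.0 * u[1:-1] + u[2:]) / (h * h)
    # full-weighting restriction onto the coarse grid (every other point)
    nc = (n - 1) // 2 + 1
    rc = np.zeros(nc)
    rc[1:-1] = 0.25 * r[1:-3:2] + 0.5 * r[2:-2:2] + 0.25 * r[3:-1:2]
    # solve the coarse error equation recursively, interpolate back
    ec = v_cycle(np.zeros(nc), rc, 2.0 * h, n_smooth)
    e = np.zeros(n)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    u += e
    smooth(n_smooth)
    return u

# model problem: -u'' = pi^2 sin(pi x), exact solution u = sin(pi x)
n = 65
h = 1.0 / (n - 1)
x = np.linspace(0.0, 1.0, n)
f = np.pi**2 * np.sin(np.pi * x)
u = np.zeros(n)
for _ in range(10):
    u = v_cycle(u, f, h)
err = np.max(np.abs(u - np.sin(np.pi * x)))
```

Each V-cycle reduces the algebraic error by a grid-independent factor, which is why the work stays proportional to the number of unknowns.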
Affiliation(s)
- Hao Gao
- Emory University, Department of Mathematics and Computer Science, Atlanta, Georgia 30322, USA

33
33
|
Ford TN, Lim D, Mertz J. Fast optically sectioned fluorescence HiLo endomicroscopy. J Biomed Opt 2012; 17:021105. [PMID: 22463023 PMCID: PMC3382350 DOI: 10.1117/1.jbo.17.2.021105] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/11/2011] [Revised: 11/04/2011] [Accepted: 11/07/2011] [Indexed: 05/19/2023]
Abstract
We describe a nonscanning, fiber-bundle endomicroscope that performs optically sectioned fluorescence imaging with fast frame rates and real-time processing. Our sectioning technique is based on HiLo imaging, wherein two widefield images are acquired under uniform and structured illumination and numerically processed to reject out-of-focus background. This work improves upon an earlier demonstration of widefield optical sectioning through a flexible fiber bundle. The improved device features lateral and axial resolutions of 2.6 and 17 μm, respectively; a net frame rate of 9.5 Hz, obtained by real-time image processing with a graphics processing unit (GPU); and significantly reduced motion artifacts, obtained by the use of a double-shutter camera. We demonstrate the performance of our system with optically sectioned images and videos of a fluorescently labeled chorioallantoic membrane (CAM) in the developing G. gallus embryo. HiLo endomicroscopy is a candidate technique for low-cost, high-speed clinical optical biopsies.
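The HiLo fusion described above can be summarized in a few lines: in-focus high spatial frequencies are taken directly from the uniform-illumination image, while in-focus low frequencies are estimated from the local modulation induced by structured illumination. A deliberately simplified sketch (the published algorithm's contrast estimation and scaling are more involved; the σ and η values and the synthetic pattern below are illustrative):

```python
import numpy as np

def hilo(uniform, structured, sigma=8.0, eta=1.0):
    """Simplified HiLo fusion of a uniform- and a structured-illumination
    widefield image.

    hi: high-pass of the uniform image (in-focus detail is natively
        sectioned at high spatial frequencies).
    lo: low-pass of the difference image, whose local modulation depth
        encodes in-focus low-frequency content.
    """
    def lowpass(img):
        ny, nx = img.shape
        fy = np.fft.fftfreq(ny)[:, None]
        fx = np.fft.fftfreq(nx)[None, :]
        H = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx**2 + fy**2))
        return np.real(np.fft.ifft2(np.fft.fft2(img) * H))

    lo = lowpass(np.abs(uniform - structured))
    hi = uniform - lowpass(uniform)
    return lo + eta * hi

rng = np.random.default_rng(1)
uniform = rng.random((64, 64))
structured = uniform * (0.5 + 0.5 * np.sin(np.arange(64) / 2.0))
sectioned = hilo(uniform, structured)
```

Because the processing is two FFT-filtered images per frame, it maps well onto a GPU, which is how the 9.5 Hz net frame rate is sustained.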
Affiliation(s)
- Tim N Ford
- Boston University, Department of Biomedical Engineering, Boston, Massachusetts 02215, USA

34
34
|
Fu Z, Jeong WK, Pan Y, Kirby RM, Whitaker RT. A fast iterative method for solving the eikonal equation on triangulated surfaces. SIAM J Sci Comput 2011; 33:2468-2488. [PMID: 22641200 PMCID: PMC3360588 DOI: 10.1137/100788951] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
This paper presents an efficient, fine-grained parallel algorithm for solving the Eikonal equation on triangular meshes. The Eikonal equation, and the broader class of Hamilton-Jacobi equations to which it belongs, have a wide range of applications from geometric optics and seismology to biological modeling and analysis of geometry and images. The ability to solve such equations accurately and efficiently provides new capabilities for exploring and visualizing parameter spaces and for solving inverse problems that rely on such equations in the forward model. Efficient solvers on state-of-the-art, parallel architectures require new algorithms that are not, in many cases, optimal, but are better suited to synchronous updates of the solution. In previous work [W. K. Jeong and R. T. Whitaker, SIAM J. Sci. Comput., 30 (2008), pp. 2512-2534], the authors proposed the fast iterative method (FIM) to efficiently solve the Eikonal equation on regular grids. In this paper we extend the fast iterative method to solve Eikonal equations efficiently on triangulated domains on the CPU and on parallel architectures, including graphics processors. We propose a new local update scheme that provides solutions of first-order accuracy for both architectures. We also propose a novel triangle-based update scheme and its corresponding data structure for efficient irregular data mapping to parallel single-instruction multiple-data (SIMD) processors. We provide detailed descriptions of the implementations on a single CPU, a multicore CPU with shared memory, and SIMD architectures with comparative results against state-of-the-art Eikonal solvers.
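The core of FIM in the regular-grid setting of the earlier Jeong-Whitaker work cited above is an unordered active list whose nodes are repeatedly relaxed with a Godunov upwind update until they converge; this paper's contribution is extending the local update to triangulated surfaces. A regular-grid sketch of the method (illustrative of the scheme, not the mesh-based algorithm itself):

```python
import numpy as np

def fim_eikonal_2d(speed, sources, h=1.0, tol=1e-6):
    """Fast iterative method for |grad T| = 1/speed on a regular 2D grid.

    An unordered active list is relaxed with the first-order Godunov
    upwind update; nodes drop out once converged and are re-activated
    when an upwind neighbor improves. No priority queue is needed,
    which is what makes the updates parallelizable.
    """
    ny, nx = speed.shape
    T = np.full((ny, nx), np.inf)
    for i, j in sources:
        T[i, j] = 0.0
    nbrs = ((1, 0), (-1, 0), (0, 1), (0, -1))
    active = {(i + di, j + dj) for i, j in sources for di, dj in nbrs
              if 0 <= i + di < ny and 0 <= j + dj < nx}

    def local_update(i, j):
        a = min(T[i - 1, j] if i > 0 else np.inf,
                T[i + 1, j] if i < ny - 1 else np.inf)
        b = min(T[i, j - 1] if j > 0 else np.inf,
                T[i, j + 1] if j < nx - 1 else np.inf)
        f = h / speed[i, j]
        if abs(a - b) >= f:                       # one-sided (upwind) case
            return min(a, b) + f
        return 0.5 * (a + b + np.sqrt(2.0 * f * f - (a - b) ** 2))

    while active:
        nxt = set()
        for i, j in active:
            t_new = local_update(i, j)
            if T[i, j] - t_new > tol:             # value improved: keep it
                T[i, j] = t_new                   # and wake the neighbors
                for di, dj in nbrs:
                    ni, nj = i + di, j + dj
                    if 0 <= ni < ny and 0 <= nj < nx:
                        nxt.add((ni, nj))
        active = nxt
    return T

T = fim_eikonal_2d(np.ones((5, 5)), [(0, 0)])  # unit speed, corner source
```

All nodes in the active list can be updated concurrently, since values only ever decrease toward the fixed point; that is the property the paper carries over to SIMD updates on triangulated domains.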
Affiliation(s)
- Zhisong Fu
- The Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT 84112
- Won-Ki Jeong
- Electrical and Computer Engineering, UNIST (Ulsan National Institute of Science and Technology), 100 Banyeon-ri, Eonyang-eup, Ulju-gun, Ulsan, Korea 689-798
- Yongsheng Pan
- The Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT 84112
- Robert M. Kirby
- The Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT 84112
- Ross T. Whitaker
- The Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT 84112

35
35
|
Joldes GR, Wittek A, Miller K. Real-Time Nonlinear Finite Element Computations on GPU - Application to Neurosurgical Simulation. Comput Methods Appl Mech Eng 2010; 199:3305-3314. [PMID: 21179562 PMCID: PMC3003932 DOI: 10.1016/j.cma.2010.06.037] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Application of biomechanical modeling techniques in the areas of medical image analysis and surgical simulation entails two conflicting requirements: accurate results and high solution speeds. Accurate results can be obtained only by using appropriate models and solution algorithms. In our previous papers we presented algorithms and solution methods for performing accurate nonlinear finite element analysis of brain shift (including mixed meshes, different nonlinear material models, finite deformations, and brain-skull contact) in less than a minute on a personal computer for models with up to 50,000 degrees of freedom. In this paper we present an implementation of our algorithms on a graphics processing unit (GPU) using NVIDIA's Compute Unified Device Architecture (CUDA), which yields a more than 20-fold increase in computation speed. This makes possible the use of meshes with more elements, which better represent the geometry, are easier to generate, and provide more accurate results.