1
Łach Ł, Svyetlichnyy D. 3D Model of Carbon Diffusion during Diffusional Phase Transformations. Materials (Basel) 2024;17:674. PMID: 38591517; PMCID: PMC10856523; DOI: 10.3390/ma17030674.
Abstract
The microstructure plays a crucial role in determining the properties of metallic materials, in terms of both their strength and their functionality under various conditions. In the context of microstructure formation, the phase transformations that occur in materials are highly significant. These are processes during which the structure of a material changes, most commonly as a result of variations in temperature, pressure, or chemical composition. The study of phase transformations is a broad and rapidly evolving research area that encompasses both experimental investigations and modeling studies. A foundational understanding of carbon diffusion and phase transformations is essential for comprehending the behavior of materials under different conditions and forms the basis for the development and optimization of materials with desired properties. The aim of this paper is to create a three-dimensional model of carbon diffusion for modeling the diffusional phase transformations that occur in carbon steels. The proposed model relies on the Lattice Boltzmann Method (LBM) and the CUDA architecture. The resulting carbon diffusion model is closely linked with a microstructure evolution model based on Frontal Cellular Automata (FCA). This manuscript provides a concise overview of the LBM and the FCA method, outlines the structure of the developed three-dimensional carbon diffusion model, details its coupling with the microstructure evolution model, and presents the algorithm developed for simulating carbon diffusion. Illustrative simulation results, showing the growth of the emerging phase as affected by various model parameters within selected planes of the 3D calculation domain, are also presented.
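The core of the LBM transport scheme the abstract describes can be sketched in a few lines: the carbon concentration is carried by per-direction distribution functions that relax toward a local equilibrium (collision) and then move to neighbouring cells (streaming). Below is a minimal D3Q7 diffusion step in NumPy; the lattice, weights, relaxation time, and periodic boundaries are illustrative assumptions, not the paper's CUDA implementation or its coupling to the FCA model.

```python
import numpy as np

# D3Q7 lattice: a rest particle plus 6 axis-aligned directions.
VELOCITIES = np.array([[0, 0, 0],
                       [1, 0, 0], [-1, 0, 0],
                       [0, 1, 0], [0, -1, 0],
                       [0, 0, 1], [0, 0, -1]])
WEIGHTS = np.array([1 / 4] + [1 / 8] * 6)

def lbm_diffusion_step(f, tau):
    """One BGK collision + streaming step for a scalar (carbon) field.

    f: distributions, shape (7, nx, ny, nz); tau: relaxation time.
    """
    conc = f.sum(axis=0)                       # macroscopic concentration
    feq = WEIGHTS[:, None, None, None] * conc  # equilibrium distributions
    f = f - (f - feq) / tau                    # BGK collision
    for i, c in enumerate(VELOCITIES):         # periodic streaming
        f[i] = np.roll(f[i], shift=tuple(c), axis=(0, 1, 2))
    return f

# A point source of carbon spreading through a periodic box.
n = 16
f = np.zeros((7, n, n, n))
f[:, n // 2, n // 2, n // 2] = WEIGHTS   # unit mass at the centre
for _ in range(50):
    f = lbm_diffusion_step(f, tau=0.8)
conc = f.sum(axis=0)
print(round(conc.sum(), 6))  # total carbon is conserved -> 1.0
```

With the BGK collision shown, the lattice diffusivity works out to D = c_s^2 (tau - 1/2) with c_s^2 = 1/4 for these weights, so tau alone controls how fast carbon spreads.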
Affiliation(s)
- Łukasz Łach
- AGH University of Krakow, Faculty of Metals Engineering and Industrial Computer Science, al. Mickiewicza 30, 30-059 Krakow, Poland
2
Shafique M, Qazi SA, Omer H. Compressed SVD-based L + S model to reconstruct undersampled dynamic MRI data using parallel architecture. MAGMA 2023. PMID: 37978992; DOI: 10.1007/s10334-023-01128-5.
Abstract
BACKGROUND Magnetic Resonance Imaging (MRI) is a highly demanded medical imaging modality due to its high resolution, large volumetric coverage, and ability to capture the dynamic and functional information of body organs; e.g., cardiac MRI is employed to assess cardiac structure and evaluate blood flow dynamics through the cardiac valves. Long scan time is the main drawback of MRI, which makes it difficult for patients to remain still during the scanning process. OBJECTIVE By collecting fewer measurements, MRI scan time can be shortened, but this undersampling causes aliasing artifacts in the reconstructed images. Advanced image reconstruction algorithms have been used in the literature to overcome these undersampling artifacts. These algorithms are computationally expensive and require a long reconstruction time, which makes them infeasible for real-time clinical applications such as cardiac MRI. However, exploiting the inherent parallelism in these algorithms can help reduce their computation time. METHODS The low-rank plus sparse (L+S) matrix decomposition model is a technique used in the literature to reconstruct highly undersampled dynamic MRI (dMRI) data at the expense of long reconstruction time. In this paper, a Compressed Singular Value Decomposition (cSVD) model is used in the L+S decomposition model (instead of the conventional SVD) to reduce the reconstruction time. The results show improved quality of the reconstructed images. Furthermore, cSVD and other parts of the L+S model possess highly parallel operations; therefore, a customized GPU-based parallel architecture of the modified L+S model is presented to further reduce the reconstruction time. RESULTS Four cardiac MRI datasets (three cardiac perfusion datasets acquired from different patients and one cardiac cine dataset), each with acceleration factors of 2, 6, and 8, are used for the experiments in this paper. Experimental results demonstrate that the proposed parallel architecture for the reconstruction of cardiac perfusion data provides speed-up factors of up to 19.15× (with memory latency) and 70.55× (without memory latency) over conventional CPU reconstruction with no compromise on image quality. CONCLUSION The proposed method is well suited for real-time clinical applications, offering a substantial reduction in reconstruction time.
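The L+S model the authors accelerate can be illustrated with plain alternating proximal steps: singular-value thresholding recovers the low-rank (slowly varying) component and entrywise soft thresholding the sparse (dynamic) residual. The sketch below is a simplified real-valued version with a full SVD and hand-picked thresholds; the paper instead uses a compressed SVD, complex k-space data, and a temporal sparsifying transform, so treat every parameter here as an illustrative assumption.

```python
import numpy as np

def svt(x, tau):
    """Singular value thresholding: shrink singular values by tau."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt

def soft(x, tau):
    """Entrywise soft thresholding (sparsity-promoting proximal step)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def l_plus_s(m, lam_l=1.0, lam_s=0.1, iters=100):
    """Alternating L+S split of a space-time matrix m (pixels x frames)."""
    l = np.zeros_like(m)
    s = np.zeros_like(m)
    for _ in range(iters):
        l = svt(m - s, lam_l)    # low-rank background (slow dynamics)
        s = soft(m - l, lam_s)   # sparse residual (fast dynamics)
    return l, s

# Synthetic "dynamic series": a rank-1 background plus a few sparse spikes.
rng = np.random.default_rng(0)
bg = np.outer(rng.standard_normal(60), rng.standard_normal(20))
spikes = np.zeros((60, 20))
spikes[rng.integers(0, 60, 5), rng.integers(0, 20, 5)] = 5.0
l, s = l_plus_s(bg + spikes)
# By construction of the final soft-threshold step, the entrywise
# residual |M - L - S| is bounded by lam_s.
```

Replacing `svt`'s full SVD with a randomized/compressed SVD is exactly where the paper saves time, since only the leading singular vectors are needed.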
Affiliation(s)
- Muhammad Shafique
- Medical Image Processing Research Group (MIPRG), Department of Electrical and Computer Engineering, COMSATS University Islamabad, Islamabad, Pakistan
- Department of Electrical Engineering, University of Poonch Rawalakot, Rawalakot, AJ&K, Pakistan
- Sohaib Ayaz Qazi
- Cardiovascular Sciences, Department of Health, Medicine and Caring Sciences, Linköping University, Linköping, Sweden
- Center for Medical Image Science and Visualization (CMIV), Linköping University, Linköping, Sweden
- Hammad Omer
- Medical Image Processing Research Group (MIPRG), Department of Electrical and Computer Engineering, COMSATS University Islamabad, Islamabad, Pakistan
3
Lu Z, Guo L, Chen J, Wang R. Reference-based genome compression using the longest matched substrings with parallelization consideration. BMC Bioinformatics 2023;24:369. PMID: 37777730; PMCID: PMC10544193; DOI: 10.1186/s12859-023-05500-z. Open access.
Abstract
BACKGROUND For decades, researchers have worked to accelerate genome sequencing and reduce its cost, and they have made great strides in both areas, making it easier to study and analyze genome data. However, efficiently storing and transmitting the vast amount of genome data generated by high-throughput sequencing technologies has become a challenge, so genome compression algorithms that enable an efficient representation of genome data have gradually attracted researchers' attention. Meanwhile, considering that current computing devices have multiple cores, making full use of these devices and improving the efficiency of parallel processing is also an important direction for designing genome compression algorithms. RESULTS We propose an algorithm (LMSRGC) based on reference genome sequences, which uses the suffix array (SA) and the longest common prefix (LCP) array to find the longest matched substrings (LMS) for the compression of genome data in FASTA format. The algorithm exploits the characteristics of the SA and the LCP array to select all suitable LMSs between the genome sequence to be compressed and the reference genome sequence and then uses these LMSs to compress the target genome sequence. To speed up the algorithm, we use GPUs to parallelize the construction of the SA, while using multiple threads to parallelize the creation of the LCP array and the filtering of LMSs. CONCLUSIONS Experimental results demonstrate that our algorithm is competitive with current state-of-the-art algorithms in both compression ratio and compression time.
Affiliation(s)
- Zhiwen Lu
- School of Information, Yunnan University, KunMing, China
- Lu Guo
- Yunnan Physical Science and Sports Professional College, KunMing, China
- Jianhua Chen
- School of Information, Yunnan University, KunMing, China
- Rongshu Wang
- School of Information, Yunnan University, KunMing, China
4
Łach Ł, Svyetlichnyy D. 3D Model of Heat Flow during Diffusional Phase Transformations. Materials (Basel) 2023;16:4865. PMID: 37445179; DOI: 10.3390/ma16134865.
Abstract
The structure of metallic materials has a significant impact on their properties. One of the most popular methods of shaping the properties of metal alloys is heat treatment, which uses thermally activated transformations in metals to achieve the required mechanical or physicochemical properties. A phase transformation in steel results from one state becoming less stable than another due to a change in conditions, for example, temperature. Phase transformations are an extensive field of research that is developing very dynamically in both experimental and modeling studies. The objective of this paper is the development of a 3D model of heat flow during diffusional phase transformations in carbon steels. The model considers the two main factors that influence the transformation: the temperature and the enthalpy of transformation. The proposed model is based on the lattice Boltzmann method (LBM) and uses CUDA parallel computations. The developed heat flow model is directly coupled to a microstructure evolution model based on frontal cellular automata (FCA). This paper briefly presents the FCA, the LBM, CUDA, and diffusional phase transformations in carbon steels. The structure of the 3D heat flow model, its connection with the microstructure evolution model, and the algorithm for simulating heat transfer with consideration of the enthalpy of transformation are shown. Example simulation results, showing the growth of the new phase as determined by overheating/overcooling and different model parameters in selected planes of the 3D calculation domain, are also presented.
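The coupling the abstract describes, heat conduction plus a latent-heat source where the new phase grows, can be sketched with a plain explicit finite-difference step. The paper itself solves this balance with the LBM on CUDA, so the scheme, the lattice units (dt = dx = 1), and the coefficients below are illustrative assumptions only.

```python
import numpy as np

def heat_step(T, phase_rate, alpha=0.1, dH_over_rho_cp=2.0):
    """One explicit step of dT/dt = alpha * lap(T) + (dH/rho*c_p) * dphi/dt.

    T: temperature on a periodic 3D grid; phase_rate: local rate of
    new-phase growth (dphi/dt), which releases transformation enthalpy
    as heat at the moving front.
    """
    # 6-neighbour discrete Laplacian with periodic boundaries.
    lap = sum(np.roll(T, s, axis=a) for a in range(3) for s in (1, -1)) - 6 * T
    return T + alpha * lap + dH_over_rho_cp * phase_rate

n = 8
T = np.zeros((n, n, n))
rate = np.zeros((n, n, n))
rate[4, 4, 4] = 0.05            # one transforming cell releases latent heat
for _ in range(20):
    T = heat_step(T, rate)
print(T[4, 4, 4] > T[0, 0, 0])  # heat spreads outward from the source -> True
```

Because the Laplacian conserves total heat on a periodic grid, the temperature field integrates exactly the enthalpy injected by the source term, which is the bookkeeping the overheating/overcooling in the transformation model depends on.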
Affiliation(s)
- Łukasz Łach
- AGH University of Krakow, Faculty of Metals Engineering and Industrial Computer Science, al. Mickiewicza 30, 30-059 Krakow, Poland
- Dmytro Svyetlichnyy
- AGH University of Krakow, Faculty of Metals Engineering and Industrial Computer Science, al. Mickiewicza 30, 30-059 Krakow, Poland
5
Nourse WRP, Jackson C, Szczecinski NS, Quinn RD. SNS-Toolbox: An Open Source Tool for Designing Synthetic Nervous Systems and Interfacing Them with Cyber-Physical Systems. Biomimetics (Basel) 2023;8:247. PMID: 37366842; DOI: 10.3390/biomimetics8020247. Open access.
Abstract
One developing approach to robotic control is the use of networks of dynamic neurons connected with conductance-based synapses, also known as Synthetic Nervous Systems (SNS). These networks often combine cyclic topologies with heterogeneous mixtures of spiking and non-spiking neurons, which is a difficult proposition for existing neural simulation software. Most existing solutions target one of two extremes: detailed multi-compartment neural models in small networks, or large-scale networks of greatly simplified neural models. In this work, we present our open-source Python package SNS-Toolbox, which can simulate hundreds to thousands of spiking and non-spiking neurons in real-time or faster on consumer-grade computer hardware. We describe the neural and synaptic models supported by SNS-Toolbox and report performance on multiple software and hardware backends, including GPUs and embedded computing platforms. We also showcase two examples using the software: one controlling a simulated limb with muscles in the physics simulator MuJoCo, and another controlling a mobile robot using ROS. We hope that the availability of this software will reduce the barrier to entry when designing SNS networks and will increase the prevalence of SNS networks in the field of robotic control.
Affiliation(s)
- William R P Nourse
- Department of Electrical, Computer, and Systems Engineering, Case Western Reserve University, Cleveland, OH 44106, USA
- Clayton Jackson
- Department of Mechanical and Aerospace Engineering, Case Western Reserve University, Cleveland, OH 44106, USA
- Nicholas S Szczecinski
- Department of Mechanical and Aerospace Engineering, West Virginia University, Morgantown, WV 26506, USA
- Roger D Quinn
- Department of Mechanical and Aerospace Engineering, Case Western Reserve University, Cleveland, OH 44106, USA
6
Li F, Zou F, Rao J. A multi-GPU and CUDA-aware MPI-based spectral element formulation for ultrasonic wave propagation in solid media. Ultrasonics 2023;134:107049. PMID: 37290255; DOI: 10.1016/j.ultras.2023.107049.
Abstract
In this paper, we introduce a new multi-GPU-based spectral element (SE) formulation for simulating ultrasonic wave propagation in solids. To maximize communication efficiency, we developed, based on CUDA-aware MPI, two novel message exchange strategies that allow the common nodal forces of different subdomains to be shared between GPUs directly, rather than via CPU hosts, during central-difference time integration steps. The new multi-GPU, CUDA-aware MPI-based formulation is benchmarked against a multi-CPU-core, classical MPI-based counterpart, demonstrating a remarkable acceleration in every stage of the computation, namely matrix assembly, time integration, and message exchange. More importantly, both the computational efficiency and the degree-of-freedom limit of the new formulation scale with the number of GPUs used, potentially allowing larger structures to be computed and higher computational speeds to be realized. Finally, the new formulation was used to simulate the interaction between Lamb waves and randomly shaped thickness-loss defects in plates, showing its potential to become an efficient, accurate, and robust technique for addressing the propagation of ultrasonic waves in realistic engineering structures.
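The shared-nodal-force idea can be illustrated without MPI at all: each subdomain assembles internal forces only from its own elements, so an interface node holds just a partial force that must be summed with the neighbouring subdomain's contribution before the explicit time-integration update. A toy 1D spring chain (unit stiffness, made-up displacements) shows the bookkeeping; in the paper this sum is exchanged GPU-to-GPU via CUDA-aware MPI.

```python
import numpy as np

# A 1D bar of unit springs split into two subdomains that share node 2.
k = 1.0
u_a = np.array([0.0, 0.0, 0.1])   # subdomain A: global nodes 0, 1, 2
u_b = np.array([0.1, 0.0, 0.0])   # subdomain B: global nodes 2, 3, 4

def internal_forces(u):
    """Assemble f_int = K u for a chain of unit springs."""
    f = np.zeros_like(u)
    for e in range(len(u) - 1):
        fe = k * (u[e] - u[e + 1])
        f[e] += fe
        f[e + 1] -= fe
    return f

f_a, f_b = internal_forces(u_a), internal_forces(u_b)
total = f_a[-1] + f_b[0]          # complete the shared node's force
f_a[-1] = f_b[0] = total          # both subdomains now hold the full value

# Monolithic assembly over the whole bar gives the same nodal force,
# confirming the partial sums were exchanged correctly.
u_full = np.array([0.0, 0.0, 0.1, 0.0, 0.0])
print(np.allclose(internal_forces(u_full)[2], total))  # -> True
```

After the exchange, each subdomain can run its central-difference update independently, which is what makes the scheme scale across GPUs.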
Affiliation(s)
- Feilong Li
- Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region, China
- Fangxin Zou
- Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region, China
- Jing Rao
- School of Instrumentation and Opto-Electronic Engineering, Beihang University, Beijing 100191, China; School of Engineering and Information Technology, The University of New South Wales, Canberra, ACT 2600, Australia
7
Fatigate GR, Lobosco M, Reis RF. A 3D Approach Using a Control Algorithm to Minimize the Effects on the Healthy Tissue in the Hyperthermia for Cancer Treatment. Entropy (Basel) 2023;25:e25040684. PMID: 37190473; PMCID: PMC10138007; DOI: 10.3390/e25040684.
Abstract
According to the World Health Organization, cancer is a worldwide health problem. Its high mortality rate motivates scientists to study new treatments, one of which is hyperthermia using magnetic nanoparticles. This treatment consists of subjecting the target region to a low-frequency magnetic field to raise its temperature above 43 °C, the threshold for tissue damage, driving the cells to necrosis. This paper uses an in silico three-dimensional Pennes model, described by a set of partial differential equations (PDEs), to estimate the percentage of tissue damage due to hyperthermia. Differential evolution, an optimization method, suggests the best locations to inject the nanoparticles so as to maximize tumor cell death and minimize damage to healthy tissue. Three scenarios were evaluated to assess the suggestions obtained by the optimization method. The results indicate the positive impact of the proposed technique: a reduction in the percentage of healthy tissue damage and complete damage of the tumors were observed. In the best scenario, the optimization method decreased healthy tissue damage by 59% by placing the nanoparticle injection sites at the non-intuitive points it indicated. The numerical solution of the PDEs is computationally expensive, so this work also describes a parallel strategy based on CUDA to reduce the computational cost of solving the PDEs. Compared to the sequential version executed on the CPU, the proposed parallel implementation sped up execution by up to 84.4 times.
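Differential evolution itself is compact enough to sketch in full. Below is a minimal DE/rand/1/bin minimizer applied to a toy quadratic standing in for the tissue-damage objective; the population size, mutation factor, and crossover rate are common textbook defaults, not the paper's settings.

```python
import numpy as np

def differential_evolution(obj, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimal DE/rand/1/bin minimizer of obj over box bounds."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))
    costs = np.array([obj(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutate: combine three random population members.
            a, b, c = pop[rng.choice(pop_size, size=3, replace=False)]
            mutant = np.clip(a + f * (b - c), lo, hi)
            # Binomial crossover with at least one mutant gene.
            cross = rng.random(len(lo)) < cr
            cross[rng.integers(len(lo))] = True
            trial = np.where(cross, mutant, pop[i])
            # Greedy selection: keep the trial only if it is better.
            if (cost := obj(trial)) < costs[i]:
                pop[i], costs[i] = trial, cost
    return pop[costs.argmin()], costs.min()

# Toy stand-in for "tissue damage": a smooth bowl with optimum at (1, -2).
best_x, best_cost = differential_evolution(
    lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2,
    bounds=[(-5, 5), (-5, 5)])
print(np.round(best_x, 2))  # best_x is close to (1, -2)
```

In the paper the objective is far more expensive (each evaluation solves the 3D Pennes PDEs), which is why the CUDA acceleration of the PDE solver matters so much for the optimizer's wall-clock time.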
Affiliation(s)
- Gustavo Resende Fatigate
- Pós-Graduação em Modelagem Computacional, Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n-São Pedro, Juiz de Fora 36036-900, MG, Brazil
- Marcelo Lobosco
- Pós-Graduação em Modelagem Computacional, Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n-São Pedro, Juiz de Fora 36036-900, MG, Brazil
- Departamento de Ciência da Computação, Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n-São Pedro, Juiz de Fora 36036-900, MG, Brazil
- Ruy Freitas Reis
- Pós-Graduação em Modelagem Computacional, Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n-São Pedro, Juiz de Fora 36036-900, MG, Brazil
- Departamento de Ciência da Computação, Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n-São Pedro, Juiz de Fora 36036-900, MG, Brazil
8
Brost EE, Wan Chan Tseung H, Antolak JA. A fast GPU-accelerated Monte Carlo engine for calculation of MLC-collimated electron fields. Med Phys 2023;50:600-618. PMID: 35986907; PMCID: PMC10087940; DOI: 10.1002/mp.15938. Open access.
Abstract
BACKGROUND Although intensity-modulated radiation therapy and volumetric arc therapy have revolutionized photon external beam therapies, the technological advances associated with electron beam therapy have fallen behind. Modern linear accelerators contain technologies that would allow for more advanced forms of electron treatment, such as beam collimation using the conventional photon multi-leaf collimator (MLC); however, no commercial solutions exist that calculate dose from such beam delivery modes. Additionally, for clinical adoption to occur, dose calculation times would need to be on par with those of modern dose calculation algorithms. PURPOSE This work developed a graphics processing unit (GPU)-accelerated Monte Carlo (MC) engine incorporating the Varian TrueBeam linac head geometry for rapid calculation of electron beams collimated using the conventional photon MLC. METHODS A compute unified device architecture framework was created for the following: (1) transport of electrons and photons through the linac head geometry, considering multiple scattering, bremsstrahlung, Møller, Compton, and pair-production interactions; (2) electron and photon propagation through the CT geometry, considering all interactions plus the photoelectric effect; and (3) secondary particle cascades through the linac head and within the CT geometry. The linac head collimating geometry was modeled according to the specifications provided by the vendor, who also provided phase-space files. The MC was benchmarked against EGSnrc/DOSXYZnrc/GEANT by simulating individual interactions with simple geometries and pencil- and square-beam dose calculations in various phantoms. MC-calculated dose distributions for MLC- and jaw-collimated electron fields were compared to measurements in a water phantom and with radiochromic film. RESULTS Pencil- and square-beam dose distributions are in good agreement with DOSXYZnrc. Angular and spatial distributions for multiple scattering and secondary particle production in thin slab geometries are in good agreement with EGSnrc and GEANT. Dose profiles for MLC- and jaw-collimated 6-20 MeV electron beams showed average absolute differences of 1.1 and 1.9 mm in the FWHM and the 80%-20% penumbra, respectively, relative to measured profiles. Percent depth doses showed differences of <5% compared to measurement. The computation time on an NVIDIA Tesla V100 card was 2.5 min to achieve a dose uncertainty of <1%, which is ∼300 times faster than published results in a similar geometry using a single CPU core. CONCLUSIONS The GPU-based MC can quickly calculate dose for electron fields collimated using the conventional photon MLC. The fast calculation times will allow rapid calculation of electron fields for mixed photon and electron particle therapy.
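The statistical principle behind such an MC engine can be shown with the simplest possible transport problem: photons crossing a uniform slab, where free paths are sampled from an exponential distribution and the surviving fraction is compared against the Beer-Lambert law. This toy is far from the full electron/photon physics listed above, and the attenuation coefficient and slab depth are arbitrary choices.

```python
import numpy as np

def transmitted_fraction(mu, thickness, n_particles, seed=0):
    """Fraction of photons crossing a slab without interacting.

    Free paths are sampled by inverting the exponential CDF,
    s = -ln(1 - xi) / mu, with xi uniform in [0, 1).
    """
    rng = np.random.default_rng(seed)
    free_paths = -np.log(1.0 - rng.random(n_particles)) / mu
    return np.mean(free_paths > thickness)

mu, t = 0.2, 5.0                  # attenuation coefficient, slab depth
mc = transmitted_fraction(mu, t, n_particles=200_000)
analytic = np.exp(-mu * t)        # Beer-Lambert prediction
print(abs(mc - analytic) < 0.01)  # MC estimate matches the law -> True
```

The statistical uncertainty falls as 1/sqrt(N), which is why GPU engines that can push orders of magnitude more histories per second reach a <1% dose uncertainty in minutes rather than hours.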
Affiliation(s)
- Eric E Brost
- Department of Radiation Oncology, Mayo Clinic, Rochester, Minnesota, USA
- H Wan Chan Tseung
- Department of Radiation Oncology, Mayo Clinic, Rochester, Minnesota, USA
- John A Antolak
- Department of Radiation Oncology, Mayo Clinic, Rochester, Minnesota, USA
9
Kumar A, Cuccuru G, Grüning B, Backofen R. An accessible infrastructure for artificial intelligence using a Docker-based JupyterLab in Galaxy. Gigascience 2022;12:giad028. PMID: 37099385; PMCID: PMC10132306; DOI: 10.1093/gigascience/giad028. Open access.
Abstract
BACKGROUND Artificial intelligence (AI) programs that train on large datasets require powerful compute infrastructure consisting of several CPU cores and GPUs. JupyterLab provides an excellent framework for developing AI programs, but it needs to be hosted on such an infrastructure to enable faster training of AI programs using parallel computing. FINDINGS An open-source, Docker-based, GPU-enabled JupyterLab infrastructure is developed that runs on the public compute infrastructure of Galaxy Europe, consisting of thousands of CPU cores, many GPUs, and several petabytes of storage, to rapidly prototype and develop end-to-end AI projects. Using a JupyterLab notebook, long-running AI model training programs can also be executed remotely to create trained models, represented in the Open Neural Network Exchange (ONNX) format, and other output datasets in Galaxy. Other features include Git integration for version control, the option of creating and executing pipelines of notebooks, and multiple dashboards and packages for monitoring compute resources and for visualization. CONCLUSIONS These features make JupyterLab in Galaxy Europe highly suitable for creating and managing AI projects. A recent scientific publication that predicts infected regions in COVID-19 computed tomography scan images is reproduced using various features of JupyterLab on Galaxy Europe. In addition, ColabFold, a faster implementation of AlphaFold2, is accessed in JupyterLab to predict the 3-dimensional structure of protein sequences. JupyterLab is accessible in two ways: as an interactive Galaxy tool, or by running the underlying Docker container. In both cases, long-running training can be executed on Galaxy's compute infrastructure. Scripts to create the Docker container are available under the MIT license at https://github.com/usegalaxy-eu/gpu-jupyterlab-docker.
Affiliation(s)
- Anup Kumar
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Gianmauro Cuccuru
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Björn Grüning
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schaenzlestr. 18, 79104 Freiburg, Germany
10
Alevi D, Stimberg M, Sprekeler H, Obermayer K, Augustin M. Brian2CUDA: Flexible and Efficient Simulation of Spiking Neural Network Models on GPUs. Front Neuroinform 2022;16:883700. PMID: 36387586; PMCID: PMC9660315; DOI: 10.3389/fninf.2022.883700. Open access.
Abstract
Graphics processing units (GPUs) are widely available and have been used with great success to accelerate scientific computing over the last decade. These advances, however, are often not available to researchers interested in simulating spiking neural networks who lack the technical knowledge to write the necessary low-level code. Writing low-level code is not necessary when using the popular Brian simulator, which provides a framework to generate efficient CPU code from high-level model definitions in Python. Here, we present Brian2CUDA, open-source software that extends the Brian simulator with a GPU backend. Our implementation generates efficient code for the numerical integration of neuronal states and for the propagation of synaptic events on GPUs, making use of their massively parallel arithmetic capabilities. We benchmark the performance improvements of our software for several model types and find that it can accelerate simulations by up to three orders of magnitude compared to Brian's CPU backend. Currently, Brian2CUDA is the only package that supports Brian's full feature set on GPUs, including arbitrary neuron and synapse models, plasticity rules, and heterogeneous delays. When comparing its performance with Brian2GeNN, another GPU-based backend for the Brian simulator with fewer features, we find that Brian2CUDA gives comparable speedups, being typically slower for small networks and faster for large ones. By combining the flexibility of the Brian simulator with the simulation speed of GPUs, Brian2CUDA enables researchers to efficiently simulate spiking neural networks with minimal effort, thereby making the advances of GPU computing available to a larger audience of neuroscientists.
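The kind of per-neuron state update that such a backend turns into a GPU kernel (one thread per neuron) can be sketched for a leaky integrate-and-fire population. The NumPy version below uses forward Euler and made-up membrane parameters; Brian itself generates code for arbitrary user-defined model equations, so this is only the flavor of the generated update, not its actual output.

```python
import numpy as np

def lif_step(v, i_syn, dt=0.1, tau=10.0, v_rest=-70.0,
             v_thresh=-50.0, v_reset=-70.0):
    """One forward-Euler step for a population of LIF neurons.

    v: membrane potentials (mV); i_syn: synaptic drive per neuron.
    Every neuron is updated independently, which is what maps
    naturally onto one GPU thread per neuron.
    """
    v = v + dt * ((v_rest - v) + i_syn) / tau   # leaky integration
    spiked = v >= v_thresh                      # threshold crossing
    v = np.where(spiked, v_reset, v)            # reset spiking neurons
    return v, spiked

rng = np.random.default_rng(1)
v = np.full(1000, -70.0)
spike_count = 0
for _ in range(1000):
    v, spiked = lif_step(v, i_syn=rng.uniform(0.0, 40.0, size=1000))
    spike_count += int(spiked.sum())
print(spike_count > 0)  # drive above threshold produces spikes -> True
```

Propagating the resulting spikes through synapses with heterogeneous delays is the harder, less regular part of the workload, and is where the benchmarked backends differ most.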
Affiliation(s)
- Denis Alevi
- Technische Universität Berlin, Chair of Modelling of Cognitive Processes, Berlin, Germany
- Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany
- Marcel Stimberg
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- Henning Sprekeler
- Technische Universität Berlin, Chair of Modelling of Cognitive Processes, Berlin, Germany
- Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany
- Klaus Obermayer
- Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany
- Technische Universität Berlin, Chair of Neural Information Processing, Berlin, Germany
- Moritz Augustin
- Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany
- Technische Universität Berlin, Chair of Neural Information Processing, Berlin, Germany
11
Ali NA, El Abbassi A, Bouattane O. Performance evaluation of spatial fuzzy C-means clustering algorithm on GPU for image segmentation. Multimed Tools Appl 2022;82:6787-6805. PMID: 35968411; PMCID: PMC9363269; DOI: 10.1007/s11042-022-13635-z.
Abstract
Image segmentation is an important phase in medical imaging such as MRI; its objective is to analyze the different tissues of the human body. Fuzzy sets are among the most successful techniques for guaranteeing a robust classification, and Spatial FCM (SFCM), one of the fuzzy c-means variants, incorporates spatial information to deal with noisy images. To reduce this iterative algorithm's execution time, a massively parallel SIMD architecture, the Graphics Processing Unit (GPU), has been employed. In this work, three different parallel implementations are designed, analyzed, and implemented on the GPU. An extensive study of parallel SFCM implementations, named PSFCM, using a 3 × 3 window is presented, and the experiments show a significant decrease in the running time of this algorithm, which is known for its high complexity. The experimental results indicate that the parallel version's execution time is about 9.46 times faster than the sequential implementation for image segmentation. This speed-up is achieved on an Nvidia GeForce GT 740M GPU.
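The spatial step that distinguishes SFCM from plain FCM is easy to sketch: each pixel's fuzzy memberships are reweighted by the summed memberships of its 3 × 3 neighbourhood, so isolated noisy pixels are pulled toward the class of their surroundings. The version below uses periodic borders and exponents p = q = 1 for brevity, and the image, cluster centers, and noise are made-up examples.

```python
import numpy as np

def fcm_memberships(img, centers, m=2.0):
    """Standard FCM memberships for a grayscale image and fixed centers."""
    d = np.abs(img[..., None] - centers) + 1e-9          # |x - c_k|
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=-1, keepdims=True)

def spatial_update(u, p=1, q=1):
    """SFCM step: weight memberships by the summed memberships of the
    3x3 neighbourhood, then renormalize (periodic borders for brevity)."""
    h = sum(np.roll(u, (dy, dx), axis=(0, 1))
            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    w = (u ** p) * (h ** q)
    return w / w.sum(axis=-1, keepdims=True)

# Two-region image with one noisy pixel inside the dark half.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
img[4, 2] = 0.6                   # bright-leaning noise in a dark region
u = fcm_memberships(img, centers=np.array([0.0, 1.0]))
u = spatial_update(u)
labels = u.argmax(axis=-1)
print(labels[4, 2])  # noisy pixel joins its dark neighbourhood -> 0
```

Both steps are purely per-pixel stencil operations, which is why the algorithm maps so well onto a GPU thread per pixel.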
Affiliation(s)
- Noureddine Ait Ali
- Labo ERTTI, FST Errachidia, Moulay Ismail University of Meknes, Meknes, Morocco
- Ahmed El Abbassi
- Labo ERTTI, FST Errachidia, Moulay Ismail University of Meknes, Meknes, Morocco
- Omar Bouattane
- SSDIA Laboratory, ENSET-Mohammedia Hassan II University Casablanca, Casablanca, Morocco
12
Inam O, Qureshi M, Laraib Z, Akram H, Omer H. GPU accelerated Cartesian GRAPPA reconstruction using CUDA. J Magn Reson 2022;337:107175. PMID: 35259611; DOI: 10.1016/j.jmr.2022.107175.
Abstract
BACKGROUND AND OBJECTIVE GRAPPA (Generalized Auto-calibrating Partially Parallel Acquisition) is an advanced parallel MRI (pMRI) reconstruction method that enables under-sampled data acquisition with multiple receiver coils to reduce the MRI scan time, and reconstructs artifact-free images from the acquired under-sampled data. However, the reduction in scan time comes at the expense of long reconstruction time, because GRAPPA reconstruction time grows exponentially with the number of receiver coils. Consequently, conventional CPU platforms may not meet the fast data-processing requirements of MR image reconstruction. METHODS Graphics Processing Units (GPUs) have recently emerged as viable commodity hardware for reducing the reconstruction time of pMRI methods. This paper presents a novel GPU-based implementation of GRAPPA using custom-built CUDA kernels to meet the rising demands of fast MRI processing. The proposed framework exploits the intrinsic parallelism in the calibration and synthesis phases of the GRAPPA reconstruction process, aiming at high-speed MR image reconstruction across GRAPPA configurations with different numbers of receiver coils, auto-calibration signals (ACS), GRAPPA kernel sizes, and acceleration factors. In-vivo experiments (using 8, 12 and 30 receiver coils) compare the performance of the proposed GPU-accelerated GRAPPA with CPU-based GRAPPA extensions and a GPU counterpart. RESULTS The results indicate that the proposed method achieves up to ≈47.8×, ≈17× and ≈3.8× speed-up over a multicore CPU (single thread), a multicore CPU (8 threads) and Gadgetron (GPU-based GRAPPA), respectively, without compromising reconstruction accuracy. CONCLUSIONS The proposed method reduces the GRAPPA reconstruction time by executing the calibration phase (GRAPPA weights estimation) and the synthesis phase (interpolation) on the GPU.
Our study shows that the proposed GPU based parallel framework for GRAPPA reconstruction provides a solution for high-speed image reconstruction while maintaining the quality of the reconstructed images.
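GRAPPA itself operates on multi-coil k-space data with 2D kernels; as a deliberately minimal single-coil, 1D illustration of the two phases named above (calibration by least squares over the ACS region, then synthesis by weighted interpolation of skipped lines), one might write the following sketch. All names are hypothetical and the 2-neighbor model is an assumed simplification:

```python
def fit_grappa_weights_1d(acs):
    # Calibration phase: fit w1, w2 with s[k] ≈ w1*s[k-1] + w2*s[k+1]
    # over fully sampled ACS data, via 2x2 normal equations (Cramer's rule).
    rows = [(acs[k - 1], acs[k + 1], acs[k]) for k in range(1, len(acs) - 1)]
    a11 = sum(x * x for x, y, t in rows)
    a12 = sum(x * y for x, y, t in rows)
    a22 = sum(y * y for x, y, t in rows)
    b1 = sum(x * t for x, y, t in rows)
    b2 = sum(y * t for x, y, t in rows)
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det

def synthesize(acq, w1, w2):
    # Synthesis phase: fill each skipped sample (None) from its two
    # acquired neighbors; assumes neighbors of a gap were acquired.
    out = list(acq)
    for k in range(1, len(out) - 1):
        if out[k] is None:
            out[k] = w1 * out[k - 1] + w2 * out[k + 1]
    return out
```

In the full method both phases are matrix-heavy and data-parallel across k-space locations and coils, which is what the custom CUDA kernels exploit.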
Affiliation(s)
- Omair Inam
- Medical Image Processing Research Group (MIPRG), Department of Electrical & Computer Engineering, COMSATS University Islamabad, Pakistan
- Mahmood Qureshi
- Medical Image Processing Research Group (MIPRG), Department of Electrical & Computer Engineering, COMSATS University Islamabad, Pakistan
- Zoia Laraib
- Medical Image Processing Research Group (MIPRG), Department of Electrical & Computer Engineering, COMSATS University Islamabad, Pakistan
- Hamza Akram
- Medical Image Processing Research Group (MIPRG), Department of Electrical & Computer Engineering, COMSATS University Islamabad, Pakistan
- Hammad Omer
- Medical Image Processing Research Group (MIPRG), Department of Electrical & Computer Engineering, COMSATS University Islamabad, Pakistan
13
Solis-Vasquez L, Tillack AF, Santos-Martins D, Koch A, LeGrand S, Forli S. Benchmarking the Performance of Irregular Computations in AutoDock-GPU Molecular Docking. Parallel Comput 2022; 109:102861. [PMID: 34898769 PMCID: PMC8654209 DOI: 10.1016/j.parco.2021.102861] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/23/2023]
Abstract
Irregular applications can be found in different scientific fields. In computer-aided drug design, molecular docking simulations play an important role in finding promising drug candidates. AutoDock is a software application widely used for predicting molecular interactions at close distances. It is characterized by irregular computations and long execution runtimes. In recent years, a hardware-accelerated version of AutoDock, called AutoDock-GPU, has been under active development. This work benchmarks the recent code and algorithmic enhancements incorporated into AutoDock-GPU. Particularly, we analyze the impact on execution runtime of techniques based on early termination. These enable AutoDock-GPU to explore the molecular space as necessary, while safely avoiding redundant computations. Our results indicate that it is possible to achieve average runtime reductions of 50% by using these techniques. Furthermore, a comprehensive literature review is also provided, where our work is compared to relevant approaches leveraging hardware acceleration for molecular docking.
Affiliation(s)
- Leonardo Solis-Vasquez
- Embedded Systems and Applications Group, Technical University of Darmstadt, Hochschulstr. 10, D-64289 Darmstadt, Germany
- Andreas F. Tillack
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, United States
- Diogo Santos-Martins
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, United States
- Andreas Koch
- Embedded Systems and Applications Group, Technical University of Darmstadt, Darmstadt, Germany
- Stefano Forli
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, United States
14
Pathuri SK, Anbazhagan N, Joshi GP, You J. Feature-Based Sentimental Analysis on Public Attention towards COVID-19 Using CUDA-SADBM Classification Model. Sensors (Basel) 2021; 22:80. [PMID: 35009619 PMCID: PMC8747430 DOI: 10.3390/s22010080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/13/2021] [Revised: 12/11/2021] [Accepted: 12/15/2021] [Indexed: 11/16/2022]
Abstract
The COVID-19 pandemic has spread to almost all countries of the world and has affected people both mentally and economically. The primary motivation of this research is to construct a model that takes reviews or evaluations from people affected by COVID-19. As the number of cases has accelerated day by day, people have become panicked and concerned about their health. A good model can provide accurate statistics for interpreting the actual records of the pandemic. In the proposed work, a dedicated classifier named the Sentimental DataBase Miner algorithm (SADBM) is used for sentiment analysis, categorizing opinions with parallel processing applied to data collected from online social media websites such as Twitter, Facebook, and LinkedIn. The accuracy of the proposed model is validated on training data and compared with baseline classifiers, such as logistic regression and decision trees. The proposed algorithm is executed on both CPU and GPU, and the acceleration ratio of the model is calculated. The results show that the proposed model provides the best accuracy of the compared models, i.e., 96% (GPU).
Affiliation(s)
- Siva Kumar Pathuri
- Department of CSE, KLEF, Vaddeswaram, Guntur District, Guntur 522502, Andhra Pradesh, India
- N. Anbazhagan
- Department of Mathematics, Alagappa University, Karaikudi 630003, Tamil Nadu, India
15
Kartsev A, Malkovsky S, Chibisov A. Analysis of Ionicity-Magnetism Competition in 2D-MX3 Halides towards a Low-Dimensional Materials Study Based on GPU-Enabled Computational Systems. Nanomaterials (Basel) 2021; 11:2967. [PMID: 34835730 DOI: 10.3390/nano11112967] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/18/2021] [Revised: 10/20/2021] [Accepted: 10/22/2021] [Indexed: 11/17/2022]
Abstract
The acceleration of parallel high-throughput first-principles calculations under 3D (three-dimensional) periodic boundary conditions for low-dimensional systems, and particularly 2D materials, is an important issue for new material design, since scalability rapidly degrades due to the large, mostly empty unit cells with a significant number of atoms that must mimic layered structures separated by vacuum. In this report, we explore the scalability and performance of the Quantum ESPRESSO package in a hybrid central processing unit - graphics processing unit (CPU-GPU) environment. The study was carried out in comparison to CPU-based systems for simulations of 2D magnets, where a significant improvement in computational speed was achieved using the IBM ESSL SMP CUDA library. As an example of physics-related results, we compute and discuss the ionicity-covalency balance and the related ferromagnetic (FM) and antiferromagnetic (AFM) exchange competition for some CrX3 compounds. Furthermore, we demonstrate how this exchange interplay leads to high-order effects in the magnetism of the 1L-RuCl3 compound.
16
Romano D, Lapegna M. A GPU-Parallel Image Coregistration Algorithm for InSar Processing at the Edge. Sensors (Basel) 2021; 21:s21175916. [PMID: 34502805 PMCID: PMC8434671 DOI: 10.3390/s21175916] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/01/2021] [Revised: 07/28/2021] [Accepted: 08/25/2021] [Indexed: 11/16/2022]
Abstract
Image Coregistration for InSAR processing is a time-consuming procedure that is usually processed in batch mode. With the availability of low-energy GPU accelerators, processing at the edge is now a promising perspective. Starting from the individuation of the most computationally intensive kernels from existing algorithms, we decomposed the cross-correlation problem from a multilevel point of view, intending to design and implement an efficient GPU-parallel algorithm for multiple settings, including the edge computing one. We analyzed the accuracy and performance of the proposed algorithm—also considering power efficiency—and its applicability to the identified settings. Results show that a significant speedup of InSAR processing is possible by exploiting GPU computing in different scenarios with no loss of accuracy, also enabling onboard processing using SoC hardware.
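The computationally intensive kernel here is cross-correlation between image patches. A 1D, CPU-only sketch of the idea (exhaustive normalized cross-correlation over integer shifts; real InSAR coregistration works on 2D complex SAR patches with sub-pixel refinement, so this is only an assumed simplification with hypothetical names):

```python
import math

def ncc(a, b):
    # Normalized cross-correlation of two equal-length sequences.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = math.sqrt(sum((x - ma) ** 2 for x in a))
    db = math.sqrt(sum((y - mb) ** 2 for y in b))
    return num / (da * db) if da > 0 and db > 0 else 0.0

def best_shift(ref, mov, max_shift):
    # Try every integer shift s and keep the one maximizing NCC on the
    # overlapping region; each shift is independent, hence GPU-friendly.
    n = len(ref)
    best_s, best_c = 0, -2.0
    for s in range(-max_shift, max_shift + 1):
        lo, hi = max(0, -s), min(n, n - s)
        if hi - lo < 2:
            continue
        c = ncc(ref[lo:hi], [mov[k + s] for k in range(lo, hi)])
        if c > best_c:
            best_c, best_s = c, s
    return best_s
```

The multilevel decomposition described in the abstract amounts to organizing many such independent correlations across patches and shift candidates so they map well onto GPU thread blocks.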
Affiliation(s)
- Diego Romano
- Institute for High Performance Computing and Networking (ICAR), CNR, 80131 Naples, Italy
- Correspondence: Tel.: +39-0816139518
- Marco Lapegna
- Department of Mathematics and Applications, University of Naples Federico II, 80126 Naples, Italy
17
Artiles O, Saeed F. TurboBC: A Memory Efficient and Scalable GPU Based Betweenness Centrality Algorithm in the Language of Linear Algebra. Proc Int Workshops Parallel Proc 2021; 2021:10. [PMID: 35440894 PMCID: PMC9015014 DOI: 10.1145/3458744.3474047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Betweenness centrality (BC) is a shortest-path centrality metric used to measure the influence of individual vertices or edges in huge graphs used for modeling and analysis of the human brain, omics data, or social networks. The application of the BC algorithm to modern graphs must deal with the size of the graphs, as well as with highly irregular data-access patterns. These challenges are particularly important when the BC algorithm is implemented on Graphics Processing Units (GPUs), due to the limited global memory of these processors and the decrease in performance caused by the load imbalance that results from processing irregular data structures. In this paper, we present the first GPU-based linear-algebraic formulation and implementation of BC, called TurboBC, a set of memory-efficient BC algorithms that exhibit good performance and high scalability on unweighted, undirected or directed sparse graphs of arbitrary structure. Our experiments demonstrate that our TurboBC algorithms obtain more than 18 GTEPs and an average speedup of 31.9x over the sequential version of the BC algorithm, and are on average 1.7x and 2.2x faster than the state-of-the-art algorithms implemented in the high-performance GPU-based Gunrock and CPU-based Ligra libraries, respectively. These experiments also show that by minimizing their memory footprint, the TurboBC algorithms are able to compute the BC of relatively big graphs for which the Gunrock algorithms ran out of memory.
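For reference, the sequential baseline that GPU formulations such as this are measured against is Brandes' algorithm; a compact Python version for unweighted, undirected graphs is given below (the linear-algebraic formulation replaces the explicit BFS here with sparse matrix products):

```python
from collections import deque

def betweenness(adj):
    # Brandes' algorithm; adj maps vertex -> list of neighbors (undirected).
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Single-source shortest paths via BFS (unweighted graph).
        dist = {v: -1 for v in adj}
        sigma = {v: 0 for v in adj}          # number of shortest paths
        preds = {v: [] for v in adj}         # shortest-path predecessors
        dist[s], sigma[s] = 0, 1
        order, q = [], deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagation of pair dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    for v in bc:          # each undirected pair was counted twice
        bc[v] /= 2.0
    return bc
```

The per-source computations are independent, which is one of the parallelism levels a GPU implementation can exploit.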
Affiliation(s)
- Oswaldo Artiles
- School of Computing and Information Sciences, Florida International University, Miami, Florida, USA
- Fahad Saeed
- School of Computing and Information Sciences, Florida International University, Miami, Florida, USA
18
Dong Z, Gray H, Leggett C, Lin M, Pascuzzi VR, Yu K. Porting HEP Parameterized Calorimeter Simulation Code to GPUs. Front Big Data 2021; 4:665783. [PMID: 34250467 PMCID: PMC8267914 DOI: 10.3389/fdata.2021.665783] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2021] [Accepted: 05/07/2021] [Indexed: 11/17/2022] Open
Abstract
The High Energy Physics (HEP) experiments, such as those at the Large Hadron Collider (LHC), traditionally consume large amounts of CPU cycles for detector simulations and data analysis, but rarely use compute accelerators such as GPUs. As the LHC is upgraded to allow for higher luminosity, resulting in much higher data rates, purely relying on CPUs may not provide enough computing power to support the simulation and data analysis needs. As a proof of concept, we investigate the feasibility of porting a HEP parameterized calorimeter simulation code to GPUs. We have chosen to use FastCaloSim, the ATLAS fast parametrized calorimeter simulation. While FastCaloSim is sufficiently fast such that it does not impose a bottleneck in detector simulations overall, significant speed-ups in the processing of large samples can be achieved from GPU parallelization at both the particle (intra-event) and event levels; this is especially beneficial in conditions expected at the high-luminosity LHC, where extremely high per-event particle multiplicities will result from the many simultaneous proton-proton collisions. We report our experience with porting FastCaloSim to NVIDIA GPUs using CUDA. A preliminary Kokkos implementation of FastCaloSim for portability to other parallel architectures is also described.
Affiliation(s)
- Zhihua Dong
- Brookhaven National Laboratory, Upton, NY, United States
- Heather Gray
- Lawrence Berkeley National Laboratory, Berkeley, CA, United States; University of California, Berkeley, CA, United States
- Charles Leggett
- Lawrence Berkeley National Laboratory, Berkeley, CA, United States
- Meifeng Lin
- Brookhaven National Laboratory, Upton, NY, United States
- Kwangmin Yu
- Brookhaven National Laboratory, Upton, NY, United States
19
Artiles O, Saeed F. TurboBFS: GPU Based Breadth-First Search (BFS) Algorithms in the Language of Linear Algebra. IEEE Int Symp Parallel Distrib Process Workshops Phd Forum 2021; 2021:520-528. [PMID: 35425667 PMCID: PMC9007172 DOI: 10.1109/ipdpsw52791.2021.00084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Graphs used for modeling of the human brain, omics data, or social networks are huge, and manual inspection of these graphs is impossible. A popular and fundamental method for making sense of these large graphs is the well-known Breadth-First Search (BFS) algorithm. However, BFS suffers from a large computational cost, especially for the big graphs of interest. More recently, the use of Graphics Processing Units (GPUs) has been promising, but challenging because of the limited global memory of GPUs and the irregular structures of real-world graphs. In this paper, we present a GPU-based linear-algebraic formulation and implementation of BFS, called TurboBFS, that exhibits excellent scalability on unweighted, undirected or directed sparse graphs of arbitrary structure. We demonstrate that our algorithms obtain up to 40 GTEPs and are on average 15.7x, 5.8x, and 1.8x faster than the other state-of-the-art algorithms implemented on the SuiteSparse:GraphBLAS, GraphBLAST, and Gunrock libraries, respectively. The codes implementing the algorithms proposed in this paper are available at https://github.com/pcdslab.
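In the linear-algebraic view used here, each BFS level is one sparse matrix-vector product over the Boolean (OR/AND) semiring, with visited vertices masked out. A CPU Python sketch over a CSR adjacency structure (hypothetical names; the GPU kernels parallelize the inner loops across the frontier):

```python
def bfs_levels(indptr, indices, src):
    # BFS as repeated Boolean-semiring SpMV: the frontier is a Boolean
    # vector; one product per level, masking already-visited vertices.
    n = len(indptr) - 1
    level = [-1] * n
    frontier = [False] * n
    frontier[src], level[src] = True, 0
    depth = 0
    while any(frontier):
        depth += 1
        nxt = [False] * n
        for v in range(n):
            if frontier[v]:
                # CSR row v holds the neighbors of v.
                for j in range(indptr[v], indptr[v + 1]):
                    w = indices[j]
                    if level[w] < 0:
                        nxt[w] = True
                        level[w] = depth
        frontier = nxt
    return level   # -1 marks unreachable vertices
```

Expressing the frontier expansion as SpMV is what lets the same code path reuse highly tuned sparse linear-algebra kernels on the GPU.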
Affiliation(s)
- Oswaldo Artiles
- School of Computing and Information Sciences, Florida International University, Miami, Florida
- Fahad Saeed
- School of Computing and Information Sciences, Florida International University, Miami, Florida
20
Goodin DA, Frieboes HB. Simulation of 3D centimeter-scale continuum tumor growth at sub-millimeter resolution via distributed computing. Comput Biol Med 2021; 134:104507. [PMID: 34157612 DOI: 10.1016/j.compbiomed.2021.104507] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2021] [Revised: 05/15/2021] [Accepted: 05/16/2021] [Indexed: 12/28/2022]
Abstract
Simulation of cm-scale tumor growth has generally been constrained by the computational cost to numerically solve the associated equations, with models limited to representing mm-scale or smaller tumors. While the work has proven useful to the study of small tumors and micro-metastases, a biologically-relevant simulation of cm-scale masses as would be typically detected and treated in patients has remained an elusive goal. This study presents a distributed computing (parallelized) implementation of a mixture model of tumor growth to simulate 3D cm-scale vascularized tissue at sub-mm resolution. The numerical solving scheme utilizes a two-stage parallelization framework. The solution is written for GPU computation using the CUDA framework, which handles all Multigrid-related computations. Message Passing Interface (MPI) handles distribution of information across multiple processes, freeing the program from RAM and the processing limitations found on single systems. On each system, Nvidia's CUDA library allows for fast processing of model data using GPU-bound computing on fewer systems. The results show that a combined MPI-CUDA implementation enables the continuum modeling of cm-scale tumors at reasonable computational cost. Further work to calibrate model parameters to particular tumor conditions could enable simulation of patient-specific tumors for clinical application.
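The Multigrid computations mentioned above rest on cheap, highly parallel smoothing sweeps. A 1D weighted-Jacobi sweep for -u'' = f is a toy stand-in for the 3D operators in the paper (names are illustrative, not the authors' code):

```python
import math

def jacobi_sweep(u, f, h, omega=2.0 / 3.0):
    # One weighted-Jacobi sweep for -u'' = f with fixed (Dirichlet) endpoints.
    # Every interior update is independent, so it maps directly onto GPU threads.
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = (1 - omega) * u[i] + omega * 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return new

def residual_norm(u, f, h):
    # ||f - A u||_2 over interior points for the 3-point Laplacian stencil.
    r2 = 0.0
    for i in range(1, len(u) - 1):
        r = f[i] - (2 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
        r2 += r * r
    return math.sqrt(r2)
```

In a multigrid cycle a few such sweeps damp the high-frequency error on each grid level; MPI then distributes subdomains across nodes while CUDA executes the sweeps within each node.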
Affiliation(s)
- Dylan A Goodin
- Department of Bioengineering, University of Louisville, KY, USA
- Hermann B Frieboes
- Department of Bioengineering, University of Louisville, KY, USA; James Graham Brown Cancer Center, University of Louisville, KY, USA; Center for Predictive Medicine, University of Louisville, KY, USA
21
Khalil MA, Ashfaq A, Shahzad H, Qazi SA, Omer H. GPU based parallel framework for receiver coil sensitivity estimation in SENSE reconstruction. Magn Reson Imaging 2021; 80:58-70. [PMID: 33905834 DOI: 10.1016/j.mri.2021.04.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2020] [Revised: 04/18/2021] [Accepted: 04/21/2021] [Indexed: 11/28/2022]
Abstract
Magnetic Resonance Imaging (MRI) uses non-ionizing radiation and is safer compared to CT and X-ray imaging. MRI is broadly used around the globe for medical diagnostics. One main limitation of MRI is its long data acquisition time. Parallel MRI (pMRI) was introduced in the late 1990s to reduce the MRI data acquisition time. In pMRI, data is acquired by under-sampling the Phase Encoding (PE) steps, which introduces aliasing artefacts in the MR images. SENSitivity Encoding (SENSE) is a pMRI-based method that reconstructs a fully sampled MR image from the acquired under-sampled data using the sensitivity information of the receiver coils. In SENSE, precise estimation of the receiver coil sensitivity maps is vital to obtain good quality images. The Eigen-value method (a recently proposed method in the literature for the estimation of receiver coil sensitivity information) does not require a pre-scan image, unlike other conventional methods of sensitivity estimation. However, the Eigen-value method is computationally intensive and takes a significant amount of time to estimate the receiver coil sensitivity maps. This work proposes a parallel framework for the Eigen-value method of receiver coil sensitivity estimation that exploits its inherent parallelism using Graphics Processing Units (GPUs). We evaluated the performance of the proposed algorithm on in-vivo and simulated MRI datasets (i.e. human head and simulated phantom datasets) with Peak Signal-to-Noise Ratio (PSNR) and Artefact Power (AP) as evaluation metrics. The results show that the proposed GPU implementation reduces the execution time of the Eigen-value method of receiver coil sensitivity estimation (providing up to 30 times speed-up in our experiments) without degrading the quality of the reconstructed image.
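The Eigen-value method needs a dominant eigenvector per local system. One standard, GPU-friendly way to obtain it (shown only as a generic illustration, not necessarily the paper's exact numerical scheme) is power iteration on a small symmetric matrix:

```python
import math

def power_iteration(A, iters=200):
    # Dominant eigenpair of a small symmetric matrix (list of lists)
    # via power iteration; independent per-pixel systems parallelize well.
    n = len(A)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue estimate.
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * Av[i] for i in range(n))
    return lam, v
```

Because one such small eigenproblem arises per image location, a GPU can assign each location to a thread or thread block, which is the inherent parallelism the proposed framework exploits.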
Affiliation(s)
- Muhammad Adil Khalil
- Medical Image Processing Research Group (MIPRG), Department of Electrical & Computer Engineering, COMSATS University Islamabad, Pakistan
- Afaq Ashfaq
- Medical Image Processing Research Group (MIPRG), Department of Electrical & Computer Engineering, COMSATS University Islamabad, Pakistan
- Sohaib Ayaz Qazi
- Medical Image Processing Research Group (MIPRG), Department of Electrical & Computer Engineering, COMSATS University Islamabad, Pakistan
- Hammad Omer
- Medical Image Processing Research Group (MIPRG), Department of Electrical & Computer Engineering, COMSATS University Islamabad, Pakistan
22
Niedzwiedzki J, Niewola A, Lipinski P, Swaczyna P, Bobinski A, Poryzala P, Podsedkowski L. Real-Time Parallel-Serial LiDAR-Based Localization Algorithm with Centimeter Accuracy for GPS-Denied Environments. Sensors (Basel) 2020; 20:s20247123. [PMID: 33322587 PMCID: PMC7764368 DOI: 10.3390/s20247123] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/12/2020] [Revised: 12/02/2020] [Accepted: 12/06/2020] [Indexed: 11/24/2022]
Abstract
In this paper, we introduce a real-time parallel-serial algorithm for autonomous robot positioning for GPS-denied, dark environments, such as caves and mine galleries. To achieve a good complexity-accuracy trade-off, we fuse data from light detection and ranging (LiDAR) and an inertial measurement unit (IMU). The proposed algorithm’s main novelty is that, unlike in most algorithms, we apply an extended Kalman filter (EKF) to each LiDAR scan point and calculate the location relative to a triangular mesh. We also introduce three implementations of the algorithm: serial, parallel, and parallel-serial. The first implementation verifies the correctness of our innovative approach, but is too slow for real-time execution. The second approach implements a well-known parallel data fusion approach, but is still too slow for our application. The third and final implementation of the presented algorithm along with the state-of-the-art GPU data structures achieves real-time performance. According to our experimental findings, our algorithm outperforms the reference Gaussian mixture model (GMM) localization algorithm in terms of accuracy by a factor of two.
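The per-point filtering idea reduces to the textbook EKF measurement update. A scalar Python sketch of one such step follows (the actual algorithm updates a full pose state against a triangular mesh, so this only shows the shape of a single update):

```python
def ekf_update(x, P, z, H, R):
    # Scalar Kalman/EKF measurement update: one LiDAR point = one update.
    y = z - H * x          # innovation: measurement minus prediction
    S = H * P * H + R      # innovation covariance
    K = P * H / S          # Kalman gain
    return x + K * y, (1.0 - K * H) * P
```

Applying an update per scan point, rather than per scan, is the paper's main novelty; the parallel-serial implementation batches the independent parts of these updates on the GPU.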
Affiliation(s)
- Jakub Niedzwiedzki
- Institute of Machine Tools and Production Engineering, Lodz University of Technology, ul. Stefanowskiego 1/15, 90-924 Lodz, Poland
- Adam Niewola
- Institute of Machine Tools and Production Engineering, Lodz University of Technology, ul. Stefanowskiego 1/15, 90-924 Lodz, Poland
- Piotr Lipinski
- Institute of Information Technology, Lodz University of Technology, ul. Wolczanska 215, 90-924 Lodz, Poland
- Piotr Swaczyna
- Institute of Machine Tools and Production Engineering, Lodz University of Technology, ul. Stefanowskiego 1/15, 90-924 Lodz, Poland
- Aleksander Bobinski
- Institute of Machine Tools and Production Engineering, Lodz University of Technology, ul. Stefanowskiego 1/15, 90-924 Lodz, Poland
- Pawel Poryzala
- Institute of Electronics, Lodz University of Technology, ul. Wolczanska 211/215, 93-005 Lodz, Poland
- Leszek Podsedkowski
- Institute of Machine Tools and Production Engineering, Lodz University of Technology, ul. Stefanowskiego 1/15, 90-924 Lodz, Poland
23
Grabia S, Smyczynska U, Pagacz K, Fendler W. NormiRazor: tool applying GPU-accelerated computing for determination of internal references in microRNA transcription studies. BMC Bioinformatics 2020; 21:425. [PMID: 32993488 PMCID: PMC7523363 DOI: 10.1186/s12859-020-03743-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2020] [Accepted: 09/07/2020] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Multi-gene expression assays are an attractive tool in revealing complex regulatory mechanisms in living organisms. Normalization is an indispensable step of data analysis in all such studies, since it removes unwanted, non-biological variability from the data. In targeted qPCR assays it is typically performed with respect to prespecified reference genes, but the literature reports a lack of a robust strategy for their selection, especially in studies concerning circulating microRNAs (miRNA). Unfortunately, this problem impedes the translation of scientific discoveries on miRNA biomarkers into widely available laboratory assays. Previous studies concluded that averaged expressions of multi-miRNA combinations are more stable references than single genes. However, due to the number of such combinations, the computational load is considerable and can hinder objective reference selection in large datasets. Existing implementations of normalization algorithms (geNorm, NormFinder and BestKeeper) have poor performance and may require days to compute stability values for all potential references, as the evaluation is performed sequentially. RESULTS We designed NormiRazor - an integrative tool which implements those methods in a parallel manner on a graphics processing unit (GPU) using the CUDA platform. We tested our approach on publicly available miRNA expression datasets. As a result, the execution times on 8 datasets containing from 50 to 400 miRNAs (subsets of GSE68314) decreased 18.7 ±0.6 (mean ±SD), 104.7 ±4.2 and 76.5 ±2.2 times for geNorm, BestKeeper and NormFinder, respectively, with respect to a previous Python implementation. To allow easy access to the normalization pipeline for biomedical researchers, we implemented NormiRazor as an online platform where users can normalize their datasets based on automatically selected references. It is available at norm.btm.umed.pl, together with an instruction manual and exemplary datasets.
CONCLUSIONS NormiRazor allows for an easy, informed choice of reference genes for qPCR transcriptomic studies. As such it can improve comparability and repeatability of experiments and in longer perspective help translate newly discovered biomarkers into readily available assays.
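Of the three normalization algorithms named, geNorm is the easiest to sketch: the stability value M of a gene is the average standard deviation of its pairwise log2 expression ratios against every other candidate (lower M = more stable). An illustrative Python version for single genes follows; NormiRazor evaluates this kind of measure over all multi-gene combinations in parallel on the GPU, and the sketch is not the tool's actual code:

```python
import math
import statistics

def genorm_stability(expr):
    # expr: dict gene -> list of positive expression values (same sample order).
    # M_g = mean over other genes h of SD(log2(expr_g / expr_h)).
    genes = list(expr)
    M = {}
    for g in genes:
        sds = []
        for h in genes:
            if h == g:
                continue
            ratios = [math.log2(a / b) for a, b in zip(expr[g], expr[h])]
            sds.append(statistics.stdev(ratios))
        M[g] = sum(sds) / len(sds)
    return M
```

Because every gene pair's standard deviation is independent, the pairwise computations form an embarrassingly parallel workload, which is what makes the GPU port effective.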
Affiliation(s)
- Szymon Grabia
- Department of Biostatistics and Translational Medicine, Medical University of Lodz, 15 Mazowiecka St., Lodz, 92-215 Poland
- Institute of Applied Computer Science, Lodz University of Technology, 18/22 Stefanowskiego St., Lodz, 90-537 Poland
- Urszula Smyczynska
- Department of Biostatistics and Translational Medicine, Medical University of Lodz, 15 Mazowiecka St., Lodz, 92-215 Poland
- Konrad Pagacz
- Department of Biostatistics and Translational Medicine, Medical University of Lodz, 15 Mazowiecka St., Lodz, 92-215 Poland
- Postgraduate School of Molecular Medicine, Medical University of Warsaw, 61 Zwirki i Wigury St., Warsaw, 02-091 Poland
- Wojciech Fendler
- Department of Biostatistics and Translational Medicine, Medical University of Lodz, 15 Mazowiecka St., Lodz, 92-215 Poland
- Dana-Farber Cancer Institute, Harvard Medical School, 450 Brookline Av., Boston, MA 02215, USA
24
Sellami H, Cazenille L, Fujii T, Hagiya M, Aubert-Kato N, Genot AJ. Accelerating the Finite-Element Method for Reaction-Diffusion Simulations on GPUs with CUDA. Micromachines (Basel) 2020; 11:E881. [PMID: 32971889 DOI: 10.3390/mi11090881] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/01/2020] [Revised: 08/31/2020] [Accepted: 09/03/2020] [Indexed: 12/21/2022]
Abstract
DNA nanotechnology offers fine control over biochemistry by programming chemical reactions in DNA templates. Coupled to microfluidics, it has enabled DNA-based reaction-diffusion microsystems with advanced spatio-temporal dynamics such as traveling waves. The Finite Element Method (FEM) is a standard tool to simulate the physics of such systems, where boundary conditions play a crucial role. However, a fine discretization in time and space is required for complex geometries (like sharp corners) and highly nonlinear chemistry. Graphical Processing Units (GPUs) are increasingly used to speed up scientific computing, but their application to accelerating simulations of reaction-diffusion in DNA nanotechnology has been little investigated. Here we study reaction-diffusion equations (a DNA-based predator-prey system) in a tortuous geometry (a maze), which was shown experimentally to generate subtle geometric effects. We solve the partial differential equations on a GPU, demonstrating a speedup of ∼100 over the same resolution on a 20-core CPU.
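The paper solves these PDEs with FEM on a GPU. As a much simpler finite-difference sketch of the same two-species reaction-diffusion structure (explicit Euler, 1D, reflecting boundaries; an assumed toy with a generic reaction term, not the authors' solver or chemistry):

```python
def laplacian(u, dx):
    # Discrete 1D Laplacian with reflecting (no-flux) boundaries.
    n = len(u)
    out = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else u[i]
        right = u[i + 1] if i < n - 1 else u[i]
        out[i] = (left - 2 * u[i] + right) / (dx * dx)
    return out

def rd_step(u, v, Du, Dv, dt, dx, react):
    # One explicit Euler step of du/dt = Du*Lap(u) + f_u(u,v), likewise for v;
    # react(u_i, v_i) returns the (f_u, f_v) reaction terms at one grid point.
    Lu, Lv = laplacian(u, dx), laplacian(v, dx)
    un, vn = [], []
    for i in range(len(u)):
        fu, fv = react(u[i], v[i])
        un.append(u[i] + dt * (Du * Lu[i] + fu))
        vn.append(v[i] + dt * (Dv * Lv[i] + fv))
    return un, vn
```

Each grid point's update reads only its immediate neighbors, which is why such stencil (and FEM assembly) loops map so naturally onto CUDA threads; with zero reaction and no-flux boundaries the total mass is conserved, a useful sanity check.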
25
Qazi SA, Tariq F, Ullah I, Omer H. Parallel implementation of L + S signal recovery in dynamic MRI. MAGMA 2020; 34:297-307. [PMID: 32601881 DOI: 10.1007/s10334-020-00861-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/29/2020] [Revised: 06/09/2020] [Accepted: 06/22/2020] [Indexed: 11/25/2022]
Abstract
Dynamic MRI is useful to diagnose different diseases, e.g. cardiac ailments, by monitoring the structure and function of the heart and blood flow through the valves. Faster data acquisition is highly desirable in dynamic MRI, but this may lead to aliasing artifacts due to under-sampling. Advanced image reconstruction algorithms are required to obtain aliasing-free MR images from the acquired under-sampled data. One major limitation of using the advanced reconstruction algorithms is their computationally expensive and time-consuming nature, which make them infeasible for clinical use, especially for applications like cardiac MRI. L + S decomposition model is an approach provided in literature which separates the sparse and low-rank information in dynamic MRI. However, L + S decomposition model is a computationally complex process demanding significant computation time. In this paper, a parallel framework is proposed to accelerate the image reconstruction process of L + S decomposition model using GPU. Experiments are performed on cardiac perfusion dataset ([Formula: see text]) and cardiac cine dataset ([Formula: see text]) using NVIDIA's GeForce GTX780 GPU and Core-i7 CPU. The results show that the proposed method provides up to 18 × speed-up including the memory transfer time (i.e. data transfer between the CPU and GPU) and ~ 46 × speed-up without memory transfer for the cardiac perfusion dataset in our experiments. This level of improvement in the reconstruction time will increase the usefulness of L + S reconstruction by making it feasible for clinical applications.
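In the L + S model the sparse component S is updated by an elementwise soft-thresholding (shrinkage) step, which is embarrassingly parallel and hence a natural GPU kernel; the low-rank update additionally needs an SVD, omitted in this small sketch:

```python
import math

def soft_threshold(x, lam):
    # Proximal operator of lam*||.||_1, applied elementwise: the S-update
    # shrinks every entry toward zero by lam and zeroes the small ones.
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in x]
```

Because every entry is processed independently, a GPU can assign one thread per element, which is representative of how the proposed framework accelerates the iterative reconstruction.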
Affiliation(s)
- Sohaib A Qazi
- Medical Image Processing Research Group (MIPRG), Department of Electrical and Computer Engineering, COMSATS University, Islamabad, Pakistan.
- Fareena Tariq
- Medical Image Processing Research Group (MIPRG), Department of Electrical and Computer Engineering, COMSATS University, Islamabad, Pakistan.
- Irfan Ullah
- Medical Image Processing Research Group (MIPRG), Department of Electrical and Computer Engineering, COMSATS University, Islamabad, Pakistan.
- Hammad Omer
- Medical Image Processing Research Group (MIPRG), Department of Electrical and Computer Engineering, COMSATS University, Islamabad, Pakistan.
26
Hattori LT, Pinheiro BA, Frigori RB, Benítez CMV, Lopes HS. PathMolD-AB: Spatiotemporal pathways of protein folding using parallel molecular dynamics with a coarse-grained model. Comput Biol Chem 2020; 87:107301. [PMID: 32554177 DOI: 10.1016/j.compbiolchem.2020.107301] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2020] [Revised: 05/25/2020] [Accepted: 05/28/2020] [Indexed: 10/24/2022]
Abstract
Solving the protein folding problem (PFP) is one of the grand challenges still open in computational biophysics. Globular proteins are believed to evolve from initial configurations through folding pathways connecting several thermodynamically accessible states in a free energy landscape until reaching its minimum, inhabited by the stable native structures. Despite its huge computational burden, molecular dynamics (MD) is the leading approach in PFP studies, as it preserves the Newtonian temporal evolution in the canonical ensemble. Non-trivial improvements are provided by highly parallel implementations of MD on cost-effective GPUs, concomitant with multiscale descriptions of proteins by coarse-grained minimalist models. In this vein, we present the PathMolD-AB framework, a comprehensive software package for massively parallel MD simulations in the canonical ensemble, structural analysis, and visualization of folding pathways using the minimalist AB model. It also has a tool to compare the results with proteins re-scaled from the PDB. As case studies, we simulate and analyze the folding of four proteins: 13FIBO, 2GB1, 1PLC and 5ANZ, with 13, 55, 99 and 223 amino acids, respectively. The datasets generated from the simulations correspond to the MD evolution of 3500 folding pathways, encompassing 35×10^6 states, which contain the spatial amino acid positions, the protein free energies and radii of gyration at each time step. Results indicate that the speedup of our approach grows logarithmically with the protein length and, therefore, it is suited for most of the proteins in the PDB. The structures predicted by PathMolD-AB were similar to the re-scaled biological structures, indicating that it is promising for the study of the PFP.
Affiliation(s)
- Leandro Takeshi Hattori
- Bioinformatics and Computational Intelligence Laboratory (LABIC), Federal University of Technology Paraná (UTFPR), Av. 7 de Setembro, 3165, 80230-901 Curitiba, PR, Brazil.
- Bruna Araujo Pinheiro
- Bioinformatics and Computational Intelligence Laboratory (LABIC), Federal University of Technology Paraná (UTFPR), Av. 7 de Setembro, 3165, 80230-901 Curitiba, PR, Brazil.
- Rafael Bertolini Frigori
- Bioinformatics and Computational Intelligence Laboratory (LABIC), Federal University of Technology Paraná (UTFPR), Av. 7 de Setembro, 3165, 80230-901 Curitiba, PR, Brazil.
- César Manuel Vargas Benítez
- Bioinformatics and Computational Intelligence Laboratory (LABIC), Federal University of Technology Paraná (UTFPR), Av. 7 de Setembro, 3165, 80230-901 Curitiba, PR, Brazil.
- Heitor Silvério Lopes
- Bioinformatics and Computational Intelligence Laboratory (LABIC), Federal University of Technology Paraná (UTFPR), Av. 7 de Setembro, 3165, 80230-901 Curitiba, PR, Brazil.
27
Isupov K. Performance data of multiple-precision scalar and vector BLAS operations on CPU and GPU. Data Brief 2020; 30:105506. [PMID: 32373682 PMCID: PMC7195515 DOI: 10.1016/j.dib.2020.105506] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2020] [Revised: 03/14/2020] [Accepted: 03/23/2020] [Indexed: 12/02/2022] Open
Abstract
Many optimized linear algebra packages support the single- and double-precision floating-point data types. However, there are a number of important applications that require a higher level of precision, up to hundreds or even thousands of digits. This article presents performance data of four dense basic linear algebra subprograms – ASUM, DOT, SCAL, and AXPY – implemented using existing extended-/multiple-precision software for conventional central processing units and CUDA compatible graphics processing units. The following open source packages are considered: MPFR, MPDECIMAL, ARPREC, MPACK, XBLAS, GARPREC, CAMPARY, CUMP, and MPRES-BLAS. The execution time of CPU and GPU implementations is measured at a fixed problem size and various levels of numeric precision. The data in this article are related to the research article entitled “Design and implementation of multiple-precision BLAS Level 1 functions for graphics processing units” [1].
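For readers unfamiliar with why multiple-precision BLAS kernels matter, here is a hedged, CPU-only sketch (using Python's standard `decimal` module rather than any of the packages benchmarked in the article) of a DOT product where ordinary double precision loses the answer to cancellation:

```python
from decimal import Decimal, getcontext

def mp_dot(xs, ys, digits=50):
    """DOT product accumulated in multiple-precision decimal arithmetic."""
    getcontext().prec = digits
    acc = Decimal(0)
    for a, b in zip(xs, ys):
        acc += Decimal(a) * Decimal(b)
    return acc

x = [10**16, 1, -10**16]
y = [1, 1, 1]
# Double precision: 1e16 + 1 rounds back to 1e16, so the sum collapses to 0.0.
dbl = sum(a * b for a, b in zip(map(float, x), map(float, y)))
# Fifty decimal digits keep the intermediate sums exact.
mp = mp_dot(x, y)
```

The GPU packages in the article implement the same kernels, but with precision levels and memory layouts tuned for massively parallel hardware.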
28
Abstract
Probability density approximation (PDA) is a nonparametric method of calculating probability densities. When integrated into Bayesian estimation, it allows researchers to fit psychological processes for which analytic probability functions are unavailable, significantly expanding the scope of theories that can be quantitatively tested. PDA is, however, computationally intensive, requiring large numbers of Monte Carlo simulations in order to attain good precision. We introduce Parallel PDA (pPDA), a highly efficient implementation of this method utilizing the Armadillo C++ and CUDA C libraries to conduct millions of model simulations simultaneously in graphics processing units (GPUs). This approach provides a practical solution for rapidly approximating probability densities with high precision. In addition to demonstrating this method, we fit a piecewise linear ballistic accumulator model (Holmes, Trueblood, & Heathcote, 2016) to empirical data. Finally, we conducted simulation studies to investigate various issues associated with PDA and provide guidelines for pPDA applications to other complex cognitive models.
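As a minimal, non-parallel illustration of what PDA computes (a sketch, not the pPDA implementation; the Gaussian kernel and Silverman bandwidth rule are assumptions made here for concreteness):

```python
import numpy as np

def pda_density(sims, query, bw=None):
    """Approximate the model likelihood at `query` points by
    kernel-smoothing Monte Carlo simulations `sims` drawn from the model."""
    sims = np.asarray(sims, dtype=float)
    if bw is None:
        bw = 1.06 * sims.std() * len(sims) ** (-0.2)   # Silverman's rule
    z = (np.asarray(query, dtype=float)[:, None] - sims[None, :]) / bw
    return (np.exp(-0.5 * z * z) / np.sqrt(2.0 * np.pi)).mean(axis=1) / bw

rng = np.random.default_rng(1)
draws = rng.standard_normal(200_000)     # stand-in for model simulations
dens = pda_density(draws, [0.0])
# True N(0, 1) density at 0 is 1/sqrt(2*pi), roughly 0.3989.
```

The kernel evaluations for different query points and simulations are independent, which is why the method maps so well onto thousands of GPU threads.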
29
Uzelac I, Iravanian S, Fenton FH. Parallel Acceleration on Removal of Optical Mapping Baseline Wandering. Comput Cardiol (2010) 2019; 46:10.22489/cinc.2019.433. [PMID: 35719209 PMCID: PMC9202644 DOI: 10.22489/cinc.2019.433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Optical mapping of hearts stained with fluorescent dyes is an imaging method widely accepted and recognized as a tool to study the complex spatiotemporal dynamics of cardiac electrophysiology. One shortcoming of the method is baseline wandering in the obtained fluorescence signals, as the signals of interest, changes in transmembrane potential (Vm) and in free intracellular calcium concentration ([Ca2+]i), reported by the two most commonly used dyes, are calculated as a relative signal change with respect to the fluorescence baseline. These changes are small fractional changes, often smaller than 10%. The baseline fluorescence drifts due to dye photo-bleaching, heart contraction/movement artifacts, and the stability of the excitation light source over time. Depending on the experimental instrumentation, recording duration, signal-to-noise levels, and study aims of the optical imaging, many research groups have adopted their own techniques tailored to their specific experimental data. Here we present a technique based on finite impulse response (FIR) filters with parallel acceleration implemented on GPUs and multi-core CPUs, in MATLAB.
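The paper's MATLAB/GPU code is not reproduced here, but the underlying idea, estimating the slow baseline with a low-pass FIR filter and expressing the signal as a fractional change dF/F, can be sketched as follows (the moving-average taps and window length are illustrative assumptions, not the authors' filter design):

```python
import numpy as np

def remove_baseline(signal, win=101):
    """Subtract a moving-average (low-pass FIR) baseline estimate and
    return the fractional change dF/F relative to that baseline."""
    taps = np.ones(win) / win                    # FIR filter coefficients
    pad = win // 2
    padded = np.pad(signal, pad, mode="edge")    # suppress edge transients
    baseline = np.convolve(padded, taps, mode="valid")
    return (signal - baseline) / baseline

# A steady fluorescence level with a slow linear drift: after baseline
# removal the fractional signal stays close to zero.
t = np.arange(1000)
f = 100.0 + 0.01 * t
dff = remove_baseline(f)
```

Each output sample of an FIR filter depends only on a fixed window of inputs, so the convolution parallelizes across samples, the property the paper exploits on GPUs and multi-core CPUs.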
Affiliation(s)
- Ilija Uzelac
- School of Physics, Georgia Institute of Technology, Atlanta, GA, USA
- Flavio H Fenton
- School of Physics, Georgia Institute of Technology, Atlanta, GA, USA
30
Na JC, Lee I, Rhee JK, Shin SY. Fast single individual haplotyping method using GPGPU. Comput Biol Med 2019; 113:103421. [PMID: 31499396 DOI: 10.1016/j.compbiomed.2019.103421] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2019] [Revised: 08/28/2019] [Accepted: 08/28/2019] [Indexed: 11/27/2022]
Abstract
BACKGROUND Most bioinformatic tools for next generation sequencing (NGS) data are computationally intensive, requiring a large amount of computational power for processing and analysis. Here, the utility of graphics processing units (GPUs) for NGS data computation is assessed. METHOD In a previous study, we developed a probabilistic evolutionary algorithm with toggling for haplotyping (PEATH) method based on the estimation of distribution algorithm and a toggling heuristic. Here, we parallelized the PEATH method (PEATH/G) using general-purpose computing on GPU (GPGPU). RESULTS PEATH/G runs approximately 46.8 times and 25.4 times faster than PEATH on the NA12878 fosmid-sequencing dataset and the HuRef dataset, respectively, with an NVIDIA GeForce GTX 1660Ti. Moreover, PEATH/G is approximately 13.3 times faster on the fosmid-sequencing dataset even with an inexpensive conventional GPGPU (NVIDIA GeForce GTX 950). CONCLUSIONS PEATH/G can be a practical single individual haplotyping tool in terms of both its accuracy and speed. GPGPU can help reduce the running time of NGS analysis tools.
Affiliation(s)
- Joong Chae Na
- Department of Computer Science and Engineering, Sejong University, Seoul, 05006, South Korea.
- Inbok Lee
- Department of Software, Korea Aerospace University, Goyang, 10540, South Korea.
- Je-Keun Rhee
- School of Systems Biomedical Science, Soongsil University, Seoul, 06978, South Korea.
- Soo-Yong Shin
- Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06351, South Korea; Big Data Research Center, Samsung Medical Center, Seoul, 06351, South Korea.
31
Subbiah A, Ogunfunmi T. A Flexible Hybrid BCH Decoder for Modern NAND Flash Memories Using General Purpose Graphical Processing Units (GPGPUs). Micromachines (Basel) 2019; 10:mi10060365. [PMID: 31159191 PMCID: PMC6632097 DOI: 10.3390/mi10060365] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/20/2019] [Revised: 05/16/2019] [Accepted: 05/23/2019] [Indexed: 11/16/2022]
Abstract
Bose-Chaudhuri-Hocquenghem (BCH) codes are broadly used to correct errors in flash memory systems and digital communications. These codes are cyclic block codes and have their arithmetic fixed over the splitting field of their generator polynomial. Many solutions using CPUs, hardware, and Graphical Processing Units (GPUs) have been proposed for BCH decoders. The performance of these BCH decoders is of utmost importance for systems involving flash memory. However, it is essential to have a flexible solution that corrects multiple bit errors over different finite fields (GF(2^m)). In this paper, we propose a pragmatic approach to decode BCH codes over different finite fields using hardware circuits and GPUs in tandem. We propose to employ a hardware design for a modified syndrome generator and GPUs for a key-equation solver and an error corrector. Using the above partition, we have shown the ability to support multiple bit errors across different BCH block codes without compromising performance. Furthermore, the proposed method to generate the modified syndrome has zero latency for scenarios where there are no errors. When an error is detected, the GPUs are deployed to correct the errors using the iBM and Chien search algorithms. The results have shown that, using the modified syndrome approach, we can support multiple different finite fields with high throughput.
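All of the decoder stages mentioned above (syndromes, iBM, Chien search) are built on GF(2^m) arithmetic. As a hedged orientation sketch, not the paper's decoder, this is the standard shift-and-reduce multiplication in GF(2^4) with primitive polynomial x^4 + x + 1 (the field and polynomial are example choices):

```python
def gf_mul(a, b, poly=0b10011, m=4):
    """Multiply a and b in GF(2^m): carry-less (XOR) multiplication,
    reduced modulo the field's primitive polynomial whenever the
    intermediate degree reaches m."""
    result = 0
    while b:
        if b & 1:            # add (XOR) the current shifted copy of a
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):     # degree hit m: reduce by the field polynomial
            a ^= poly
    return result

# In GF(2^4) with x^4 + x + 1: x * x^3 = x^4 = x + 1, i.e. 0b0011.
alpha4 = gf_mul(0b0010, 0b1000)
```

A flexible decoder like the one proposed must support several (m, poly) pairs, which is exactly the parameterization this routine exposes.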
Affiliation(s)
- Arul Subbiah
- Department of Electrical Engineering, Santa Clara University, 500 El Camino Real, Santa Clara, CA 95053, USA.
- Tokunbo Ogunfunmi
- Department of Electrical Engineering, Santa Clara University, 500 El Camino Real, Santa Clara, CA 95053, USA.
32
Lu Y, Ramachandra ACV, Pham M, Tu YC, Cheng F. CuDDI: A CUDA-Based Application for Extracting Drug-Drug Interaction Related Substance Terms from PubMed Literature. Molecules 2019; 24:E1081. [PMID: 30893816 DOI: 10.3390/molecules24061081] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2019] [Revised: 03/12/2019] [Accepted: 03/16/2019] [Indexed: 11/30/2022] Open
Abstract
Drug-drug interaction (DDI) is becoming a serious issue in clinical pharmacy as the use of multiple medications becomes more common. The PubMed database is one of the biggest literature resources for DDI studies. It contains over 150,000 journal articles related to DDI and is still expanding at a rapid pace. The extraction of DDI-related information, including compounds and proteins, from PubMed is an essential step for DDI research. In this paper, we introduce a tool, CuDDI (compute unified device architecture-based DDI searching), for identification of DDI-related terms (including compounds and proteins) from PubMed. There are three modules in this application: the automatic retrieval of substances from PubMed, the identification of DDI-related terms, and the display of relationships among DDI-related terms. For DDI term identification, a speedup of 30–105 times was observed for the compute unified device architecture (CUDA)-based version compared with the implementation with a CPU-based Python version. CuDDI can be used to discover DDI-related terms and the relationships of these terms, which has the potential to help clinicians and pharmacists better understand the mechanism of DDIs. CuDDI is available at: https://github.com/chengusf/CuDDI.
33
Abstract
Bilateral filters have been extensively utilized in a number of image denoising applications such as segmentation, registration, and tissue classification. However, they require burdensome adjustment of the filter parameters to achieve the best performance for each individual image. To address this problem, this paper proposes a computer-aided parameter decision system based on image texture features associated with neural networks. In our approach, parallel computing with the GPU architecture is first developed to accelerate the computation of the conventional bilateral filter. Subsequently, a back propagation network (BPN) scheme using significant image texture features as the input is established to estimate the GPU-based bilateral filter parameters and automate its denoising process. The k-fold cross validation method is exploited to evaluate the performance of the proposed automatic restoration framework. A wide variety of T1-weighted brain MR images were employed to train and evaluate this parameter-free decision system with GPU-based bilateral filtering, which resulted in a speed-up factor of 208 compared to the CPU-based computation. The proposed filter parameter prediction system achieved a mean absolute percentage error (MAPE) of 6% and was classified as "high accuracy". Our automatic denoising framework dramatically removed noise in numerous brain MR images and outperformed several state-of-the-art methods based on the peak signal-to-noise ratio (PSNR). The use of image texture features associated with the BPN to estimate the GPU-based bilateral filter parameters and to automate the denoising process is feasible and validated. It is suggested that this automatic restoration system is advantageous to various brain MR image-processing applications.
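For orientation, the conventional bilateral filter that the paper accelerates on the GPU can be written in a brute-force, CPU-only form as below (a sketch; the default parameter values are arbitrary, and the two sigmas are exactly the parameters the paper's BPN learns to predict):

```python
import numpy as np

def bilateral_filter(img, sigma_s=2.0, sigma_r=0.1, radius=3):
    """Brute-force bilateral filter: each output pixel is a normalized
    weighted mean of its neighborhood, weighted by spatial closeness
    (sigma_s) and intensity similarity (sigma_r)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    pad = np.pad(img, radius, mode="edge")
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys**2 + xs**2) / (2.0 * sigma_s**2))
    out = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            weights = spatial * np.exp(
                -(patch - img[i, j])**2 / (2.0 * sigma_r**2))
            out[i, j] = (weights * patch).sum() / weights.sum()
    return out

img = np.full((8, 8), 0.5)
smoothed = bilateral_filter(img)   # a constant image is left unchanged
```

Every pixel's weights are computed independently of the others, so the double loop maps naturally onto one GPU thread per pixel, which is the kind of parallelism the paper exploits.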
Affiliation(s)
- Herng-Hua Chang
- Computational Biomedical Engineering Laboratory (CBEL), Department of Engineering Science and Ocean Engineering, National Taiwan University, 1 Sec. 4 Roosevelt Road, Daan, Taipei, 10617, Taiwan.
- Yu-Ju Lin
- Computational Biomedical Engineering Laboratory (CBEL), Department of Engineering Science and Ocean Engineering, National Taiwan University, 1 Sec. 4 Roosevelt Road, Daan, Taipei, 10617, Taiwan.
- Audrey Haihong Zhuang
- Department of Radiation Oncology, Keck Medical School, University of Southern California, Los Angeles, CA, USA.
34
Okada S, Murakami K, Incerti S, Amako K, Sasaki T. MPEXS-DNA, a new GPU-based Monte Carlo simulator for track structures and radiation chemistry at subcellular scale. Med Phys 2019; 46:1483-1500. [PMID: 30593679 PMCID: PMC6850505 DOI: 10.1002/mp.13370] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2018] [Revised: 12/17/2018] [Accepted: 12/19/2018] [Indexed: 11/23/2022] Open
Abstract
Purpose Track structure simulation codes can accurately reproduce the stochastic nature of particle–matter interactions in order to evaluate quantitatively radiation damage in biological cells such as DNA strand breaks and base damage. Such simulations handle large numbers of secondary charged particles and molecular species created in the irradiated medium. Every particle and molecular species are tracked step‐by‐step using a Monte Carlo method to calculate energy loss patterns and spatial distributions of molecular species inside a cell nucleus with high spatial accuracy. The Geant4‐DNA extension of the Geant4 general‐purpose Monte Carlo simulation toolkit allows for such track structure simulations and can be run on CPUs. However, long execution times have been observed for the simulation of DNA damage in cells. We present in this work an improvement of the computing performance of such simulations using ultraparallel processing on a graphical processing unit (GPU). Methods A new Monte Carlo simulator named MPEXS‐DNA, allowing high computing performance by using a GPU, has been developed for track structure and radiolysis simulations at the subcellular scale. MPEXS‐DNA physics and chemical processes are based on Geant4‐DNA processes available in Geant4 version 10.02 p03. We have reimplemented the Geant4‐DNA process codes of the physics stage (electromagnetic processes of charged particles) and the chemical stage (diffusion and chemical reactions for molecular species) for microdosimetry simulation by using the CUDA language. MPEXS‐DNA can calculate a distribution of energy loss in the irradiated medium caused by charged particles and also simulate production, diffusion, and chemical interactions of molecular species from water radiolysis to quantitatively assess initial damage to DNA. 
The validation of MPEXS‐DNA physics and chemical simulations was performed by comparing various types of distributions, namely the radial dose distributions for the physics stage, and the G‐value profiles for each chemical product and their linear energy transfer dependency for the chemical stage, to existing experimental data and simulation results obtained by other simulation codes, including PARTRAC. Results For physics validation, radial dose distributions calculated by MPEXS‐DNA are consistent with experimental data and numerical simulations. For chemistry validation, MPEXS‐DNA can also reproduce G‐value profiles for each molecular species with the same tendency as existing experimental data. MPEXS‐DNA also agrees with simulations by PARTRAC reasonably well. However, we have confirmed that there are slight discrepancies in G‐value profiles calculated by MPEXS‐DNA for molecular species such as H2 and H2O2 when compared to experimental data and PARTRAC simulations. The differences in G‐value profiles between MPEXS‐DNA and PARTRAC are caused by the different chemical reactions considered. MPEXS‐DNA can drastically boost the computing performance of track structure and radiolysis simulations. By using NVIDIA's GPU devices adopting the Volta architecture, MPEXS‐DNA has achieved speedup factors up to 2900 against Geant4‐DNA simulations with a single CPU core. Conclusion The MPEXS‐DNA Monte Carlo simulation achieves similar accuracy to Monte Carlo simulations performed using other codes such as Geant4‐DNA and PARTRAC, and its predictions are consistent with experimental data. Notably, MPEXS‐DNA allows calculations that are, at maximum, 2900 times faster than conventional simulations using a CPU.
Affiliation(s)
- Shogo Okada
- KEK, 1-1, Oho, Tsukuba, Ibaraki, 305-0801, Japan.
- Sebastien Incerti
- University of Bordeaux, CENBG, UMR 5797, Gradignan, F-33170, France; CNRS, IN2P3, CENBG, UMR 5797, Gradignan, F-33170, France.
35
Abstract
Background We present a performance-per-watt analysis of CUDAlign 4.0, a parallel strategy to obtain the optimal pairwise alignment of huge DNA sequences on multi-GPU platforms using the exact Smith-Waterman method. Results Our study includes acceleration factors, performance, scalability, power efficiency and energy costs. We also quantify the influence of the contents of the compared sequences, identify potential scenarios for energy savings on speculative executions, and calculate performance and energy usage differences among distinct GPU generations and models. For a sequence alignment on a chromosome-wide scale (around 2 Petacells), we are able to reduce execution times from 9.5 h on a Kepler GPU to just 2.5 h on a Pascal counterpart, with energy costs cut by 60%. Conclusions We find GPUs to be an order of magnitude ahead in performance per watt compared to Xeon Phis. Finally, versus typical low-power devices like FPGAs, GPUs kept similar GFLOPS/W ratios in 2017 while executing five times faster.
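The exact Smith-Waterman recurrence that CUDAlign parallelizes across GPUs can be stated compactly. The sketch below uses a simple linear gap penalty for brevity (the full tool handles chromosome-scale inputs and a more elaborate gap model); the scoring values are illustrative assumptions:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Exact Smith-Waterman local alignment score via dynamic programming
    with a linear gap penalty: H[i][j] is the best score of any local
    alignment ending at a[i-1], b[j-1], clamped at zero."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

# Six matches and one mismatch on the diagonal: 6*2 - 1 = 11.
score = smith_waterman("GATTACA", "GATGACA")
```

Cells on the same anti-diagonal of H depend only on previous anti-diagonals, which is the wavefront parallelism that GPU implementations such as CUDAlign exploit.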
36
Landau W, Niemi J, Nettleton D. Fully Bayesian analysis of RNA-seq counts for the detection of gene expression heterosis. J Am Stat Assoc 2018; 114:610-621. [PMID: 31354180 PMCID: PMC6660196 DOI: 10.1080/01621459.2018.1497496] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2017] [Revised: 01/01/2018] [Indexed: 01/17/2023]
Abstract
Heterosis, or hybrid vigor, is the enhancement of the phenotype of hybrid progeny relative to their inbred parents. Heterosis is extensively used in agriculture, and the underlying mechanisms are unclear. To investigate the molecular basis of phenotypic heterosis, researchers search tens of thousands of genes for heterosis with respect to expression in the transcriptome. Difficulty arises in the assessment of heterosis due to composite null hypotheses and non-uniform distributions for p-values under these null hypotheses. Thus, we develop a general hierarchical model for count data and a fully Bayesian analysis in which an efficient parallelized Markov chain Monte Carlo algorithm ameliorates the computational burden. We use our method to detect gene expression heterosis in a two-hybrid plant-breeding scenario, both in a real RNA-seq maize dataset and in simulation studies. In the simulation studies, we show our method has well-calibrated posterior probabilities and credible intervals when the model assumed in analysis matches the model used to simulate the data. Although model misspecification can adversely affect calibration, the methodology is still able to accurately rank genes. Finally, we show that hyperparameter posteriors are extremely narrow and an empirical Bayes (eBayes) approach based on posterior means from the fully Bayesian analysis provides virtually equivalent posterior probabilities, credible intervals, and gene rankings relative to the fully Bayesian solution. This evidence of equivalence provides support for the use of eBayes procedures in RNA-seq data analysis if accurate hyperparameter estimates can be obtained.
Affiliation(s)
- Will Landau
- Department of Statistics, Iowa State University
- Jarad Niemi
- Department of Statistics, Iowa State University
37
Abbaszadeh O, Khanteymoori AR, Azarpeyvand A. Parallel Algorithms for Inferring Gene Regulatory Networks: A Review. Curr Genomics 2018; 19:603-614. [PMID: 30386172 PMCID: PMC6194435 DOI: 10.2174/1389202919666180601081718] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2017] [Revised: 02/20/2018] [Accepted: 05/22/2018] [Indexed: 11/22/2022] Open
Abstract
Systems biology problems such as whole-genome network construction from large-scale gene expression data are sophisticated and time-consuming. Therefore, using sequential algorithms is not feasible for obtaining a solution in an acceptable amount of time. Today, by using massively parallel computing, it is possible to infer large-scale gene regulatory networks. Recently, establishing gene regulatory networks from large-scale datasets has drawn noticeable attention from researchers in the fields of parallel computing and systems biology. In this paper, we attempt to provide a more detailed overview of recent parallel algorithms for constructing gene regulatory networks. Firstly, the fundamentals of gene regulatory network inference and the challenges of large-scale datasets are given. Secondly, four parallel frameworks and libraries, including CUDA, OpenMP, MPI, and Hadoop, are described in detail. Thirdly, parallel algorithms are reviewed. Finally, some conclusions and guidelines for parallel reverse engineering are described.
Affiliation(s)
- Omid Abbaszadeh
- Department of Electrical and Computer Engineering, University of Zanjan, Zanjan, Iran
- Ali Reza Khanteymoori
- Department of Electrical and Computer Engineering, University of Zanjan, Zanjan, Iran
- Ali Azarpeyvand
- Department of Electrical and Computer Engineering, University of Zanjan, Zanjan, Iran
38
Awan MG, Eslami T, Saeed F. GPU-DAEMON: GPU algorithm design, data management & optimization template for array based big omics data. Comput Biol Med 2018; 101:163-173. [PMID: 30145436 DOI: 10.1016/j.compbiomed.2018.08.015] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2018] [Revised: 08/10/2018] [Accepted: 08/12/2018] [Indexed: 11/29/2022]
Abstract
In the age of ever-increasing data, faster and more efficient data processing algorithms are needed. Graphics Processing Units (GPUs) are emerging as a cost-effective alternative architecture for high-end computing. The optimal design of GPU algorithms is a challenging task that requires a thorough understanding of the high performance computing architecture as well as algorithmic design. The steep learning curve of effective GPU-centric algorithm design and implementation requires considerable expertise, time, and resources. In this paper, we present GPU-DAEMON, a GPU Data Management, Algorithm Design and Optimization technique suitable for processing array-based big omics data. Our proposed GPU algorithm design template outlines generic methods to tackle critical bottlenecks, which can be followed to implement high-performance, scalable GPU algorithms for a given big data problem. We study the capability of GPU-DAEMON by reviewing the implementation of GPU-DAEMON-based algorithms for three different big data problems. Speed-ups as large as 386x (over the sequential version) and 50x (over naive GPU design methods) are observed using the proposed GPU-DAEMON. The GPU-DAEMON template is available at https://github.com/pcdslab/GPU-DAEMON and the source codes for GPU-ArraySort, G-MSR and GPU-PCC are available at https://github.com/pcdslab.
Affiliation(s)
- Muaaz Gul Awan
- Department of Computer Science, Western Michigan University, Kalamazoo, MI, USA
- Taban Eslami
- Department of Computer Science, Western Michigan University, Kalamazoo, MI, USA
- Fahad Saeed
- School of Computing and Information Sciences, Florida International University, Miami, FL, USA.
39
Matić T, Aleksi I, Hocenski Ž, Kraus D. Real-time biscuit tile image segmentation method based on edge detection. ISA Trans 2018; 76:246-254. [PMID: 29609803 DOI: 10.1016/j.isatra.2018.03.015] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/20/2017] [Revised: 02/23/2018] [Accepted: 03/21/2018] [Indexed: 06/08/2023]
Abstract
In this paper we propose a novel real-time Biscuit Tile Segmentation (BTS) method for images from a ceramic tile production line. The BTS method is based on signal change detection and contour tracing, with the main goal of separating tile pixels from the background in images captured on the production line. Usually, human operators visually inspect and classify the produced ceramic tiles. Computer vision and image processing techniques can automate the visual inspection process if they fulfill real-time requirements; an important step in this process is real-time segmentation of tile pixels. The BTS method is implemented for parallel execution on a GPU device to satisfy the real-time constraints of the tile production line. The BTS method outperforms 2D threshold-based methods, 1D edge detection methods and contour-based methods, and is in use in the biscuit tile production line.
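The abstract describes detecting signal changes along the image; a generic 1D signal-change detector (an illustrative sketch, not the authors' BTS algorithm; the threshold value is an assumption) flags positions where the intensity jump between neighboring pixels exceeds a threshold:

```python
import numpy as np

def edge_positions(row, threshold=0.2):
    """Indices where the absolute intensity change between neighboring
    pixels exceeds `threshold`: a simple 1D signal-change detector."""
    return np.flatnonzero(np.abs(np.diff(row.astype(float))) > threshold)

# A bright tile region (1.0) on a dark background (0.0): the rising and
# falling transitions are reported.
row = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])
edges = edge_positions(row)
```

Because each image row (and each pixel difference within it) can be processed independently, this kind of detector parallelizes directly onto GPU threads, which is how BTS meets its real-time constraint.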
Affiliation(s)
- Tomislav Matić
- Josip Juraj Strossmayer University of Osijek, Faculty of Electrical Engineering, Computer Science and Information Technology Osijek, Kneza Trpimira 2b, Osijek, 31000, Croatia.
- Ivan Aleksi
- Josip Juraj Strossmayer University of Osijek, Faculty of Electrical Engineering, Computer Science and Information Technology Osijek, Kneza Trpimira 2b, Osijek, 31000, Croatia.
- Željko Hocenski
- Josip Juraj Strossmayer University of Osijek, Faculty of Electrical Engineering, Computer Science and Information Technology Osijek, Kneza Trpimira 2b, Osijek, 31000, Croatia.
- Dieter Kraus
- Hochschule Bremen, City University of Applied Sciences, Institute of Water-Acoustics, Sonar Engineering and Signal-Theory, Neustadtswall 30, D-28199, Bremen, Germany.
40
Eslami T, Saeed F. Fast-GPU-PCC: A GPU-Based Technique to Compute Pairwise Pearson's Correlation Coefficients for Time Series Data-fMRI Study. High Throughput 2018; 7:E11. [PMID: 29677161 PMCID: PMC6023306 DOI: 10.3390/ht7020011] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2018] [Revised: 04/04/2018] [Accepted: 04/17/2018] [Indexed: 11/16/2022] Open
Abstract
Functional magnetic resonance imaging (fMRI) is a non-invasive brain imaging technique, which has been regularly used for studying the brain's functional activities in the past few years. A widely used measure for capturing functional associations in the brain is Pearson's correlation coefficient, which is commonly employed for constructing functional networks and studying the dynamic functional connectivity of the brain. These are useful measures for understanding the effects of brain disorders on connectivity among brain regions. fMRI scanners produce a huge number of voxels, and using traditional central processing unit (CPU)-based techniques for computing pairwise correlations is very time consuming, especially when a large number of subjects are studied. In this paper, we propose a graphics processing unit (GPU)-based algorithm called Fast-GPU-PCC for computing pairwise Pearson's correlation coefficients. Based on the symmetry of Pearson's correlation, this approach returns the N(N−1)/2 correlation coefficients located in the strictly upper triangular part of the correlation matrix. Storing the correlations in a one-dimensional array, in the order proposed in this paper, is useful for further processing. Our experiments on real and synthetic fMRI data for different numbers of voxels and varying lengths of time series show that the proposed approach outperforms state-of-the-art GPU-based techniques as well as the sequential CPU-based versions. We show that Fast-GPU-PCC runs 62 times faster than the CPU-based version and about 2 to 3 times faster than two other state-of-the-art GPU-based methods.
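The output layout described above — the N(N−1)/2 coefficients of the strictly upper triangle stored row by row in a flat array — can be sketched on the CPU as follows. The GPU kernel itself is not reproduced; the function names are illustrative.

```python
# Minimal CPU sketch of the Fast-GPU-PCC output layout: compute all pairwise
# Pearson correlations for N time series and store the strictly upper
# triangle, row-major, in a flat array of length N*(N-1)/2.
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def upper_triangle_pcc(series):
    """Flat list: correlation of pair (i, j), i < j, in row-major order."""
    n = len(series)
    return [pearson(series[i], series[j])
            for i in range(n) for j in range(i + 1, n)]

series = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
flat = upper_triangle_pcc(series)        # length 3*(3-1)/2 == 3
print([round(r, 3) for r in flat])       # [1.0, -1.0, -1.0]
```

Exploiting symmetry this way halves both the computation and the memory traffic compared to materializing the full N × N matrix.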
Affiliation(s)
- Taban Eslami
- Department of Computer Science, Western Michigan University, Kalamazoo, MI 49008, USA.
- Fahad Saeed
- Department of Computer Science, Western Michigan University, Kalamazoo, MI 49008, USA.
41
Du H, Xia M, Zhao K, Liao X, Yang H, Wang Y, He Y. PAGANI Toolkit: Parallel graph-theoretical analysis package for brain network big data. Hum Brain Mapp 2018; 39:1869-1885. [PMID: 29417688 DOI: 10.1002/hbm.23996] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2017] [Revised: 12/12/2017] [Accepted: 01/29/2018] [Indexed: 11/10/2022] Open
Abstract
The recent collection of unprecedented quantities of neuroimaging data with high spatial resolution has led to brain network big data. However, a toolkit for fast and scalable computational solutions is still lacking. Here, we developed the PArallel Graph-theoretical ANalysIs (PAGANI) Toolkit based on a hybrid central processing unit-graphics processing unit (CPU-GPU) framework with a graphical user interface to facilitate the mapping and characterization of high-resolution brain networks. Specifically, the toolkit provides flexible parameters for users to customize computations of graph metrics in brain network analyses. As an empirical example, the PAGANI Toolkit was applied to individual voxel-based brain networks with ∼200,000 nodes that were derived from a resting-state fMRI dataset of 624 healthy young adults from the Human Connectome Project. Using a personal computer, this toolbox completed all computations in ∼27 h for one subject, which is markedly less than the 118 h required with a single-thread implementation. The voxel-based functional brain networks exhibited prominent small-world characteristics and densely connected hubs, which were mainly located in the medial and lateral fronto-parietal cortices. Moreover, the female group had significantly higher modularity and nodal betweenness centrality mainly in the medial/lateral fronto-parietal and occipital cortices than the male group. Significant correlations between the intelligence quotient and nodal metrics were also observed in several frontal regions. Collectively, the PAGANI Toolkit shows high computational performance and good scalability for analyzing connectome big data and provides a friendly interface without the complicated configuration of computing environments, thereby facilitating high-resolution connectomics research in health and disease.
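One of the graph metrics the toolkit computes, modularity, can be illustrated on a toy graph. This pure-Python version only shows the Newman modularity formula Q = (1/2m) Σᵢⱼ (Aᵢⱼ − kᵢkⱼ/2m) δ(cᵢ, cⱼ); PAGANI evaluates such metrics on ~200,000-node voxel networks with its CPU-GPU pipeline, which is not modeled here.

```python
# Toy sketch of Newman modularity for an undirected graph given as an
# adjacency matrix and a community assignment (one label per node).

def modularity(adj, communities):
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # 2m: every undirected edge is counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

# Two triangles (nodes 0-2 and 3-5) joined by a single bridge edge.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
print(round(modularity(adj, [0, 0, 0, 1, 1, 1]), 3))  # 0.357
```

Splitting the graph into its two triangles yields a positive Q, while assigning all nodes to one community yields Q = 0, matching the intuition that modularity rewards dense within-community connectivity.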
Affiliation(s)
- Haixiao Du
- Department of Electronic Engineering, Tsinghua University, Beijing, China
- Mingrui Xia
- National Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
- Beijing Key Laboratory of Brain Imaging and Connectomics, Beijing Normal University, Beijing, China
- IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
- Kang Zhao
- Department of Electronic Engineering, Tsinghua University, Beijing, China
- Xuhong Liao
- National Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
- Beijing Key Laboratory of Brain Imaging and Connectomics, Beijing Normal University, Beijing, China
- IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
- Huazhong Yang
- Department of Electronic Engineering, Tsinghua University, Beijing, China
- Yu Wang
- Department of Electronic Engineering, Tsinghua University, Beijing, China
- Yong He
- National Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
- Beijing Key Laboratory of Brain Imaging and Connectomics, Beijing Normal University, Beijing, China
- IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
42
Nobile MS, Cazzaniga P, Tangherloni A, Besozzi D. Graphics processing units in bioinformatics, computational biology and systems biology. Brief Bioinform 2017; 18:870-885. [PMID: 27402792 PMCID: PMC5862309 DOI: 10.1093/bib/bbw058] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2016] [Indexed: 01/18/2023] Open
Abstract
Several studies in Bioinformatics, Computational Biology and Systems Biology rely on the definition of physico-chemical or mathematical models of biological systems at different scales and levels of complexity, ranging from the interaction of atoms in single molecules up to genome-wide interaction networks. Traditional computational methods and software tools developed in these research fields share a common trait: they can be computationally demanding on Central Processing Units (CPUs), therefore limiting their applicability in many circumstances. To overcome this issue, general-purpose Graphics Processing Units (GPUs) are gaining increasing attention from the scientific community, as they can considerably reduce the running time required by standard CPU-based software and allow more intensive investigations of biological systems. In this review, we present a collection of GPU tools recently developed to perform computational analyses in life science disciplines, emphasizing the advantages and the drawbacks of using these parallel architectures. The complete list of GPU-powered tools reviewed here is available at http://bit.ly/gputools.
Affiliation(s)
- Marco S Nobile
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy
- SYSBIO.IT Centre of Systems Biology, Milano, Italy
- Paolo Cazzaniga
- Department of Human and Social Sciences, University of Bergamo, Bergamo, Italy
- SYSBIO.IT Centre of Systems Biology, Milano, Italy
- Andrea Tangherloni
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy
- Daniela Besozzi
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy
- SYSBIO.IT Centre of Systems Biology, Milano, Italy
- Corresponding author. Daniela Besozzi, Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy and SYSBIO.IT Centre of Systems Biology, Milano, Italy. Tel.: +39 02 6448 7874. E-mail:
43
Pryor A, Ophus C, Miao J. A streaming multi-GPU implementation of image simulation algorithms for scanning transmission electron microscopy. ACTA ACUST UNITED AC 2017; 3:15. [PMID: 29104852 PMCID: PMC5656717 DOI: 10.1186/s40679-017-0048-z] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2017] [Accepted: 10/13/2017] [Indexed: 11/25/2022]
Abstract
Simulation of atomic-resolution image formation in scanning transmission electron microscopy can require significant computation times using traditional methods. A recently developed method, termed plane-wave reciprocal-space interpolated scattering matrix (PRISM), demonstrates potential for significant acceleration of such simulations with negligible loss of accuracy. Here, we present a software package called Prismatic for parallelized simulation of image formation in scanning transmission electron microscopy (STEM) using both the PRISM and multislice methods. By distributing the workload between multiple CUDA-enabled GPUs and multicore processors, accelerations as high as 1000 × for PRISM and 15 × for multislice are achieved relative to traditional multislice implementations using a single 4-GPU machine. We demonstrate a potentially important application of Prismatic, using it to compute images for atomic electron tomography at sufficient speeds to include in the reconstruction pipeline. Prismatic is freely available both as an open-source CUDA/C++ package with a graphical user interface and as a Python package, PyPrismatic.
Affiliation(s)
- Alan Pryor
- Department of Physics and Astronomy and California NanoSystems Institute, University of California at Los Angeles, Los Angeles, CA 90095 USA
- Colin Ophus
- National Center for Electron Microscopy, Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA
- Jianwei Miao
- Department of Physics and Astronomy and California NanoSystems Institute, University of California at Los Angeles, Los Angeles, CA 90095 USA
44
Abstract
Background The Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) represents one of the most effective tools to reconstruct gene regulatory networks from large-scale molecular profile datasets. However, previous implementations require intensive computing resources and, in some cases, restrict the number of samples that can be used. These issues can be addressed elegantly in a GPU computing framework, where repeated mathematical computation can be done efficiently, but this requires an extensive redesign to apply parallel computing techniques to the original serial algorithm, involving detailed optimization efforts based on a deep understanding of both the hardware and software architecture. Results Here, we present an accelerated parallel implementation of ARACNE (GPU-ARACNE). By taking advantage of multi-level parallelism and the Compute Unified Device Architecture (CUDA) parallel kernel-call library, GPU-ARACNE successfully parallelizes a serial algorithm and simplifies the user experience from multi-step operations to one step. Using public datasets on comparable hardware configurations, we showed that GPU-ARACNE is faster than previous implementations and is able to reconstruct equally valid gene regulatory networks. Conclusion Given that previous versions of ARACNE are extremely resource demanding, either in computational time or in hardware investment, GPU-ARACNE is remarkably valuable for researchers who need to build complex regulatory networks from large expression datasets but have a limited budget for computational resources. In addition, our GPU-centered optimization of adaptive partitioning for mutual information (MI) estimation provides lessons that are applicable to other domains. Electronic supplementary material The online version of this article (doi:10.1186/s12918-017-0458-5) contains supplementary material, which is available to authorized users.
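The quantity at the core of ARACNE, mutual information, can be sketched for already-discretized data. This toy version uses a fixed joint histogram rather than the adaptive partitioning that GPU-ARACNE accelerates on the GPU; the function name is illustrative.

```python
# Hedged sketch of mutual information between two discretized profiles:
# MI = sum over (x, y) of p(x,y) * log( p(x,y) / (p(x) * p(y)) ), in nats.
import math
from collections import Counter

def mutual_information(xs, ys):
    n = len(xs)
    pxy = Counter(zip(xs, ys))   # joint counts
    px = Counter(xs)             # marginal counts
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # p_joint / (p(x) * p(y)) == c * n / (px[x] * py[y])
        mi += p_joint * math.log(p_joint * n * n / (px[x] * py[y]))
    return mi

# Perfectly dependent binary variables -> MI = ln 2; independent -> 0.
dep = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
ind = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(round(dep, 4), round(ind, 4))  # 0.6931 0.0
```

ARACNE evaluates this estimator for every gene pair, which is why a GPU implementation that parallelizes over pairs pays off on large expression datasets.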
Affiliation(s)
- Jing He
- Department of Biomedical Informatics, Columbia University, 168th Street, New York, 10032, NY, USA
- Department of Systems Biology, 1130 St Nicholas Street, New York, 10032, NY, USA
- Zhou Zhou
- Department of Computer Science, New York, 10027, NY, USA
- Michael Reed
- Department of Computer Science, New York, 10027, NY, USA
- Andrea Califano
- Department of Systems Biology, 1130 St Nicholas Street, New York, 10032, NY, USA.
45
Wei JD, Cheng HJ, Lin CY, Ye J, Yeh KY. Embedded-Based Graphics Processing Unit Cluster Platform for Multiple Sequence Alignments. Evol Bioinform Online 2017; 13:1176934317724764. [PMID: 28835734 PMCID: PMC5555494 DOI: 10.1177/1176934317724764] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2017] [Accepted: 07/12/2017] [Indexed: 11/20/2022] Open
Abstract
High-end graphics processing units (GPUs), such as the NVIDIA Tesla/Fermi/Kepler series cards with thousands of cores per chip, have been widely applied in high-performance computing fields over the past decade. These desktop GPU cards must be installed in personal computers or servers with desktop CPUs, and the cost and power consumption of constructing a GPU cluster platform are very high. In recent years, NVIDIA released an embedded board, called Jetson Tegra K1 (TK1), which contains 4 ARM Cortex-A15 CPUs and 192 Compute Unified Device Architecture cores (belonging to the Kepler GPU family). Jetson Tegra K1 has several advantages, such as low cost, low power consumption, and high applicability, and it has been applied to several specific applications. In our previous work, a bioinformatics platform with a single TK1 (STK platform) was constructed; that work also showed that Web and mobile services can be implemented on the STK platform with a good cost-performance ratio, by comparing the STK platform with desktop CPUs and GPUs. In this work, an embedded GPU cluster platform is constructed with multiple TK1s (MTK platform). Complex system installation and setup are necessary procedures at first. Then, 2 job assignment modes are designed for the MTK platform to provide services for users. Finally, ClustalW v2.0.11 and ClustalWtk are ported to the MTK platform. The experimental results showed that the speedup ratios achieved 5.5 and 4.8 times for ClustalW v2.0.11 and ClustalWtk, respectively, when comparing 6 TK1s with a single TK1. The MTK platform is proven to be useful for multiple sequence alignments.
Affiliation(s)
- Jyh-Da Wei
- Department of Computer Science and Information Engineering, School of Electrical and Computer Engineering, College of Engineering, Chang Gung University, Taoyuan, Taiwan
- Department of Ophthalmology, Chang Gung Memorial Hospital, Keelung, Taiwan
- Hui-Jun Cheng
- Department of Computer Science and Information Engineering, School of Electrical and Computer Engineering, College of Engineering, Chang Gung University, Taoyuan, Taiwan
- Chun-Yuan Lin
- Department of Computer Science and Information Engineering, School of Electrical and Computer Engineering, College of Engineering, Chang Gung University, Taoyuan, Taiwan
- Jin Ye
- Department of Computer Science and Information Engineering, School of Electrical and Computer Engineering, College of Engineering, Chang Gung University, Taoyuan, Taiwan
- Kuan-Yu Yeh
- Department of Computer Science and Information Engineering, School of Electrical and Computer Engineering, College of Engineering, Chang Gung University, Taoyuan, Taiwan
46
Chang HH, Chang YN. CUDA-based acceleration and BPN-assisted automation of bilateral filtering for brain MR image restoration. Med Phys 2017; 44:1420-1436. [PMID: 28196280 DOI: 10.1002/mp.12157] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2016] [Revised: 02/02/2017] [Accepted: 02/08/2017] [Indexed: 11/11/2022] Open
Abstract
PURPOSE Bilateral filters have been substantially exploited in numerous magnetic resonance (MR) image restoration applications for decades. Due to the lack of a theoretical basis for the filter parameter setting, empirical manipulation with fixed values and noise variance-related adjustments has generally been employed. The outcome of these strategies is usually sensitive to variation of the brain structures, and not all three parameter values are optimal. This article investigates the optimal setting of the bilateral filter, from which an accelerated and automated restoration framework is developed. METHODS To reduce the computational burden of the bilateral filter, parallel computing with the graphics processing unit (GPU) architecture is first introduced. The NVIDIA Tesla K40c GPU with the compute unified device architecture (CUDA) functionality is specifically utilized to emphasize thread usage and memory resources. To correlate the filter parameters with image characteristics for automation, optimal image texture features are subsequently acquired based on the sequential forward floating selection (SFFS) scheme. The selected features are then introduced into a back propagation network (BPN) model for filter parameter estimation. Finally, the k-fold cross validation method is adopted to evaluate the accuracy of the proposed filter parameter prediction framework. RESULTS A wide variety of T1-weighted brain MR images with various scenarios of noise levels and anatomic structures were utilized to train and validate this new parameter decision system with CUDA-based bilateral filtering. For a common brain MR image volume of 256 × 256 × 256 pixels, the speed-up gain reached 284. Six optimal texture features were acquired and associated with the BPN to establish a "high accuracy" parameter prediction system, which achieved a mean absolute percentage error (MAPE) of 5.6%. Automatic restoration results on 2460 brain MR images showed an average relative error in peak signal-to-noise ratio (PSNR) of less than 0.1%. In comparison with many state-of-the-art filters, the proposed automation framework with CUDA-based bilateral filtering provided more favorable results both quantitatively and qualitatively. CONCLUSIONS Possessing unique characteristics and demonstrating exceptional performance, the proposed CUDA-based bilateral filter adequately removed random noise in multifarious brain MR images for further study in the neurosciences and radiological sciences. It requires no prior knowledge of the noise variance and automatically restores MR images while preserving fine details. The strategy of exploiting CUDA to accelerate the computation and incorporating texture features into the BPN to completely automate the bilateral filtering process is achievable and validated, from which the best performance is reached.
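The bilateral filter itself can be sketched in one dimension: each output sample is a normalized average of its neighbors, weighted by both spatial distance (sigma_s) and intensity difference (sigma_r), which smooths noise while preserving edges. This minimal version omits the 2D MR-image handling and the CUDA kernel; the function name and parameters are illustrative.

```python
# Minimal 1D bilateral filter sketch: spatial Gaussian weight times
# range (intensity-difference) Gaussian weight, normalized per sample.
import math

def bilateral_1d(signal, radius, sigma_s, sigma_r):
    out = []
    for i, center in enumerate(signal):
        acc = norm = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w = (math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2)) *
                 math.exp(-((signal[j] - center) ** 2) / (2 * sigma_r ** 2)))
            acc += w * signal[j]
            norm += w
        out.append(acc / norm)
    return out

# A noisy step edge: filtering flattens each plateau but keeps the jump,
# because neighbors across the edge get a near-zero range weight.
noisy = [10, 12, 9, 11, 100, 102, 98, 101]
print([round(v) for v in bilateral_1d(noisy, radius=2, sigma_s=2.0, sigma_r=10.0)])
```

The paper's contribution is choosing sigma values (and the window size) automatically per image via the BPN, rather than the fixed values used here.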
Affiliation(s)
- Herng-Hua Chang
- Computational Biomedical Engineering Laboratory (CBEL), Department of Engineering Science and Ocean Engineering, National Taiwan University, Taipei, 10617, Taiwan
- Yu-Ning Chang
- Computational Biomedical Engineering Laboratory (CBEL), Department of Engineering Science and Ocean Engineering, National Taiwan University, Taipei, 10617, Taiwan
47
Abstract
BACKGROUND Metagenomic sequencing studies are becoming increasingly popular, with prominent examples including the sequencing of human microbiomes and of diverse environments. A fundamental computational problem in this context is read classification, i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes, software tools for fast and accurate metagenomic read classification are urgently needed. RESULTS We present cuCLARK, a read-level classifier for CUDA-enabled GPUs, based on the CLARK (fast and accurate classification of metagenomic sequences using reduced k-mers) method. Using the processing power of a single Titan X GPU, cuCLARK can reach classification speeds of up to 50 million reads per minute. Corresponding speedups for species-level (genus-level) classification range between 3.2 and 6.6 (3.7 and 6.4) compared to multi-threaded CLARK executed on a 16-core Xeon CPU workstation. CONCLUSION cuCLARK can perform metagenomic read classification at superior speeds on CUDA-enabled GPUs. It is free software licensed under the GPL and can be downloaded at https://github.com/funatiq/cuclark free of charge.
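The k-mer-matching idea behind CLARK-style classification can be sketched as follows. This is a toy in the spirit of the method only: CLARK/cuCLARK use discriminative reduced k-mers and batched GPU lookups, neither of which is modeled here, and the labels and k value are illustrative.

```python
# Toy k-mer-based read classifier: each reference contributes a set of
# k-mers, and a read is assigned to the reference sharing the most k-mers.

def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify(read, references, k=4):
    """Return the label of the reference sharing the most k-mers with the read."""
    read_kmers = kmers(read, k)
    scores = {label: len(read_kmers & kmers(genome, k))
              for label, genome in references.items()}
    return max(scores, key=scores.get)

references = {
    "species_A": "ACGTACGTACGTTTGA",
    "species_B": "GGCCGGCCTTAAGGCC",
}
print(classify("ACGTACGTAA", references))  # species_A
```

In practice the reference k-mer sets are built once into a large hash table; the per-read lookup loop is what the GPU parallelizes across millions of reads.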
Affiliation(s)
- Robin Kobus
- Institute of Computer Science, Johannes Gutenberg University Mainz, Staudingerweg 9, Mainz, 55435 Germany
- Christian Hundt
- Institute of Computer Science, Johannes Gutenberg University Mainz, Staudingerweg 9, Mainz, 55435 Germany
- André Müller
- Institute of Computer Science, Johannes Gutenberg University Mainz, Staudingerweg 9, Mainz, 55435 Germany
- Bertil Schmidt
- Institute of Computer Science, Johannes Gutenberg University Mainz, Staudingerweg 9, Mainz, 55435 Germany
48
Abstract
Computational structure-based protein design (CSPD) is an important problem in computational biology, which aims to design or improve a prescribed protein function based on a protein structure template. It provides a practical tool for real-world protein engineering applications. A popular CSPD method that is guaranteed to find the global minimum energy conformation (GMEC) is to combine the dead-end elimination (DEE) and A* tree search algorithms. However, in this framework, the A* search algorithm can run in exponential time in the worst case, which may become the computational bottleneck of a large-scale protein design process. To address this issue, we extend and add a new module to the OSPREY program previously developed in the Donald lab (Gainza et al., Methods Enzymol 523:87, 2013) to implement a GPU-based massively parallel A* algorithm for improving the protein design pipeline. By exploiting the modern GPU computational framework and optimizing the computation of the heuristic function for the A* search, our new program, called gOSPREY, can provide up to four orders of magnitude speedup in large protein design cases with a small memory overhead compared to the traditional A* search algorithm implementation, while still guaranteeing optimality. In addition, gOSPREY can be configured to run in a bounded-memory mode to tackle problems in which the conformation space is too large and the global optimal solution could not be computed previously. Furthermore, the GPU-based A* algorithm implemented in the gOSPREY program can be combined with state-of-the-art rotamer pruning algorithms such as iMinDEE (Gainza et al., PLoS Comput Biol 8:e1002335, 2012) and DEEPer (Hallen et al., Proteins 81:18-39, 2013) to also consider continuous backbone and side-chain flexibility.
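The serial A* search that gOSPREY parallelizes can be sketched generically. The rotamer/conformation-tree specifics of OSPREY are not modeled; this is the textbook algorithm on a toy graph, and the guarantee of optimality holds only when the heuristic never overestimates the remaining cost.

```python
# Generic A* sketch: best-first search ordered by cost-so-far plus an
# admissible heuristic estimate of the remaining cost.
import heapq

def a_star(start, goal, neighbors, heuristic):
    """Return (cost, path) of a least-cost path, or None if unreachable."""
    frontier = [(heuristic(start), 0, start, [start])]
    best = {start: 0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        for nxt, step in neighbors(node):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(
                    frontier,
                    (new_cost + heuristic(nxt), new_cost, nxt, path + [nxt]))
    return None

# Tiny weighted graph; a heuristic of 0 is trivially admissible
# (it reduces A* to Dijkstra's algorithm).
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 1)], "D": []}
cost, path = a_star("A", "D", lambda n: graph[n], lambda n: 0)
print(cost, path)  # 3 ['A', 'B', 'C', 'D']
```

The GPU variant in the paper evaluates the heuristic for many frontier nodes in parallel, which is where the speedup comes from; the priority-queue logic above stays conceptually the same.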
Affiliation(s)
- Yichao Zhou
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, P. R. China
- Bruce R Donald
- Department of Computer Science, Duke University, Durham, NC, USA
- Department of Biochemistry, Duke University Medical Center, Durham, NC, USA
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, P. R. China.
49
Techavipoo U, Worasawate D, Boonleelakul W, Keinprasit R, Sunpetchniyom T, Sugino N, Thajchayapong P. Toward Optimal Computation of Ultrasound Image Reconstruction Using CPU and GPU. Sensors (Basel) 2016; 16:E1986. [PMID: 27886149 DOI: 10.3390/s16121986] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/13/2016] [Revised: 10/31/2016] [Accepted: 11/10/2016] [Indexed: 12/03/2022]
Abstract
An ultrasound image is reconstructed from echo signals received by the array elements of a transducer. The time of flight of an echo depends on the distance from the focus to the array elements. The received echo signals have to be delayed to make their wave fronts and phases coherent before summing the signals. In digital beamforming, the delays are not always located at the sampled points. Generally, the values of the delayed signals are estimated by the values of the nearest samples. This method is fast and easy but inaccurate. Other methods are available for increasing the accuracy of the delayed signals and, consequently, the quality of the beamformed signals; for example, in-phase (I)/quadrature (Q) interpolation, which is more time consuming but provides more accurate values than the nearest samples. This paper compares the signals after dynamic receive beamforming in which the echo signals are delayed using two methods: the nearest-sample method and the I/Q interpolation method. Comparisons of the visual quality of the reconstructed images and the quality of the beamformed signals are reported. Moreover, the computational speeds of these methods are optimized by reorganizing the data processing flow and by applying a graphics processing unit (GPU). The use of single- and double-precision floating-point formats for the intermediate data is also considered. The speeds with and without these optimizations are compared.
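The difference between the two delay estimators can be shown on a toy sampled echo. Note this is a simplification: the paper interpolates I/Q-demodulated data, whereas this sketch applies plain linear interpolation to a real-valued signal for brevity, and the function names are illustrative.

```python
# Two ways to read a signal at a fractional sample index (the "delay"):
# snap to the nearest sample, or linearly interpolate between the two
# surrounding samples.

def delayed_nearest(signal, delay):
    """Sample value at fractional index `delay`, nearest-sample rule."""
    return signal[round(delay)]

def delayed_linear(signal, delay):
    """Sample value at fractional index `delay`, linear interpolation."""
    i = int(delay)
    frac = delay - i
    return (1 - frac) * signal[i] + frac * signal[i + 1]

signal = [0.0, 1.0, 0.0, -1.0, 0.0]  # coarsely sampled sine-like echo
print(delayed_nearest(signal, 1.4))  # 1.0  (snaps to index 1)
print(delayed_linear(signal, 1.4))   # 0.6  (0.6*1.0 + 0.4*0.0)
```

In delay-and-sum beamforming, each channel is read at its own fractional delay and the channel values are summed; the small per-sample error of the nearest-sample rule accumulates across channels, which is why interpolation improves the beamformed signal.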
50
Abstract
Background During library construction, the polymerase chain reaction is used to enrich the DNA before sequencing. Typically, this process generates duplicate read sequences. Removal of these artifacts is mandatory, as they can affect the correct interpretation of data in several analyses. Ideally, duplicate reads should be characterized by identical nucleotide sequences. However, due to sequencing errors, duplicates may also be nearly identical. Removing nearly identical duplicates can require notable computational effort. To deal with this challenge, we recently proposed a GPU method aimed at removing identical and nearly identical duplicates generated with an Illumina platform. The method implements an approach based on prefix-suffix comparison. Read sequences with an identical prefix are considered potential duplicates. Then, their suffixes are compared to identify and remove those that are actually duplicated. Although the method can be efficiently used to remove duplicates, there are some limitations that need to be overcome. In particular, it cannot detect potential duplicates when prefixes are longer than 27 bases, and it does not provide support for paired-end read libraries. Moreover, large clusters of potential duplicates are split into smaller ones to guarantee a reasonable computing time. This heuristic may affect the accuracy of the analysis. Results In this work we propose GPU-DupRemoval, a new implementation of our method able to (i) cluster reads without constraints on the maximum length of the prefixes, (ii) support both single- and paired-end read libraries, and (iii) analyze large clusters of potential duplicates. Conclusions Due to the massive parallelization obtained by exploiting graphics cards, GPU-DupRemoval removes duplicate reads faster than other cutting-edge solutions, while outperforming most of them in terms of the number of duplicate reads removed.
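The prefix-suffix strategy described above can be sketched as follows: reads sharing a prefix are clustered as candidate duplicates, and within each cluster suffixes are compared, tolerating a few mismatches to catch nearly identical duplicates. The prefix length, mismatch tolerance, and function names are illustrative; GPU-DupRemoval performs the clustering and pairwise comparison on the GPU.

```python
# Toy prefix-suffix duplicate removal: cluster reads by prefix, then keep
# only reads whose suffix differs from every kept read in the cluster by
# more than `max_mismatches` positions.
from collections import defaultdict

def remove_duplicates(reads, prefix_len=4, max_mismatches=1):
    clusters = defaultdict(list)
    for read in reads:
        clusters[read[:prefix_len]].append(read)
    kept = []
    for group in clusters.values():
        unique = []
        for read in group:
            suffix = read[prefix_len:]
            is_dup = any(
                len(suffix) == len(u[prefix_len:]) and
                sum(a != b for a, b in zip(suffix, u[prefix_len:])) <= max_mismatches
                for u in unique)
            if not is_dup:
                unique.append(read)
        kept.extend(unique)
    return kept

reads = ["ACGTAAAA", "ACGTAAAT", "ACGTCCCC", "TTTTGGGG"]
print(remove_duplicates(reads))  # ['ACGTAAAA', 'ACGTCCCC', 'TTTTGGGG']
```

The prefix clustering is what keeps the comparison tractable: suffixes are only compared within a cluster, never across the whole library, and each cluster can be handled by an independent GPU thread block.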
Affiliation(s)
- Andrea Manconi
- Institute for Biomedical Technologies, National Research Council, Via Fratelli Cervi, 93, Segrate (Mi), 20090, Italy.
- Marco Moscatelli
- Institute for Biomedical Technologies, National Research Council, Via Fratelli Cervi, 93, Segrate (Mi), 20090, Italy
- Giuliano Armano
- Department of Electrical and Electronic Engineering, University of Cagliari, P.zza D'Armi, Cagliari (CA), 09123, Italy
- Matteo Gnocchi
- Institute for Biomedical Technologies, National Research Council, Via Fratelli Cervi, 93, Segrate (Mi), 20090, Italy
- Alessandro Orro
- Institute for Biomedical Technologies, National Research Council, Via Fratelli Cervi, 93, Segrate (Mi), 20090, Italy
- Luciano Milanesi
- Institute for Biomedical Technologies, National Research Council, Via Fratelli Cervi, 93, Segrate (Mi), 20090, Italy