1
Perek P, Mielczarek A, Makowski D. High-Performance Image Acquisition and Processing for Stereoscopic Diagnostic Systems with the Application of Graphical Processing Units. Sensors (Basel) 2022; 22:s22020471. [PMID: 35062431 PMCID: PMC8777855 DOI: 10.3390/s22020471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2021] [Revised: 01/05/2022] [Accepted: 01/06/2022] [Indexed: 01/27/2023]
Abstract
In recent years, cinematography and other digital content creators have been eagerly turning to Three-Dimensional (3D) imaging technology. The creators of movies, games, and augmented reality applications are aware of this technology's advantages, possibilities, and new means of expression. The development of electronic and IT technologies enables the achievement of a better and better quality of the recorded 3D image and many possibilities for its correction and modification in post-production. However, preparing a correct 3D image that does not cause perception problems for the viewer is still a complex and demanding task. Therefore, planning and then ensuring the correct parameters and quality of the recorded 3D video is essential. Despite better post-production techniques, fixing errors in a captured image can be difficult, time consuming, and sometimes impossible. The detection of errors typical for stereo vision related to the depth of the image (e.g., depth budget violation, stereoscopic window violation) during the recording allows for their correction already on the film set, e.g., by different scene layouts and/or different camera configurations. The paper presents a prototype of an independent, non-invasive diagnostic system that supports the film crew in the process of calibrating stereoscopic cameras, as well as analysing the 3D depth while working on a film set. The system acquires full HD video streams from professional cameras using Serial Digital Interface (SDI), synchronises them, and estimates and analyses the disparity map. Objective depth analysis using computer tools while recording scenes allows stereographers to immediately spot errors in the 3D image, primarily related to the violation of the viewing comfort zone. The paper also describes an efficient method of analysing a 3D video using Graphics Processing Unit (GPU). The main steps of the proposed solution are uncalibrated rectification and disparity map estimation. 
The algorithms selected and implemented for this system do not require knowledge of intrinsic and extrinsic camera parameters. Thus, they can be used in non-cooperative environments, such as a film set, where the camera configuration often changes. Both are implemented on a GPU to improve data processing efficiency. The paper presents an evaluation of the algorithms' accuracy, as well as a performance comparison of two implementations, with and without GPU acceleration. The application of the described GPU-based method makes the system efficient and easy to use. The system can process a full HD video stream at several frames per second.
2
Thiyagarajan P, Padmanaban S, Thiruvenkadam K, Karuppanagounder S. Advancements of MRI-Based Brain Tumor Segmentation from Traditional to Recent Trends- A Review. Curr Med Imaging 2021; 18:1261-1275. [PMID: 34911430 DOI: 10.2174/1573405617666211215111937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Revised: 08/20/2021] [Accepted: 09/13/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND Within research on brain-related diseases, brain tumor segmentation on magnetic resonance imaging (MRI) scans is one of the most actively studied domains in the medical community. Brain tumor segmentation is a very challenging task because tumors have asymmetric forms and uncertain boundaries. This process separates the tumor region, comprising active tumor, necrosis, and edema, from normal brain tissues such as white matter (WM), grey matter (GM), and cerebrospinal fluid (CSF). INTRODUCTION This paper analyzes the advancement of brain tumor segmentation on MRI head scans from conventional image processing techniques, through machine learning, to deep learning. METHOD State-of-the-art methods from these three families are investigated, and their merits and demerits are discussed. RESULTS The prime motivation of the paper is to encourage young researchers to develop efficient brain tumor segmentation techniques using conventional and recent technologies. CONCLUSION The analysis concluded that conventional and machine learning methods were mostly applied to brain tumor detection, whereas deep learning methods were good at segmenting tumor substructures.
Affiliation(s)
- Padmapriya Thiyagarajan
  - Department of Computer Science and Applications, The Gandhigram Rural Institute (Deemed to be University), Gandhigram 624 302, Tamil Nadu, India
- Sriramakrishnan Padmanaban
  - Department of Computer Applications, Kalasalingam Academy of Research and Education (Deemed to be University), Krishnankoil 626128, Tamil Nadu, India
- Kalaiselvi Thiruvenkadam
  - Department of Computer Science and Applications, The Gandhigram Rural Institute (Deemed to be University), Gandhigram 624 302, Tamil Nadu, India
- Somasundaram Karuppanagounder
  - Department of Computer Science and Applications, The Gandhigram Rural Institute (Deemed to be University), Gandhigram 624 302, Tamil Nadu, India
3
Roberge V, Tarbouchi M. Parallel Algorithm on GPU for Wireless Sensor Data Acquisition Using a Team of Unmanned Aerial Vehicles. Sensors (Basel) 2021; 21:6851. [PMID: 34696064 DOI: 10.3390/s21206851] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/03/2021] [Revised: 09/30/2021] [Accepted: 10/12/2021] [Indexed: 11/16/2022]
Abstract
This paper proposes a framework for wireless sensor data acquisition using a team of Unmanned Aerial Vehicles (UAVs). Scattered over a terrain, the sensors detect information about their surroundings and can transmit this information wirelessly over a short range. With no access to a terrestrial or satellite communication network to relay the information to, UAVs are used to visit the sensors and collect the data. The proposed framework uses an iterative k-means algorithm to group the sensors into clusters and to identify Download Points (DPs) where the UAVs hover to download the data. A Single-Source–Shortest-Path algorithm (SSSP) is used to compute optimal paths between every pair of DPs with a constraint to reduce the number of turns. A genetic algorithm supplemented with a 2-opt local search heuristic is used to solve the multi-travelling salesperson problem and to find optimized tours for each UAV. Finally, a collision avoidance strategy is implemented to guarantee collision-free trajectories. Concerned with the overall runtime of the framework, the SSSP algorithm is implemented in parallel on a graphics processing unit. The proposed framework is tested in simulation using three UAVs and realistic 3D maps with up to 100 sensors and runs in just 20.7 s, a 33.3× speed-up compared to a sequential execution on CPU. The results show that the proposed method is efficient at calculating optimized trajectories for the UAVs for data acquisition from wireless sensors. The results also show the significant advantage of the parallel implementation on GPU.
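The 2-opt local search named in this abstract can be illustrated with a minimal Python sketch. It is a generic textbook 2-opt on a single closed tour with straight-line distances, not the authors' implementation; the names `tour_length` and `two_opt` and the point format are assumptions made for illustration.

```python
import math


def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the order given by tour."""
    return sum(
        math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def two_opt(tour, pts):
    """Reverse tour segments until no reversal shortens the tour.

    Hypothetical helper: the first city stays fixed as the start point.
    """
    best = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reversing best[i..j] swaps two edges of the tour.
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(candidate, pts) < tour_length(best, pts) - 1e-12:
                    best = candidate
                    improved = True
    return best
```

Calling `two_opt(order, pts)` on a tour whose edges cross returns a reordering of the same cities with the crossing removed. Recomputing the full tour length for every candidate keeps the sketch short; a practical version would evaluate only the two edges that change.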
4
Zhang B, Fan Z, Zhao CY, Gu X. GPU_PBTE: an efficient solver for three and four phonon scattering rates on graphics processing units. J Phys Condens Matter 2021; 33:495901. [PMID: 34521073 DOI: 10.1088/1361-648x/ac268d] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/14/2021] [Accepted: 09/14/2021] [Indexed: 06/13/2023]
Abstract
Lattice thermal conductivity (LTC) is a key parameter for many technological applications. Based on the Peierls-Boltzmann transport equation (PBTE), many unique phonon transport properties of various materials were revealed. Accurate calculation of LTC with PBTE, however, is a time-consuming task, especially for compounds with a complex crystal structure or when high-order phonon scattering is taken into consideration. Graphical processing units (GPUs) have been extensively used to accelerate scientific simulations, making it possible to use a single desktop workstation for calculations that used to require supercomputers. Due to their fundamental differences from traditional processors, GPUs are especially suited for executing a large group of similar tasks with minimal communication, but require a completely different algorithm design. In this paper, we provide a new algorithm optimized for GPUs, where a two-kernel method is used to avoid divergent branching. A new open-source code, GPU_PBTE, is developed based on the proposed algorithm. As demonstrations, we investigate the thermal transport properties of silicon and silicon carbide, and find that accurate and reliable LTC can be obtained by our software. Running on an NVIDIA Tesla V100, GPU_PBTE substantially improves double-precision performance and is two to three orders of magnitude faster than our CPU version running on an Intel Xeon Gold 6248 CPU @ 2.5 GHz. Our work also provides an idea for accelerating calculations with other novel hardware that may come out in the future.
Affiliation(s)
- Bo Zhang
  - Institute of Engineering Thermophysics, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, People's Republic of China
- Zheyong Fan
  - School of Mathematics and Physics, Bohai University, Jinzhou, People's Republic of China
- C Y Zhao
  - Institute of Engineering Thermophysics, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, People's Republic of China
- Xiaokun Gu
  - Institute of Engineering Thermophysics, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, People's Republic of China
5
Abstract
Imaging systems are often modeled as continuous-to-discrete mappings that map the object (i.e. a function of continuous variables such as space, time, energy, wavelength, etc) to a finite set of measurements. When it comes to reconstruction, some discretized version of the object is almost always assumed, leading to a discrete-to-discrete representation of the imaging system. In this paper, we discuss a method for single-photon emission computed tomography (SPECT) imaging that avoids discrete representations of the object or the imaging system, thus allowing reconstruction on an arbitrarily fine set of points.
Affiliation(s)
- L Caucci
  - Department of Medical Imaging, University of Arizona, Tucson, AZ 85724, United States of America. College of Optical Sciences, University of Arizona, Tucson, AZ 85719, United States of America. Author to whom any correspondence should be addressed
6
Morrison AF, Epifanovsky E, Herbert JM. Double-buffered, heterogeneous CPU + GPU integral digestion algorithm for single-excitation calculations involving a large number of excited states. J Comput Chem 2018; 39:2173-2182. [PMID: 30368836 DOI: 10.1002/jcc.25531] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/04/2018] [Revised: 05/12/2018] [Accepted: 06/14/2018] [Indexed: 01/29/2023]
Abstract
The most widely used quantum-chemical models for excited states are single-excitation theories, a category that includes configuration interaction with single substitutions, time-dependent density functional theory, and also a recently developed ab initio exciton model. When a large number of excited states are desired, these calculations incur a significant bottleneck in the "digestion" step in which two-electron integrals are contracted with density or density-like matrices. We present an implementation that moves this step onto graphical processing units (GPUs), and introduce a double-buffer scheme that minimizes latency by computing integrals on the central processing units (CPUs) concurrently with their digestion on the GPUs. An automatic code generation scheme simplifies the implementation of high-performance GPU kernels. For the exciton model, which requires separate excited-state calculations on each electronically coupled chromophore, the heterogeneous implementation described here results in speedups of 2-6× versus a CPU-only implementation. For traditional time-dependent density functional theory calculations, we obtain speedups of up to 5× when a large number of excited states is computed. © 2018 Wiley Periodicals, Inc.
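The double-buffer idea described in this abstract, one stage filling a buffer while the other stage drains the previous one, can be sketched on the CPU alone. The Python example below is an analogy under stated assumptions, not the authors' code: `produce` stands in for integral computation, `consume` for digestion, and the name `run_double_buffered` and its parameters are hypothetical. A semaphore with two permits ensures that at most two buffers exist at any time.

```python
import queue
import threading


def run_double_buffered(batches, produce, consume, n_buffers=2):
    """Overlap produce() and consume() using at most n_buffers live buffers."""
    free_slots = threading.Semaphore(n_buffers)
    filled = queue.Queue()
    done = object()  # sentinel marking the end of the stream

    def producer():
        for batch in batches:
            free_slots.acquire()         # wait until a buffer is free
            filled.put(produce(batch))   # fill it and hand it over
        filled.put(done)

    worker = threading.Thread(target=producer)
    worker.start()

    results = []
    while True:
        item = filled.get()
        if item is done:
            break
        results.append(consume(item))
        free_slots.release()             # buffer can be reused by the producer
    worker.join()
    return results
```

While the consumer works on buffer k, the producer is free to fill buffer k+1, which is where the latency hiding comes from. Error handling is omitted: an exception in `produce` would leave the consumer waiting.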
Affiliation(s)
- Adrian F Morrison
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio; Q-Chem Inc., Pleasanton, California
- John M Herbert
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio
7
Florimbi G, Fabelo H, Torti E, Lazcano R, Madroñal D, Ortega S, Salvador R, Leporati F, Danese G, Báez-Quevedo A, Callicó GM, Juárez E, Sanz C, Sarmiento R. Accelerating the K-Nearest Neighbors Filtering Algorithm to Optimize the Real-Time Classification of Human Brain Tumor in Hyperspectral Images. Sensors (Basel) 2018; 18:s18072314. [PMID: 30018216 PMCID: PMC6068477 DOI: 10.3390/s18072314] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 06/18/2018] [Revised: 07/12/2018] [Accepted: 07/15/2018] [Indexed: 11/19/2022]
Abstract
The use of hyperspectral imaging (HSI) in the medical field is an emerging approach to assist physicians in diagnostic or surgical guidance tasks. However, HSI data processing involves very high computational requirements due to the huge amount of information captured by the sensors. One of the most computationally demanding stages is the K-Nearest Neighbors (KNN) filtering algorithm. The main goal of this study is to optimize and parallelize the KNN algorithm by exploiting GPU technology to obtain real-time processing during brain cancer surgical procedures. This parallel version of the KNN performs the neighbor filtering of a classification map (obtained from a supervised classifier), evaluating the different classes simultaneously. The undertaken optimizations and the computational capabilities of the GPU device yield a speedup of up to 66.18× when compared to a sequential implementation.
Affiliation(s)
- Giordana Florimbi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, 27100 Pavia, Italy.
- Himar Fabelo
  - Institute for Applied Microelectronics (IUMA), University of Las Palmas de Gran Canaria (ULPGC), 35017 Las Palmas de Gran Canaria, Spain.
- Emanuele Torti
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, 27100 Pavia, Italy.
- Raquel Lazcano
  - Centre of Software Technologies and Multimedia Systems (CITSEM), Technical University of Madrid (UPM), 28031 Madrid, Spain.
- Daniel Madroñal
  - Centre of Software Technologies and Multimedia Systems (CITSEM), Technical University of Madrid (UPM), 28031 Madrid, Spain.
- Samuel Ortega
  - Institute for Applied Microelectronics (IUMA), University of Las Palmas de Gran Canaria (ULPGC), 35017 Las Palmas de Gran Canaria, Spain.
- Ruben Salvador
  - Centre of Software Technologies and Multimedia Systems (CITSEM), Technical University of Madrid (UPM), 28031 Madrid, Spain.
- Francesco Leporati
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, 27100 Pavia, Italy.
- Giovanni Danese
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, 27100 Pavia, Italy.
- Abelardo Báez-Quevedo
  - Institute for Applied Microelectronics (IUMA), University of Las Palmas de Gran Canaria (ULPGC), 35017 Las Palmas de Gran Canaria, Spain.
- Gustavo M Callicó
  - Institute for Applied Microelectronics (IUMA), University of Las Palmas de Gran Canaria (ULPGC), 35017 Las Palmas de Gran Canaria, Spain.
- Eduardo Juárez
  - Centre of Software Technologies and Multimedia Systems (CITSEM), Technical University of Madrid (UPM), 28031 Madrid, Spain.
- César Sanz
  - Centre of Software Technologies and Multimedia Systems (CITSEM), Technical University of Madrid (UPM), 28031 Madrid, Spain.
- Roberto Sarmiento
  - Institute for Applied Microelectronics (IUMA), University of Las Palmas de Gran Canaria (ULPGC), 35017 Las Palmas de Gran Canaria, Spain.
8
Nobile MS, Cazzaniga P, Tangherloni A, Besozzi D. Graphics processing units in bioinformatics, computational biology and systems biology. Brief Bioinform 2017; 18:870-885. [PMID: 27402792 PMCID: PMC5862309 DOI: 10.1093/bib/bbw058] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 03/30/2016] [Indexed: 01/18/2023] Open
Abstract
Several studies in Bioinformatics, Computational Biology and Systems Biology rely on the definition of physico-chemical or mathematical models of biological systems at different scales and levels of complexity, ranging from the interaction of atoms in single molecules up to genome-wide interaction networks. Traditional computational methods and software tools developed in these research fields share a common trait: they can be computationally demanding on Central Processing Units (CPUs), which limits their applicability in many circumstances. To overcome this issue, general-purpose Graphics Processing Units (GPUs) are gaining increasing attention from the scientific community, as they can considerably reduce the running time required by standard CPU-based software, and allow more intensive investigations of biological systems. In this review, we present a collection of GPU tools recently developed to perform computational analyses in life science disciplines, emphasizing the advantages and the drawbacks in the use of these parallel architectures. The complete list of GPU-powered tools reviewed here is available at http://bit.ly/gputools.
Affiliation(s)
- Marco S Nobile
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy
  - SYSBIO.IT Centre of Systems Biology, Milano, Italy
- Paolo Cazzaniga
  - Department of Human and Social Sciences, University of Bergamo, Bergamo, Italy
  - SYSBIO.IT Centre of Systems Biology, Milano, Italy
- Andrea Tangherloni
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy
- Daniela Besozzi
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy
  - SYSBIO.IT Centre of Systems Biology, Milano, Italy
  - Corresponding author. Daniela Besozzi, Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy and SYSBIO.IT Centre of Systems Biology, Milano, Italy. Tel.: +39 02 6448 7874. E-mail:
9
Matenine D, Côté G, Mascolo-Fortin J, Goussard Y, Després P. System matrix computation vs storage on GPU: A comparative study in cone beam CT. Med Phys 2017; 45:579-588. [PMID: 29214631 DOI: 10.1002/mp.12714] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/12/2017] [Revised: 11/01/2017] [Accepted: 11/19/2017] [Indexed: 11/10/2022] Open
Abstract
PURPOSE Iterative reconstruction algorithms in computed tomography (CT) require a fast method for computing the intersection distances between the trajectories of photons and the object, also called ray tracing or system matrix computation. This work, focused on the thin-ray model, aims to compare different system matrix handling strategies using graphical processing units (GPUs). METHODS In this work, the system matrix is modeled by thin rays intersecting a regular grid of box-shaped voxels, known to be an accurate representation of the forward projection operator in CT. However, an uncompressed system matrix exceeds the random access memory (RAM) capacities of typical computers by one order of magnitude or more. Considering the RAM limitations of GPU hardware, several system matrix handling methods were compared: full storage of a compressed system matrix, on-the-fly computation of its coefficients, and partial storage of the system matrix with partial on-the-fly computation. These methods were tested on geometries mimicking a cone beam CT (CBCT) acquisition of a human head. Execution times of three routines of interest were compared: forward projection, backprojection, and ordered-subsets convex (OSC) iteration. RESULTS A fully stored system matrix yielded the shortest backprojection and OSC iteration times, with a 1.52× acceleration for OSC when compared to the on-the-fly approach. Nevertheless, the maximum problem size was bound by the available GPU RAM and geometrical symmetries. On-the-fly coefficient computation did not require symmetries and was shown to be the fastest for forward projection. It also offered reasonable execution times of about 176.4 ms per view per OSC iteration for a detector of 512 × 448 pixels and a volume of 384³ voxels, using commodity GPU hardware. Partial system matrix storage showed performance similar to the on-the-fly approach, while still relying on symmetries.
CONCLUSION Partial system matrix storage was shown to yield the lowest relative performance. On-the-fly ray tracing was shown to be the most flexible method, yielding reasonable execution times. A fully stored system matrix allowed for the lowest backprojection and OSC iteration times and may be of interest for certain performance-oriented applications.
Affiliation(s)
- Dmitri Matenine
  - Département de physique, de génie physique et d'optique, Université Laval, Québec, Québec, G1V 0A6, Canada
- Geoffroi Côté
  - Département de physique, de génie physique et d'optique, Université Laval, Québec, Québec, G1V 0A6, Canada
- Julia Mascolo-Fortin
  - Département de physique, de génie physique et d'optique, Université Laval, Québec, Québec, G1V 0A6, Canada
- Yves Goussard
  - Institut de génie biomédical, Département de génie électrique, École Polytechnique de Montréal, C.P. 6079, succ. Centre-ville, Montréal, Québec, H3C 3A7, Canada
- Philippe Després
  - Département de physique, de génie physique et d'optique and Centre de recherche sur le cancer, Université Laval, Québec, Québec, G1V 0A6, Canada; Département de radio-oncologie and Centre de recherche du CHU de Québec, Québec, Québec, G1R 2J6, Canada
10
Harger M, Li D, Wang Z, Dalby K, Lagardère L, Piquemal JP, Ponder J, Ren P. Tinker-OpenMM: Absolute and relative alchemical free energies using AMOEBA on GPUs. J Comput Chem 2017; 38:2047-2055. [PMID: 28600826 DOI: 10.1002/jcc.24853] [Citation(s) in RCA: 78] [Impact Index Per Article: 11.1] [Received: 03/05/2017] [Accepted: 05/06/2017] [Indexed: 12/27/2022]
Abstract
The capabilities of polarizable force fields for alchemical free energy calculations have been limited by the high computational cost and complexity of the underlying potential energy functions. In this work, we present a GPU-based general alchemical free energy simulation platform for the polarizable potential AMOEBA. Tinker-OpenMM, the OpenMM implementation of the AMOEBA simulation engine, has been modified to enable both absolute and relative alchemical simulations on GPUs, which leads to a ∼200-fold improvement in simulation speed over a single CPU core. We show that free energy values calculated using this platform agree with the results of Tinker simulations for the hydration of organic compounds and binding of host-guest systems within the statistical errors. In addition to absolute binding, we designed a relative alchemical approach for computing relative binding affinities of ligands to the same host, where a special path was applied to avoid numerical instability due to polarization between the different ligands that bind to the same site. This scheme is general and does not require ligands to have similar scaffolds. We show that relative hydration and binding free energies calculated using this approach match those computed from the absolute free energy approach. © 2017 Wiley Periodicals, Inc.
Affiliation(s)
- Matthew Harger
  - Department of Biomedical Engineering, University of Texas at Austin, Austin, Texas, 78712
- Daniel Li
  - Department of Biomedical Engineering, University of Texas at Austin, Austin, Texas, 78712
- Zhi Wang
  - Department of Chemistry, Washington University in St. Louis, St. Louis, Missouri, 63130
- Kevin Dalby
  - Division of Chemical Biology and Medicinal Chemistry, University of Texas at Austin, Austin, Texas, 78712
- Louis Lagardère
  - Institut des Sciences du Calcul et des Données, UPMC Université Paris 06, F-75005, Paris, France
- Jean-Philip Piquemal
  - Laboratoire de Chimie Théorique, Sorbonne Universités, UPMC, UMR7616 CNRS, Paris, France; Institut Universitaire de France, Paris Cedex 05, 75231, France
- Jay Ponder
  - Department of Chemistry, Washington University in St. Louis, St. Louis, Missouri, 63130
- Pengyu Ren
  - Department of Biomedical Engineering, University of Texas at Austin, Austin, Texas, 78712
11
Lange M, Palamara S, Lassila T, Vergara C, Quarteroni A, Frangi AF. Improved hybrid/GPU algorithm for solving cardiac electrophysiology problems on Purkinje networks. Int J Numer Method Biomed Eng 2017; 33:e2835. [PMID: 27661463 DOI: 10.1002/cnm.2835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2016] [Accepted: 09/15/2016] [Indexed: 06/06/2023]
Abstract
Cardiac Purkinje fibers provide an important pathway to the coordinated contraction of the heart. We present a numerical algorithm for the solution of electrophysiology problems across the Purkinje network that is efficient enough to be used in in silico studies on realistic Purkinje networks with physiologically detailed models of ion exchange at the cell membrane. The algorithm is based on operator splitting and is provided with 3 different implementations: pure CPU, hybrid CPU/GPU, and pure GPU. Compared to our previous work, we modify the explicit gap junction term at network bifurcations to improve its mathematical consistency. Due to this improved consistency of the model, we are able to perform an empirical convergence study against analytical solutions. The study verified that all 3 implementations produce equivalent convergence rates, and shows that the algorithm produces equivalent results across different hardware platforms. Finally, we compare the efficiency of all 3 implementations on Purkinje networks of increasing spatial resolution using membrane models of increasing complexity. Both hybrid and pure GPU implementations outperform the pure CPU implementation, but their relative performance difference depends on the size of the Purkinje network and the complexity of the membrane model used.
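Operator splitting, on which this abstract says the algorithm is built, advances each time step by applying the sub-problems one after another instead of solving them jointly. The Python sketch below shows first-order (Lie) splitting on a toy 1D reaction-diffusion problem. It is an illustration under stated assumptions, not the authors' scheme: the linear decay term stands in for a membrane model, and the names `diffuse`, `react`, and `split_step` and all parameter values are hypothetical.

```python
import math


def diffuse(u, dt, d=1.0, dx=1.0):
    """Explicit finite-difference diffusion step with no-flux ends."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else u[i]       # mirrored boundary value
        right = u[i + 1] if i < n - 1 else u[i]  # mirrored boundary value
        out.append(u[i] + d * dt / dx**2 * (left - 2.0 * u[i] + right))
    return out


def react(u, dt, k=0.5):
    """Exact pointwise solution of du/dt = -k * u over one step."""
    return [x * math.exp(-k * dt) for x in u]


def split_step(u, dt):
    """One Lie-splitting step: diffusion first, then reaction."""
    return react(diffuse(u, dt), dt)
```

Because the reaction step acts on each node independently, it is the part that maps most naturally onto many GPU threads. The explicit diffusion step here is stable only when `d * dt / dx**2` is at most 0.5.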
Affiliation(s)
- M Lange
  - CISTIB, Department of Electronic and Electrical Engineering, The University of Sheffield, UK
- S Palamara
  - MOX, Dipartimento di Matematica, Politecnico di Milano, Italy
- T Lassila
  - CISTIB, Department of Electronic and Electrical Engineering, The University of Sheffield, UK
- C Vergara
  - MOX, Dipartimento di Matematica, Politecnico di Milano, Italy
- A Quarteroni
  - CMCS, Mathematics Institute of Computational Science and Engineering, École Polytechnique Fédérale de Lausanne, Switzerland
- A F Frangi
  - CISTIB, Department of Electronic and Electrical Engineering, The University of Sheffield, UK
12
Abstract
We apply multireference electronic structure calculations to demonstrate the presence of conical intersections between the ground and the first excited electronic states of three silicon nanocrystals containing defects characteristic of the oxidized silicon surface. These intersections are accessible upon excitation at visible wavelengths and are predicted to facilitate nonradiative recombination with a rate that increases with decreasing particle size. This work illustrates a new framework for identifying defects responsible for nonradiative recombination.
Affiliation(s)
- Yinan Shu
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- B Scott Fales
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Benjamin G Levine
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
13
Xie J, Zhou Z, Ma J, Xiang C, Nie Q, Zhang W. Graphics processing unit-based alignment of protein interaction networks. IET Syst Biol 2015; 9:120-7. [PMID: 26243827 PMCID: PMC8687428 DOI: 10.1049/iet-syb.2014.0052] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/05/2014] [Revised: 01/23/2015] [Accepted: 03/03/2015] [Indexed: 11/19/2022] Open
Abstract
Network alignment is an important bridge to understanding human protein-protein interactions (PPIs) and functions through model organisms. However, the underlying subgraph isomorphism problem complicates and increases the time required to align protein interaction networks (PINs). Parallel computing technology is an effective solution to the challenge of aligning large-scale networks via sequential computing. In this study, the typical Hungarian-Greedy Algorithm (HGA) is used as an example for PIN alignment. The authors propose an HGA with 2-nearest neighbours (HGA-2N) and implement its graphics processing unit (GPU) acceleration. Numerical experiments demonstrate that HGA-2N can find alignments that are close to those found by HGA while dramatically reducing computing time. The GPU implementation of HGA-2N optimises the parallel pattern, computing mode and storage mode, and it improves the computing time ratio between the CPU and GPU compared with HGA when large-scale networks are considered. By using HGA-2N on GPUs, conserved PPIs can be observed, and potential PPIs can be predicted. Among the predictions based on 25 common Gene Ontology terms, 42.8% can be found in the Human Protein Reference Database. Furthermore, a new method of reconstructing phylogenetic trees is introduced, which shows the same relationships among five herpes viruses that are obtained using other methods.
Affiliation(s)
- Jiang Xie
- School of Computer Engineering and Science, Shanghai University, Shanghai, People's Republic of China
- Zhonghua Zhou
- School of Computer Engineering and Science, Shanghai University, Shanghai, People's Republic of China
- Jin Ma
- School of Computer Engineering and Science, Shanghai University, Shanghai, People's Republic of China
- Chaojuan Xiang
- School of Computer Engineering and Science, Shanghai University, Shanghai, People's Republic of China
- Qing Nie
- Department of Mathematics, Center for Mathematical and Computational Biology, University of California at Irvine, California, USA
- Wu Zhang
- School of Computer Engineering and Science, Shanghai University, Shanghai, People's Republic of China
14
Kazachenko S, Giovinazzo M, Hall KW, Cann NM. Algorithms for GPU-based molecular dynamics simulations of complex fluids: Applications to water, mixtures, and liquid crystals. J Comput Chem 2015; 36:1787-804. [PMID: 26174435 DOI: 10.1002/jcc.24000] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 01/05/2015] [Revised: 04/24/2015] [Accepted: 06/07/2015] [Indexed: 11/11/2022]
Abstract
A custom code for molecular dynamics simulations has been designed to run on CUDA-enabled NVIDIA graphics processing units (GPUs). The double-precision code simulates multicomponent fluids, with intramolecular and intermolecular forces, coarse-grained and atomistic models, holonomic constraints, Nosé-Hoover thermostats, and the generation of distribution functions. Algorithms to compute Lennard-Jones and Gay-Berne interactions, and the electrostatic force using Ewald summations, are discussed. A neighbor list is introduced to improve scaling with respect to system size. Three test systems are examined: SPC/E water; an n-hexane/2-propanol mixture; and a liquid crystal mesogen, 2-(4-butyloxyphenyl)-5-octyloxypyrimidine. Code performance is analyzed for each system. With one GPU, a 33- to 119-fold increase in performance is achieved compared with the serial code, while two GPUs give a 69- to 287-fold improvement and three GPUs a 101- to 377-fold speedup.
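As a rough illustration of the neighbor-list idea mentioned in the abstract, the sketch below builds a simple Verlet-style list and uses it to sum Lennard-Jones pair energies. It is plain Python rather than CUDA, it ignores periodic boundaries, Gay-Berne and Ewald terms, and the names `build_neighbor_list` and `lj_energy` as well as the radii in the usage lines are assumptions made for this example, not the authors' code.

```python
def build_neighbor_list(positions, r_list):
    """For each particle i, store the indices j > i that lie within r_list.

    The list is built with an O(N^2) scan here; the saving comes from
    reusing it over many energy or force evaluations.
    """
    n = len(positions)
    r_list_sq = r_list * r_list
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            if d2 < r_list_sq:
                neighbors[i].append(j)
    return neighbors


def lj_energy(positions, neighbors, r_cut, epsilon=1.0, sigma=1.0):
    """Sum 4*eps*((sigma/r)^12 - (sigma/r)^6) over listed pairs inside r_cut."""
    r_cut_sq = r_cut * r_cut
    energy = 0.0
    for i, js in enumerate(neighbors):
        for j in js:
            d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            if d2 < r_cut_sq:
                s6 = (sigma * sigma / d2) ** 3
                energy += 4.0 * epsilon * (s6 * s6 - s6)
    return energy


# Two particles at the potential minimum plus one far-away particle.
positions = [(0.0, 0.0, 0.0), (2.0 ** (1.0 / 6.0), 0.0, 0.0), (10.0, 0.0, 0.0)]
neighbors = build_neighbor_list(positions, r_list=3.0)
energy = lj_energy(positions, neighbors, r_cut=2.5)
```

The list radius is chosen slightly larger than the cutoff so that, in a real simulation, the list can be reused for several steps before particles drift in or out of range.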
Affiliation(s)
- Sergey Kazachenko
- Department of Chemistry, Queen's University, Kingston, Ontario, K7L 3N6, Canada
- Mark Giovinazzo
- Department of Mechanical and Materials Engineering, Queen's University, Kingston, Ontario, K7L 3N6, Canada
- Kyle Wm Hall
- Department of Chemistry, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
- Natalie M Cann
- Department of Chemistry, Queen's University, Kingston, Ontario, K7L 3N6, Canada
15
Abstract
PURPOSE: To develop a fast patient-specific analytical estimator of first-order Compton and Rayleigh scatter in cone-beam computed tomography, implemented using graphics processing units. METHODS: The authors developed an analytical estimator for first-order Compton and Rayleigh scatter in a cone-beam computed tomography geometry. The estimator was coded using NVIDIA's CUDA environment for execution on an NVIDIA graphics processing unit. Performance of the analytical estimator was validated by comparison with high-count Monte Carlo simulations for two different numerical phantoms. Monoenergetic analytical simulations were compared with monoenergetic and polyenergetic Monte Carlo simulations. Analytical and Monte Carlo scatter estimates were compared both qualitatively, from visual inspection of images and profiles, and quantitatively, using a scaled root-mean-square difference metric. Reconstruction of simulated cone-beam projection data of an anthropomorphic breast phantom illustrated the potential of this method as a component of a scatter correction algorithm. RESULTS: The monoenergetic analytical and Monte Carlo scatter estimates showed very good agreement. The monoenergetic analytical estimates showed good agreement for Compton single scatter and reasonable agreement for Rayleigh single scatter when compared with polyenergetic Monte Carlo estimates. For a voxelized phantom with dimensions 128 × 128 × 128 voxels and a detector with 256 × 256 pixels, the analytical estimator required 669 seconds for a single projection, using a single NVIDIA 9800 GX2 video card. Accounting for first-order scatter in cone-beam image reconstruction improves the contrast-to-noise ratio of the reconstructed images. CONCLUSION: The analytical scatter estimator, implemented using graphics processing units, provides rapid and accurate estimates of single scatter and, with further acceleration and a method to account for multiple scatter, may be useful for practical scatter correction schemes.
Affiliation(s)
- Harry Ingleby
- Division of Medical Physics, CancerCare Manitoba, McDermot Avenue, Winnipeg, Manitoba, Canada; Department of Physics and Astronomy, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Radiology, University of Manitoba, Winnipeg, Manitoba, Canada
- Jonas Lippuner
- Division of Medical Physics, CancerCare Manitoba, McDermot Avenue, Winnipeg, Manitoba, Canada
- Daniel W Rickey
- Division of Medical Physics, CancerCare Manitoba, McDermot Avenue, Winnipeg, Manitoba, Canada; Department of Physics and Astronomy, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Radiology, University of Manitoba, Winnipeg, Manitoba, Canada
- Yue Li
- Division of Medical Physics, CancerCare Manitoba, McDermot Avenue, Winnipeg, Manitoba, Canada
- Idris Elbakri
- Division of Medical Physics, CancerCare Manitoba, McDermot Avenue, Winnipeg, Manitoba, Canada; Department of Physics and Astronomy, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Radiology, University of Manitoba, Winnipeg, Manitoba, Canada
16
Abstract
The primary purpose of this paper is to provide an in-depth analysis of different platforms available for performing big data analytics. This paper surveys different hardware platforms available for big data analytics and assesses the advantages and drawbacks of each of these platforms based on various metrics such as scalability, data I/O rate, fault tolerance, real-time processing, data size supported and iterative task support. In addition to the hardware, the software frameworks used within each of these platforms are described in detail, along with their strengths and drawbacks. Some of the critical characteristics described here can potentially aid readers in making an informed decision about the right choice of platform depending on their computational needs. Using a star ratings table, a rigorous qualitative comparison between different platforms is also discussed for each of the six characteristics that are critical for the algorithms of big data analytics. In order to provide more insight into the effectiveness of each of the platforms in the context of big data analytics, specific implementation-level details of the widely used k-means clustering algorithm on various platforms are also described in the form of pseudocode.
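For readers unfamiliar with the k-means algorithm that the survey uses as its running example, a minimal single-machine sketch is given below. It is not the paper's pseudocode for any particular platform; the function name `kmeans`, the explicit `init_centroids` argument and the toy data are assumptions chosen to keep the example deterministic.

```python
def kmeans(points, init_centroids, iters=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments no longer change
        centroids = updated
    return centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids = kmeans(points, init_centroids=[(0, 0), (10, 10)])
```

The assignment step is independent for every point, which is why the algorithm is a common benchmark for parallel and distributed platforms.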
Affiliation(s)
- Dilpreet Singh
- Department of Computer Science, Wayne State University, Detroit, MI 48202 USA
- Chandan K Reddy
- Department of Computer Science, Wayne State University, Detroit, MI 48202 USA
17
Anzt H, Quintana-Ortí ES. Improving the energy efficiency of sparse linear system solvers on multicore and manycore systems. Philos Trans A Math Phys Eng Sci 2014; 372:20130279. [PMID: 24842036 DOI: 10.1098/rsta.2013.0279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
While most recent breakthroughs in scientific research rely on complex simulations carried out on large-scale supercomputers, the power draft and the energy spent for this purpose are increasingly becoming a limiting factor to this trend. In this paper, we provide an overview of the current status of energy-efficient scientific computing by reviewing different technologies used to monitor power draft as well as power- and energy-saving mechanisms available in commodity hardware. For the particular domain of sparse linear algebra, we analyse the energy efficiency of a broad collection of hardware architectures and investigate how algorithmic and implementation modifications can improve the energy efficiency of sparse linear system solvers without negatively impacting their performance.
Affiliation(s)
- H Anzt
- Innovative Computing Laboratory (ICL), University of Tennessee at Knoxville, Knoxville, TN 37996, USA
- E S Quintana-Ortí
- Departamento de Ingeniería y Ciencia de Computadores, Universidad Jaume I, Castellón, Spain
18
Miller BW, Van Holen R, Barrett HH, Furenlid LR. A System Calibration and Fast Iterative Reconstruction Method for Next-Generation SPECT Imagers. IEEE Nucl Sci Symp Conf Rec (1997) 2011; 2011:3548-3553. [PMID: 26568672 DOI: 10.1109/nssmic.2011.6153666] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
Recently, high-resolution gamma cameras have been developed with detectors containing > 10^5-10^6 elements. SPECT imagers based on these detectors usually also have a large number of voxel bins and therefore face memory storage issues for the system matrix when performing fast tomographic reconstructions using iterative algorithms. To address these issues, we have developed a method that parameterizes the detector response to a point source and generates the system matrix on the fly during MLEM or OSEM on graphics hardware. The calibration method, interpolation of coefficient data, and reconstruction results are presented in the context of a recently commissioned small-animal SPECT imager, called FastSPECT III.
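The idea of generating system-matrix elements on the fly, instead of storing them, can be sketched with a toy MLEM loop. The abstract does not give the parameterization used for FastSPECT III, so the 1D Gaussian `gaussian_response`, the function name `mlem` and all numbers below are hypothetical stand-ins; only the multiplicative MLEM update itself is the standard one.

```python
import math


def gaussian_response(i, j, sigma=1.0):
    """Hypothetical parameterized response of detector bin i to voxel j."""
    return math.exp(-0.5 * ((i - j) / sigma) ** 2)


def mlem(counts, n_voxels, response, iters=50):
    """MLEM where every system-matrix element is recomputed via response(i, j)."""
    n_bins = len(counts)
    # Sensitivity of each voxel: column sums of the (never stored) matrix.
    sens = [sum(response(i, j) for i in range(n_bins)) for j in range(n_voxels)]
    f = [1.0] * n_voxels
    for _ in range(iters):
        # Forward projection of the current estimate.
        proj = [sum(response(i, j) * f[j] for j in range(n_voxels))
                for i in range(n_bins)]
        ratio = [counts[i] / proj[i] if proj[i] > 0 else 0.0
                 for i in range(n_bins)]
        # Back-project the measured/estimated ratio and apply the update.
        f = [f[j] / sens[j] * sum(response(i, j) * ratio[i] for i in range(n_bins))
             if sens[j] > 0 else 0.0
             for j in range(n_voxels)]
    return f


true_image = [0.0, 4.0, 0.0, 0.0, 9.0, 0.0]
counts = [sum(gaussian_response(i, j) * true_image[j] for j in range(6))
          for i in range(6)]
estimate = mlem(counts, n_voxels=6, response=gaussian_response)
```

Recomputing `response(i, j)` trades arithmetic for memory, which is the kind of trade that suits graphics hardware when the full matrix would not fit in memory.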
Affiliation(s)
- Brian W Miller
- Center for Gamma-Ray Imaging and the College of Optical Sciences, University of Arizona, Tucson, AZ 85724 USA
- Roel Van Holen
- MEDISIP, Department of Electronics and Information Systems, Ghent University, B-9000 Ghent, Belgium. He is supported by a postdoctoral fellowship of the Research Foundation (FWO)
- Harrison H Barrett
- Center for Gamma-Ray Imaging and the College of Optical Sciences, University of Arizona, Tucson, AZ 85724 USA
- Lars R Furenlid
- Center for Gamma-Ray Imaging and the College of Optical Sciences, University of Arizona, Tucson, AZ 85724 USA
19
Abstract
Positron emission tomography systems are best described by a linear shift-varying model. However, image reconstruction often assumes simplified shift-invariant models to the detriment of image quality and quantitative accuracy. We investigated a shift-varying model of the geometrical system response based on an analytical formulation. The model was incorporated within a list-mode, fully 3D iterative reconstruction process in which the system response coefficients are calculated online on a graphics processing unit (GPU). The implementation requires less than 512 MB of GPU memory and can process two million events per minute (forward and backprojection). For small detector volume elements, the analytical model compared well to reference calculations. Images reconstructed with the shift-varying model achieved higher quality and quantitative accuracy than those that used a simpler shift-invariant model. For an 8 mm sphere in a warm background, the contrast recovery was 95.8% for the shift-varying model versus 85.9% for the shift-invariant model. In addition, the spatial resolution was more uniform across the field of view: for an array of 1.75 mm hot spheres in air, the variation in reconstructed sphere size was 0.5 mm RMS for the shift-invariant model, compared to 0.07 mm RMS for the shift-varying model.
Affiliation(s)
- Guillem Pratx
- Department of Radiation Oncology, Stanford University, Stanford, CA 94305
- Craig Levin
- Departments of Radiology, Physics and Electrical Engineering, and Molecular Imaging Program at Stanford, Stanford University, Stanford, CA 94305
20
Hesterman JY, Caucci L, Kupinski MA, Barrett HH, Furenlid LR. Maximum-Likelihood Estimation With a Contracting-Grid Search Algorithm. IEEE Trans Nucl Sci 2010; 57:1077-1084. [PMID: 20824155 PMCID: PMC2932457 DOI: 10.1109/tns.2010.2045898] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Indexed: 05/29/2023]
Abstract
A fast search algorithm capable of operating in multi-dimensional spaces is introduced. As a sample application, we demonstrate its utility in the 2D and 3D maximum-likelihood position-estimation problem that arises in the processing of PMT signals to derive interaction locations in compact gamma cameras. We demonstrate that the algorithm can be parallelized in pipelines, and thereby efficiently implemented in specialized hardware, such as field-programmable gate arrays (FPGAs). A 2D implementation of the algorithm is achieved on Cell/BE processors, resulting in processing speeds above one million events per second, which is a 20× increase in speed over a conventional desktop machine. Graphics processing units (GPUs) are used for a 3D application of the algorithm, resulting in processing speeds of nearly 250,000 events per second, which is a 250× increase in speed over a conventional desktop machine. These implementations indicate the viability of the algorithm for use in real-time imaging applications.
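The abstract does not spell out the search procedure, so the following is only a minimal sketch of the general contracting-grid idea in 2D: evaluate the likelihood on a small grid, recentre on the best grid point, shrink the grid, and repeat. The name `contracting_grid_search`, the 4 × 4 grid, the contraction factor of 0.5 and the toy quadratic log-likelihood are all assumptions for illustration, not the authors' implementation.

```python
def contracting_grid_search(log_like, center, half_width, n=4, iters=8, shrink=0.5):
    """Maximise log_like(x, y) with an n-by-n grid that contracts each iteration."""
    cx, cy = center
    for _ in range(iters):
        best = None
        for i in range(n):
            for j in range(n):
                # Grid points span [c - half_width, c + half_width] on each axis.
                x = cx - half_width + 2.0 * half_width * i / (n - 1)
                y = cy - half_width + 2.0 * half_width * j / (n - 1)
                value = log_like(x, y)
                if best is None or value > best[0]:
                    best = (value, x, y)
        # Recentre on the best point and contract the grid around it.
        _, cx, cy = best
        half_width *= shrink
    return cx, cy


def toy_log_like(x, y):
    """Hypothetical smooth log-likelihood peaked at (3.2, -1.7)."""
    return -((x - 3.2) ** 2 + (y + 1.7) ** 2)


x_hat, y_hat = contracting_grid_search(toy_log_like, center=(0.0, 0.0), half_width=10.0)
```

Every grid point within an iteration can be evaluated independently, and the number of evaluations per event is fixed in advance, which is what makes this style of search attractive for pipelined or parallel hardware.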
Affiliation(s)
- Luca Caucci
- College of Optical Sciences and Department of Radiology, University of Arizona, Tucson, AZ 85724 USA
- Matthew A. Kupinski
- College of Optical Sciences and Department of Radiology, University of Arizona, Tucson, AZ 85724 USA
- Harrison H. Barrett
- College of Optical Sciences and Department of Radiology, University of Arizona, Tucson, AZ 85724 USA
- Lars R. Furenlid
- College of Optical Sciences and Department of Radiology, University of Arizona, Tucson, AZ 85724 USA
21
Abstract
List-mode processing provides an efficient way to deal with sparse projections in iterative image reconstruction for emission tomography. An issue often reported is the tremendous amount of computation required by such algorithms. Each recorded event requires several back- and forward line projections. We investigated the use of the programmable graphics processing unit (GPU) to accelerate the line-projection operations and implement fully-3D list-mode ordered-subsets expectation-maximization for positron emission tomography (PET). We designed a reconstruction approach that incorporates resolution kernels, which model the spatially-varying physical processes associated with photon emission, transport and detection. Our development is particularly suitable for applications where the projection data is sparse, such as high-resolution, dynamic, and time-of-flight PET reconstruction. The GPU approach runs more than 50 times faster than an equivalent CPU implementation while image quality and accuracy are virtually identical. This paper describes in detail how the GPU can be used to accelerate the line-projection operations, even when the lines-of-response have arbitrary endpoint locations and shift-varying resolution kernels are used. A quantitative evaluation is included to validate the correctness of this new approach.
Affiliation(s)
- Guillem Pratx
- Department of Radiology, Molecular Imaging Program, Stanford University, Stanford, CA 94305 USA
- Garry Chinn
- Department of Radiology, Molecular Imaging Program, Stanford University, Stanford, CA 94305 USA
- Peter D. Olcott
- Department of Radiology, Molecular Imaging Program, Stanford University, Stanford, CA 94305 USA