1. Martin RL, Heifetz A, Bodkin MJ, Townsend-Nicholson A. High-Throughput Structure-Based Drug Design (HT-SBDD) Using Drug Docking, Fragment Molecular Orbital Calculations, and Molecular Dynamic Techniques. Methods Mol Biol 2024; 2716:293-306. PMID: 37702945. DOI: 10.1007/978-1-0716-3449-3_13.
Abstract
Structure-based drug design (SBDD) is rapidly evolving into a fundamental tool for faster and more cost-effective lead discovery. SBDD aims to offer a computational alternative to traditional high-throughput screening (HTS). This "virtual screening" technique combines the structural data of a target protein with large databases of potential drug candidates and applies a range of computational techniques to determine which candidates are likely to bind with high affinity and efficacy. It is proposed that high-throughput SBDD (HT-SBDD) will significantly improve on the success rate of HTS, which currently hovers around ~1%. In this chapter, we focus on the theory and utility of high-throughput drug docking, fragment molecular orbital calculations, and molecular dynamics techniques. We also offer a comparative review of the benefits and limitations of traditional methods against more recent SBDD advances. As HT-SBDD is computationally intensive, we also cover the important role high-performance computing (HPC) clusters play in the future of computational drug discovery.
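The ranking step at the core of virtual screening can be sketched generically. A minimal illustration follows; the ligand names and binding energies are invented for the example, and a real pipeline would obtain scores from a docking engine rather than a hard-coded dictionary:

```python
import heapq

def top_candidates(scores, k=3):
    """Return the k ligands with the most favorable (lowest) predicted
    binding energies; more negative means tighter predicted binding."""
    return heapq.nsmallest(k, scores.items(), key=lambda kv: kv[1])

# Hypothetical library and scores (kcal/mol), for illustration only.
library = {"lig_A": -9.2, "lig_B": -6.1, "lig_C": -10.4, "lig_D": -7.8}
hits = top_candidates(library, k=2)
```

In practice the library holds millions of compounds, which is why the chapter emphasizes HPC clusters: the scoring step is embarrassingly parallel, while the final top-k selection stays cheap.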
Affiliation(s)
- Reuben L Martin
- Research Department of Structural & Molecular Biology, Division of Biosciences, University College London, London, UK.
- Evotec (UK) Ltd., Abingdon, Oxfordshire, UK.
- Andrea Townsend-Nicholson
- Research Department of Structural & Molecular Biology, Division of Biosciences, University College London, London, UK
2. Townsend-Nicholson A. Teaching Medical Students to Use Supercomputers: A Personal Reflection. Methods Mol Biol 2024; 2716:413-420. PMID: 37702952. DOI: 10.1007/978-1-0716-3449-3_20.
Abstract
At the "Kick Off" meeting for CompBioMed (compbiomed.eu), first funded in October 2016, I had no idea that one single sentence ("I wish I could teach this to medical students") would lead to a dedicated program of work to engage the clinicians and biomedical researchers of the future with supercomputing. This program of work, which within the CompBioMed Centre of Excellence we have been calling "the CompBioMed Education and Training Programme," is a holistic endeavor developed by, and delivered with, the expertise and support of experimental researchers, computer scientists, clinicians, HPC centers, and industrial partners within or associated with CompBioMed. The original description of the initial educational approach to training has previously been published (Townsend-Nicholson, Interface Focus 10:20200003, 2020). In this chapter, I describe the refinements to the program and its delivery, emphasizing the highs and lows of delivering it over the past 6 years. I conclude with suggestions for feasible measures that I believe will help overcome the barriers and challenges we have encountered in bringing a community of users with little familiarity with computing beyond the desktop to the petascale and beyond.
Affiliation(s)
- Andrea Townsend-Nicholson
- Research Department of Structural & Molecular Biology, Division of Biosciences, University College London, London, UK.
3. Aquilué-Llorens D, Goldman JS, Destexhe A. High-Density Exploration of Activity States in a Multi-Area Brain Model. Neuroinformatics 2024; 22:75-87. PMID: 37981636. PMCID: PMC10917847. DOI: 10.1007/s12021-023-09647-1.
Abstract
To simulate whole-brain dynamics with only a few equations, biophysical, mesoscopic models of local neuron populations can be connected using empirical tractography data. The development of mesoscopic mean-field models of neural populations, in particular the Adaptive Exponential (AdEx) mean-field model, has successfully summarized neuron-scale phenomena leading to the emergence of global brain dynamics associated with conscious (asynchronous and rapid dynamics) and unconscious (synchronized slow waves, with Up-and-Down state dynamics) brain states, based on biophysical mechanisms operating at cellular scales (e.g., neuromodulatory regulation of spike-frequency adaptation during sleep-wake cycles or anesthesia). Using The Virtual Brain (TVB) environment to connect mean-field AdEx models, we have previously simulated the general properties of brain states by varying spike-frequency adaptation, but had not yet analyzed in detail the other parameters that may also regulate transitions between brain states in brain-scale dynamics. Here, we performed a dense grid exploration of the TVB-AdEx parameter space, making use of high-performance computing. We report that the ability of adaptation to induce synchronized slow-wave activity is remarkably robust. Moreover, the occurrence of slow waves is often paralleled by a closer correspondence between functional and structural connectivity. We find that hyperpolarization can also generate unconscious-like synchronized Up-and-Down states, which may be a mechanism underlying the action of anesthetics. We conclude that the TVB-AdEx model reproduces large-scale properties identified experimentally in sleep and anesthesia.
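A dense grid exploration of the kind described can be sketched generically. The parameter names and the toy "simulation" below are assumptions made for illustration and are not the TVB-AdEx interface; the point is only the brute-force Cartesian sweep that HPC resources make tractable:

```python
import itertools

def grid_sweep(simulate, param_grid):
    """Evaluate `simulate` at every point of a dense Cartesian parameter
    grid and collect (params, result) pairs, brute-force style."""
    names = sorted(param_grid)
    results = []
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        results.append((params, simulate(**params)))
    return results

# Toy stand-in for a mean-field run: more adaptation, more "synchrony".
toy = lambda adaptation, noise: adaptation / (1.0 + noise)
sweep = grid_sweep(toy, {"adaptation": [0.0, 0.5, 1.0], "noise": [0.1, 0.2]})
```

On a cluster, the loop body is what gets farmed out: each grid point is an independent simulation, so the sweep parallelizes trivially across nodes.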
Affiliation(s)
- David Aquilué-Llorens
- Paris-Saclay University, CNRS, Paris-Saclay Institute of Neuroscience (NeuroPSI), 91400, Saclay, France.
- Starlab Barcelona SL, Neuroscience BU, Av Tibidabo 47 bis, Barcelona, Spain.
- Jennifer S Goldman
- Paris-Saclay University, CNRS, Paris-Saclay Institute of Neuroscience (NeuroPSI), 91400, Saclay, France
- Alain Destexhe
- Paris-Saclay University, CNRS, Paris-Saclay Institute of Neuroscience (NeuroPSI), 91400, Saclay, France.
4. Verdicchio M, Teijeiro Barjas C. Introduction to High-Performance Computing. Methods Mol Biol 2024; 2716:15-29. PMID: 37702934. DOI: 10.1007/978-1-0716-3449-3_2.
Abstract
Since the first general-purpose computing machines appeared in the middle of the twentieth century, the popularity of computer science has grown steadily. The first computers represented a significant leap forward in automating calculations, allowing several theoretical methods to be taken from paper into practice. The continuous need for increased computing capacity drove computers to evolve and become ever more powerful. Nowadays, high-performance computing (HPC) is a crucial component of scientific and technological advancement. This chapter introduces the field of HPC, covering key concepts and the essential terminology needed to understand this complex and rapidly evolving area. It begins with an overview of what HPC is and how it differs from conventional computing. It then explores the various components and configurations of supercomputers, including shared memory, distributed memory, and hybrid systems, as well as the different programming models used in HPC, including message passing, shared memory, and data parallelism. Finally, the chapter discusses significant challenges and future directions in supercomputing. Overall, it provides a comprehensive introduction to the world of HPC and is an essential resource for anyone interested in this fascinating field.
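Of the programming models mentioned, data parallelism is the easiest to sketch portably. A minimal sketch follows; real HPC codes distribute this pattern across processes and nodes (e.g. with MPI or OpenMP), and threads are used here only to keep the example self-contained:

```python
from concurrent.futures import ThreadPoolExecutor

def kernel(x):
    """Stand-in for a compute-heavy per-element task."""
    return x * x

def data_parallel_map(data, workers=4):
    """Apply the same kernel to every element of `data`, split across
    workers: the essence of the data-parallel model. The decomposition
    (same operation, different data) is what distinguishes it from
    task parallelism (different operations)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, data))
```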
5. Pribec I, Hachinger S, Hayek M, Pringle GJ, Brüchle H, Jamitzky F, Mathias G. Efficient and Reliable Data Management for Biomedical Applications. Methods Mol Biol 2024; 2716:383-403. PMID: 37702950. DOI: 10.1007/978-1-0716-3449-3_18.
Abstract
This chapter discusses the challenges and requirements of modern Research Data Management (RDM), particularly for biomedical applications in the context of high-performance computing (HPC). The FAIR data principles (Findable, Accessible, Interoperable, Reusable) are of special importance. Data formats, publication platforms, annotation schemata, automated data management and staging, the data infrastructure in HPC centers, file transfer and staging methods in HPC, and the EUDAT components are discussed. Tools and approaches for automated data movement and replication in cross-center workflows are explained, as well as the development of ontologies for structuring and quality-checking of metadata in computational biomedicine. The CompBioMed project is used as a real-world example of implementing these principles and tools in practice. The LEXIS project has built a workflow-execution and data management platform that follows the paradigm of HPC-Cloud convergence for demanding Big Data applications. It is used for orchestrating workflows with YORC, utilizing the data documentation initiative (DDI) and distributed computing resources (DCI). The platform is accessed by a user-friendly LEXIS portal for workflow and data management, making HPC and Cloud Computing significantly more accessible. Checkpointing, duplicate runs, and spare images of the data are used to create resilient workflows. The CompBioMed project is completing the implementation of such a workflow, using data replication and brokering, which will enable urgent computing on exascale platforms.
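One concrete ingredient of resilient replication as described above is integrity verification of the spare copies. A minimal sketch using content checksums follows; the byte strings are invented for illustration, and this is not the LEXIS or EUDAT API, only the underlying idea:

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 content checksum used to verify a replica byte-for-byte."""
    return hashlib.sha256(data).hexdigest()

def replicas_match(primary: bytes, replica: bytes) -> bool:
    """Trust a replica only if its checksum equals the primary's."""
    return digest(primary) == digest(replica)

ok = replicas_match(b"simulation output v1", b"simulation output v1")
corrupt = replicas_match(b"simulation output v1", b"simulation output v2")
```

For large datasets the same check is done by streaming files in chunks, so staging between HPC centers can be verified without holding data in memory.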
Affiliation(s)
- Ivan Pribec
- Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (LRZ-BAdW), Munich, Germany
- Stephan Hachinger
- Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (LRZ-BAdW), Munich, Germany
- Mohamad Hayek
- Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (LRZ-BAdW), Munich, Germany
- Helmut Brüchle
- Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (LRZ-BAdW), Munich, Germany
- Ferdinand Jamitzky
- Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (LRZ-BAdW), Munich, Germany
- Gerald Mathias
- Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (LRZ-BAdW), Munich, Germany
6. Peng K, Wammes JD, Nguyen A, Cătălin Iordan M, Norman KA, Turk-Browne NB. Inducing Representational Change in the Hippocampus Through Real-Time Neurofeedback. bioRxiv 2023:2023.12.01.569487. PMID: 38106228. PMCID: PMC10723264. DOI: 10.1101/2023.12.01.569487.
Abstract
When you perceive or remember one thing, other related things come to mind. This competition has consequences for how these items are later perceived, attended, or remembered. Such behavioral consequences result from changes in how much the neural representations of the items overlap, especially in the hippocampus. These changes can reflect increased (integration) or decreased (differentiation) overlap; previous studies have posited that the amount of coactivation between competing representations in cortex determines which will occur: high coactivation leads to hippocampal integration, medium coactivation leads to differentiation, and low coactivation is inert. However, those studies used indirect proxies for coactivation, by manipulating stimulus similarity or task demands. Here we induce coactivation of competing memories in visual cortex more directly using closed-loop neurofeedback from real-time fMRI. While viewing one object, participants were rewarded for implicitly activating the representation of another object as strongly as possible. Across multiple real-time fMRI training sessions, they succeeded in using the neurofeedback to induce coactivation. Compared with untrained objects, this coactivation led to behavioral and neural integration: The trained objects became harder for participants to discriminate in a categorical perception task and harder to decode from patterns of fMRI activity in the hippocampus.
Affiliation(s)
- Kailong Peng
- Department of Psychology, Interdepartmental Neuroscience Program, Yale University
- Jeffrey D Wammes
- Department of Psychology, Centre for Neuroscience Studies, Queen's University
- Alex Nguyen
- Department of Psychology, Princeton Neuroscience Institute, Princeton University
- Marius Cătălin Iordan
- Department of Brain and Cognitive Sciences, Department of Neuroscience, University of Rochester
- Kenneth A Norman
- Department of Psychology, Princeton Neuroscience Institute, Princeton University
7. Oks D, Houzeaux G, Vázquez M, Neidlin M, Samaniego C. Effect of TAVR commissural alignment on coronary flow: A fluid-structure interaction analysis. Comput Methods Programs Biomed 2023; 242:107818. PMID: 37837886. DOI: 10.1016/j.cmpb.2023.107818.
Abstract
BACKGROUND AND OBJECTIVES: Coronary obstruction is a complication that may affect patients receiving Transcatheter Aortic Valve Replacement (TAVR), with catastrophic consequences and long-term negative effects. To enable healthy coronary perfusion, it is fundamental to position the device appropriately with respect to the coronary ostia. Nonetheless, most TAVR delivery systems do not control commissural alignment. Moreover, no in silico study has directly assessed the effect of commissural alignment on coronary perfusion. This work aims to evaluate the effect of TAVR commissural alignment on coronary perfusion and device performance. METHODS: A two-way computational fluid-structure interaction model is used to predict coronary perfusion at different commissural alignments. In each scenario, hemodynamic biomarkers are evaluated to assess device performance. RESULTS: Commissural misalignment is shown to reduce total coronary perfusion by 3.2% and the flow rate to a single coronary branch by 6.8%. It also impairs valvular function, reducing the systolic geometric orifice area by 2.5% while increasing systolic transvalvular pressure gradients by 5.3% and diastolic leaflet stresses by 16.0%. CONCLUSIONS: The present TAVR patient model indicates that coronary perfusion and hemodynamic and structural performance are poorest when the prosthesis commissures are fully misaligned with the native ones. These results support the importance of enabling axial control in new TAVR delivery catheter systems and of defining recommended values of commissural alignment in upcoming clinical treatment guidelines.
Affiliation(s)
- David Oks
- Barcelona Supercomputing Center, Computer Applications in Science and Engineering, Plaça d'Eusebi Güell, 1-3, 08034, Barcelona, Spain; ELEM Biotech SL, Plaça Pau Vila, 1, Bloc A, Planta 3, Porta 3A1, 08003, Barcelona, Spain.
- Guillaume Houzeaux
- Barcelona Supercomputing Center, Computer Applications in Science and Engineering, Plaça d'Eusebi Güell, 1-3, 08034, Barcelona, Spain
- Mariano Vázquez
- Barcelona Supercomputing Center, Computer Applications in Science and Engineering, Plaça d'Eusebi Güell, 1-3, 08034, Barcelona, Spain; ELEM Biotech SL, Plaça Pau Vila, 1, Bloc A, Planta 3, Porta 3A1, 08003, Barcelona, Spain
- Michael Neidlin
- Department of Cardiovascular Engineering, Institute of Applied Medical Engineering, Medical Faculty, RWTH Aachen University, Pauwelstraße 20, 52074, Aachen, Germany
- Cristóbal Samaniego
- Barcelona Supercomputing Center, Computer Applications in Science and Engineering, Plaça d'Eusebi Güell, 1-3, 08034, Barcelona, Spain
8. Africa PC, Piersanti R, Regazzoni F, Bucelli M, Salvador M, Fedele M, Pagani S, Dede' L, Quarteroni A. lifex-ep: a robust and efficient software for cardiac electrophysiology simulations. BMC Bioinformatics 2023; 24:389. PMID: 37828428. PMCID: PMC10571323. DOI: 10.1186/s12859-023-05513-8.
Abstract
BACKGROUND: Simulating cardiac function requires the numerical solution of multi-physics and multi-scale mathematical models. This underscores the need for streamlined, accurate, and high-performance computational tools. Despite the dedicated endeavors of various research teams, comprehensive and user-friendly software for cardiac simulations, capable of accurately replicating both normal and pathological conditions, is still maturing within the scientific community. RESULTS: This work introduces lifex-ep, a publicly available software for numerical simulations of the electrophysiological activity of the cardiac muscle, under both normal and pathological conditions. lifex-ep employs the monodomain equation to model the heart's electrical activity. It incorporates both phenomenological and second-generation ionic models, discretized using the finite element method on tetrahedral or hexahedral meshes. Additionally, lifex-ep integrates the generation of myocardial fibers based on Laplace-Dirichlet Rule-Based Methods, previously released within lifex-fiber (Africa et al., 2023); as an alternative, users can import myofibers from a file. This paper provides a concise overview of the mathematical models and numerical methods underlying lifex-ep, along with comprehensive implementation details and instructions for users. lifex-ep features excellent parallel speedup, scaling efficiently up to thousands of cores, and its implementation has been verified against an established benchmark problem for computational electrophysiology. We showcase the key features of lifex-ep through various idealized and realistic simulations conducted in both normal and pathological scenarios. Furthermore, the software offers a user-friendly and flexible interface, simplifying the setup of simulations using self-documenting parameter files. CONCLUSIONS: lifex-ep provides easy access to cardiac electrophysiology simulations for a wide user community. It offers a computational tool that integrates models and accurate methods for simulating cardiac electrophysiology within a high-performance framework, while maintaining a user-friendly interface. lifex-ep represents a valuable tool for conducting in silico patient-specific simulations.
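For reference, the monodomain model mentioned in the abstract takes, in its standard form (symbols follow common usage rather than the paper's exact notation):

```latex
\chi \left( C_m \frac{\partial V}{\partial t} + I_{\mathrm{ion}}(V, \mathbf{w}) \right)
  = \nabla \cdot \left( \boldsymbol{\Sigma} \, \nabla V \right) + I_{\mathrm{app}},
\qquad
\frac{d\mathbf{w}}{dt} = \mathbf{g}(V, \mathbf{w}),
```

where V is the transmembrane potential, w collects the gating and ionic state variables of the chosen ionic model, χ is the surface-to-volume ratio, C_m the membrane capacitance, Σ the conductivity tensor (oriented along the myocardial fibers, hence the role of the fiber-generation step), and I_app an applied stimulus current.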
Affiliation(s)
- Pasquale Claudio Africa
- MOX, Department of Mathematics, Politecnico di Milano, Milano, Italy
- mathLab, Mathematics Area, SISSA International School for Advanced Studies, Trieste, Italy
- Roberto Piersanti
- MOX, Department of Mathematics, Politecnico di Milano, Milano, Italy
- Michele Bucelli
- MOX, Department of Mathematics, Politecnico di Milano, Milano, Italy
- Matteo Salvador
- MOX, Department of Mathematics, Politecnico di Milano, Milano, Italy
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California, USA
- Marco Fedele
- MOX, Department of Mathematics, Politecnico di Milano, Milano, Italy
- Stefano Pagani
- MOX, Department of Mathematics, Politecnico di Milano, Milano, Italy
- Luca Dede'
- MOX, Department of Mathematics, Politecnico di Milano, Milano, Italy
- Alfio Quarteroni
- MOX, Department of Mathematics, Politecnico di Milano, Milano, Italy
- Institute of Mathematics, École Polytechnique Fédérale de Lausanne (Professor Emeritus), Lausanne, Switzerland
9. Kalmár P, Hegedűs F, Nagy D, Sándor L, Klapcsik K. Memory-friendly fixed-point iteration method for nonlinear surface mode oscillations of acoustically driven bubbles: from the perspective of high-performance GPU programming. Ultrason Sonochem 2023; 99:106546. PMID: 37574642. PMCID: PMC10448217. DOI: 10.1016/j.ultsonch.2023.106546.
Abstract
A fixed-point iteration technique is presented to handle the implicit nature of the governing equations of nonlinear surface mode oscillations of acoustically excited microbubbles. The model is adopted from the theoretical work of Shaw [1], where the dynamics of the mean bubble radius and the surface modes are bi-directionally coupled via nonlinear terms. The model comprises a set of second-order ordinary differential equations: an extended form of the classic Keller-Miksis equation together with linearized dynamical equations for each surface mode. Only the implicit parts (containing the second derivatives) are reevaluated during the iteration. The performance of the technique is tested across various parameter combinations; the majority of test cases need only a single reevaluation to reach an error of 10⁻⁹. Although its arithmetic operation count is higher than that of Gaussian elimination, its memory-friendly, matrix-free nature makes it a viable alternative for high-performance GPU computations in massive parameter studies.
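The iteration scheme described, reevaluating the implicit parts until the update stops changing, can be sketched generically. The toy relation below is an assumption for illustration and is far simpler than the coupled Keller-Miksis/surface-mode system:

```python
import math

def fixed_point(g, x0, tol=1e-9, max_iter=200):
    """Iterate x_{k+1} = g(x_k) until successive updates differ by < tol.

    In the paper's setting, g would reevaluate only the implicit
    second-derivative terms while the rest of the right-hand side is
    held fixed within the iteration.
    """
    x = x0
    for k in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new, k + 1  # converged value, iterations used
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")

# Toy implicit relation x = cos(x); a contraction, so iteration converges.
root, iters = fixed_point(math.cos, 0.5)
```

The matrix-free character is visible here: no system matrix is ever assembled or factorized, which is what makes the scheme attractive for memory-limited GPU kernels even at a higher arithmetic cost.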
Affiliation(s)
- Péter Kalmár
- Department of Hydrodynamic Systems, Faculty of Mechanical Engineering, Budapest University of Technology and Economics, Budapest, Hungary.
- Ferenc Hegedűs
- Department of Hydrodynamic Systems, Faculty of Mechanical Engineering, Budapest University of Technology and Economics, Budapest, Hungary
- Dániel Nagy
- Department of Hydrodynamic Systems, Faculty of Mechanical Engineering, Budapest University of Technology and Economics, Budapest, Hungary
- Levente Sándor
- Department of Hydrodynamic Systems, Faculty of Mechanical Engineering, Budapest University of Technology and Economics, Budapest, Hungary
- Kálmán Klapcsik
- Department of Hydrodynamic Systems, Faculty of Mechanical Engineering, Budapest University of Technology and Economics, Budapest, Hungary
10. Maksudov F, Kliuchnikov E, Marx KA, Purohit PK, Barsegov V. Mechanical fatigue testing in silico: Dynamic evolution of material properties of nanoscale biological particles. Acta Biomater 2023; 166:326-345. PMID: 37142109. DOI: 10.1016/j.actbio.2023.04.042.
Abstract
Biological particles have evolved to possess the mechanical characteristics necessary to carry out their functions. We developed a computational approach to "fatigue testing in silico," in which constant-amplitude cyclic loading is applied to a particle to explore its mechanobiology. We used this approach to describe the dynamic evolution of nanomaterial properties and low-cycle fatigue in the thin spherical encapsulin shell, the thick spherical Cowpea Chlorotic Mottle Virus (CCMV) capsid, and a thick cylindrical microtubule (MT) fragment over 20 cycles of deformation. Changing structures and force-deformation curves enabled us to describe their damage-dependent biomechanics (strength, deformability, stiffness), thermodynamics (released and dissipated energies, enthalpy, and entropy), and material properties (toughness). The thick CCMV and MT particles experience material fatigue due to slow recovery and damage accumulation over 3-5 loading cycles; the thin encapsulin shells show little fatigue due to rapid remodeling and limited damage. The results obtained challenge the existing paradigm: damage in biological particles is partially reversible owing to the particle's partial recovery; a fatigue crack may or may not grow with each loading cycle and may heal; and particles adapt to deformation amplitude and frequency to minimize the energy dissipated. Using crack size to quantify damage is problematic, as several cracks might form simultaneously in a particle. The dynamic evolution of strength, deformability, and stiffness can be predicted by analyzing the cycle-number (N) dependent damage, [Formula: see text], where α is a power-law exponent and Nf is the fatigue life. Fatigue testing in silico can now be used to explore damage-induced changes in the material properties of other biological particles. STATEMENT OF SIGNIFICANCE: Biological particles possess the mechanical characteristics necessary to perform their functions. We developed a "fatigue testing in silico" approach, which employs Langevin dynamics simulations of constant-amplitude cyclic loading of nanoscale biological particles to explore the dynamic evolution of the mechanical, energetic, and material properties of the thin and thick spherical particles of encapsulin and Cowpea Chlorotic Mottle Virus, and of a microtubule filament fragment. Our study of damage growth and fatigue development challenges the existing paradigm. Damage in biological particles is partially reversible, as a fatigue crack might heal with each loading cycle. Particles adapt to deformation amplitude and frequency to minimize energy dissipation. The evolution of strength, deformability, and stiffness can be accurately predicted by analyzing damage growth in the particle structure.
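The damage law itself is elided in this listing; it relates damage to the cycle number N, a power-law exponent α, and the fatigue life Nf. A power-law form consistent with those stated variables, offered here only as an illustrative assumption and not as a quotation of the paper's expression, would be:

```latex
D(N) = \left( \frac{N}{N_f} \right)^{\alpha},
\qquad D(0) = 0, \quad D(N_f) = 1,
```

so that damage grows monotonically from the pristine particle at N = 0 to failure at the fatigue life N = Nf, with α setting how front- or back-loaded the accumulation is.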
Affiliation(s)
- Farkhad Maksudov
- Department of Chemistry, University of Massachusetts, Lowell, MA 01854, United States
- Evgenii Kliuchnikov
- Department of Chemistry, University of Massachusetts, Lowell, MA 01854, United States
- Kenneth A Marx
- Department of Chemistry, University of Massachusetts, Lowell, MA 01854, United States
- Prashant K Purohit
- Department of Mechanical Engineering and Applied Mechanics, University of Pennsylvania, PA, United States
- Valeri Barsegov
- Department of Chemistry, University of Massachusetts, Lowell, MA 01854, United States
11. Lecoeur B, Barbone M, Gough J, Oelfke U, Luk W, Gaydadjiev G, Wetscherek A. Accelerating 4D image reconstruction for magnetic resonance-guided radiotherapy. Phys Imaging Radiat Oncol 2023; 27:100484. PMID: 37664799. PMCID: PMC10474606. DOI: 10.1016/j.phro.2023.100484.
Abstract
Background and purpose: Physiological motion affects the dose delivered to tumours and vital organs in external beam radiotherapy, and particularly in particle therapy. The excellent soft-tissue demarcation of 4D magnetic resonance imaging (4D-MRI) could inform on intra-fractional motion, but long image reconstruction times hinder its use in online treatment adaptation. Here we employ techniques from high-performance computing to reduce 4D-MRI reconstruction times below two minutes to facilitate their use in MR-guided radiotherapy. Material and methods: Four patients with pancreatic adenocarcinoma were scanned with a radial stack-of-stars gradient echo sequence on a 1.5 T MR-Linac. Fast parallelised open-source implementations of the extra-dimensional golden-angle radial sparse parallel algorithm were developed for central processing unit (CPU) and graphics processing unit (GPU) architectures. We assessed the impact of architecture, oversampling, and respiratory binning strategy on 4D-MRI reconstruction time, and compared images against a MATLAB reference implementation using the structural similarity (SSIM) index. Scaling and bottlenecks for the different architectures were studied using multi-GPU systems. Results: All reconstructed 4D-MRIs matched the reference implementation (SSIM > 0.99). Images reconstructed with overlapping respiratory bins were sharper, at the cost of longer reconstruction times. The CPU + GPU implementation was over 17 times faster than the reference implementation, reconstructing images in 60 ± 1 s, and scaled further across multiple GPUs. Conclusion: Respiratory-resolved 4D-MRI reconstruction times can be reduced using high-performance computing methods for online workflows in MR-guided radiotherapy, with potential applications in particle therapy.
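The SSIM index used to validate the reconstructions has a standard closed form comparing luminance, contrast, and structure. A minimal single-window version is sketched below, assuming numpy is available; practical image comparisons (e.g. scikit-image's implementation) average this statistic over small local windows rather than computing it once globally:

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over two images of equal shape and intensity
    range `data_range`. Identical images score exactly 1."""
    c1 = (0.01 * data_range) ** 2  # standard stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2)
    )

img = np.linspace(0.0, 1.0, 64).reshape(8, 8)
perfect = global_ssim(img, img)
degraded = global_ssim(img, np.clip(img + 0.1, 0.0, 1.0))
```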
Affiliation(s)
- Bastien Lecoeur
- Joint Department of Physics at The Institute of Cancer Research and The Royal Marsden NHS Foundation Trust, 15 Cotswold Rd, London SM2 5NG, United Kingdom
- Department of Computing, Imperial College London, Exhibition Rd, South Kensington, London SW7 2BX, United Kingdom
- Marco Barbone
- Joint Department of Physics at The Institute of Cancer Research and The Royal Marsden NHS Foundation Trust, 15 Cotswold Rd, London SM2 5NG, United Kingdom
- Department of Computing, Imperial College London, Exhibition Rd, South Kensington, London SW7 2BX, United Kingdom
- Jessica Gough
- Department of Radiotherapy at the Royal Marsden NHS Foundation Trust, Downs Rd, London SM2 5PT, United Kingdom
- Uwe Oelfke
- Joint Department of Physics at The Institute of Cancer Research and The Royal Marsden NHS Foundation Trust, 15 Cotswold Rd, London SM2 5NG, United Kingdom
- Wayne Luk
- Department of Computing, Imperial College London, Exhibition Rd, South Kensington, London SW7 2BX, United Kingdom
- Georgi Gaydadjiev
- Department of Computing, Imperial College London, Exhibition Rd, South Kensington, London SW7 2BX, United Kingdom
- Bernoulli Institute, University of Groningen, Nijenborgh 9, Groningen 9747 AG, The Netherlands
- Andreas Wetscherek
- Joint Department of Physics at The Institute of Cancer Research and The Royal Marsden NHS Foundation Trust, 15 Cotswold Rd, London SM2 5NG, United Kingdom
12. Serpelloni M, Arricca M, Ravelli C, Grillo E, Mitola S, Salvadori A. Mechanobiology of the relocation of proteins in advecting cells: in vitro experiments, multi-physics modeling, and simulations. Biomech Model Mechanobiol 2023. PMID: 37067608. PMCID: PMC10366044. DOI: 10.1007/s10237-023-01717-2.
Abstract
Cell motility, a cellular behavior of paramount relevance in embryonic development, immunological response, metastasis, and angiogenesis, demands a mechanical deformation of the cell membrane and influences the surface motion of molecules and their biochemical interactions. In this work, we develop a fully coupled multi-physics model able to capture and predict the protein flow on advecting endothelial plasma membranes. The model has been validated against co-designed in vitro experiments. A complete picture of the receptor dynamics has emerged, and the limiting factors have been identified together with the laws that regulate receptor polarization. This computational approach may prove insightful for predicting endothelial cell behavior in different tumoral environments, circumventing the time-consuming and expensive empirical characterization of each tumor.
Affiliation(s)
- M Serpelloni
- The Mechanobiology research center, UNIBS, 25123, Brescia, Italy
- Department of Mechanical and Industrial Engineering, Università degli Studi di Brescia, 25123, Brescia, Italy
- M Arricca
- The Mechanobiology research center, UNIBS, 25123, Brescia, Italy
- Department of Mechanical and Industrial Engineering, Università degli Studi di Brescia, 25123, Brescia, Italy
- C Ravelli
- The Mechanobiology research center, UNIBS, 25123, Brescia, Italy
- Department of Molecular and Translational Medicine, Università degli Studi di Brescia, 25123, Brescia, Italy
- E Grillo
- The Mechanobiology research center, UNIBS, 25123, Brescia, Italy
- Department of Molecular and Translational Medicine, Università degli Studi di Brescia, 25123, Brescia, Italy
- S Mitola
- The Mechanobiology research center, UNIBS, 25123, Brescia, Italy
- Department of Molecular and Translational Medicine, Università degli Studi di Brescia, 25123, Brescia, Italy
- A Salvadori
- The Mechanobiology research center, UNIBS, 25123, Brescia, Italy
- Department of Mechanical and Industrial Engineering, Università degli Studi di Brescia, 25123, Brescia, Italy
13
Africa PC, Piersanti R, Fedele M, Dede' L, Quarteroni A. lifex-fiber: an open tool for myofibers generation in cardiac computational models. BMC Bioinformatics 2023; 24:143. [PMID: 37046208 PMCID: PMC10091584 DOI: 10.1186/s12859-023-05260-w] [Received: 03/03/2022] [Accepted: 03/27/2023] [Indexed: 04/14/2023]
Abstract
BACKGROUND Modeling the whole cardiac function involves the solution of several complex multi-physics and multi-scale models that are highly computationally demanding, which calls for simpler yet accurate, high-performance computational tools. Despite the efforts made by several research groups, no software for whole-heart fully coupled cardiac simulations in the scientific community has reached full maturity yet. RESULTS In this work we present lifex-fiber, an innovative tool for the generation of myocardial fibers based on Laplace-Dirichlet Rule-Based Methods, which are the essential building blocks for modeling the electrophysiological, mechanical and electromechanical cardiac function, from single-chamber to whole-heart simulations. lifex-fiber is the first publicly released module for cardiac simulations based on lifex, an open-source, high-performance Finite Element solver for multi-physics, multi-scale and multi-domain problems developed in the framework of the iHEART project, which aims at making in silico experiments easily reproducible and accessible to a wide community of users, including those with a background in medicine or bio-engineering. CONCLUSIONS The tool presented in this document is intended to provide the scientific community with a computational tool that incorporates general state-of-the-art models and solvers for simulating the cardiac function within a high-performance framework that exposes a user- and developer-friendly interface. This report comes with extensive technical and mathematical documentation to welcome new users to the core structure of lifex-fiber and to provide them with a possible approach to include the generated cardiac fibers into more sophisticated computational pipelines. In the near future, more modules will be successively published, either as pre-compiled binaries for x86-64 Linux systems or as open-source software.
Affiliation(s)
- Roberto Piersanti
- MOX, Department of Mathematics, Politecnico di Milano, Milano, Italy
- Marco Fedele
- MOX, Department of Mathematics, Politecnico di Milano, Milano, Italy
- Luca Dede'
- MOX, Department of Mathematics, Politecnico di Milano, Milano, Italy
- Alfio Quarteroni
- MOX, Department of Mathematics, Politecnico di Milano, Milano, Italy
- Institute of Mathematics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
14
Fallon TR, Čalounová T, Mokrejš M, Weng JK, Pluskal T. transXpress: a Snakemake pipeline for streamlined de novo transcriptome assembly and annotation. BMC Bioinformatics 2023; 24:133. [PMID: 37016291 PMCID: PMC10074830 DOI: 10.1186/s12859-023-05254-8] [Received: 03/21/2022] [Accepted: 03/24/2023] [Indexed: 04/06/2023]
Abstract
BACKGROUND RNA-seq followed by de novo transcriptome assembly has been a transformative technique in biological research of non-model organisms, but the computational processing of RNA-seq data entails many different software tools. The complexity of these de novo transcriptomics workflows therefore presents a major barrier for researchers to adopt best-practice methods and up-to-date versions of software. RESULTS Here we present a streamlined and universal de novo transcriptome assembly and annotation pipeline, transXpress, implemented in Snakemake. transXpress supports two popular assembly programs, Trinity and rnaSPAdes, and allows parallel execution on heterogeneous cluster computing hardware. CONCLUSIONS transXpress simplifies the use of best-practice methods and up-to-date software for de novo transcriptome assembly, and produces standardized output files that can be mined using SequenceServer to facilitate rapid discovery of new genes and proteins in non-model organisms.
Affiliation(s)
- Timothy R Fallon
- Scripps Institution of Oceanography, UC San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
- Tereza Čalounová
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo náměstí 2, 16000, Prague 6, Czech Republic
- Martin Mokrejš
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo náměstí 2, 16000, Prague 6, Czech Republic
- Jing-Ke Weng
- Whitehead Institute for Biomedical Research, 455 Main Street, Cambridge, MA, 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Tomáš Pluskal
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo náměstí 2, 16000, Prague 6, Czech Republic
15
Tan M, Xu J, Liu S, Feng J, Zhang H, Yao C, Chen S, Guo H, Han G, Wen Z, Chen B, He Y, Zheng X, Ming D, Tu Y, Fu Q, Qi N, Li D, Geng L, Wen S, Yang F, He H, Liu F, Xue H, Wang Y, Qiu C, Mi G, Li Y, Chang T, Lai M, Zhang L, Hao Q, Qin M. Co-packaged optics (CPO): status, challenges, and solutions. Front Optoelectron 2023; 16:1. [PMID: 36939942 PMCID: PMC10027985 DOI: 10.1007/s12200-022-00055-y] [Received: 06/09/2022] [Accepted: 08/22/2022] [Indexed: 06/18/2023]
Abstract
Due to the rise of 5G, IoT, AI, and high-performance computing applications, datacenter traffic has grown at a compound annual growth rate of nearly 30%, and nearly three-fourths of that traffic resides within datacenters. The capacity of conventional pluggable optics increases at a much slower rate than datacenter traffic, so the gap between application requirements and the capability of conventional pluggable optics keeps widening, a trend that is unsustainable. Co-packaged optics (CPO) is a disruptive approach to increasing interconnect bandwidth density and energy efficiency by dramatically shortening the electrical link length through advanced packaging and co-optimization of electronics and photonics. CPO is widely regarded as a promising solution for future datacenter interconnections, and the silicon platform is the most promising platform for large-scale integration. Leading international companies (e.g., Intel, Broadcom and IBM) have invested heavily in CPO technology, an interdisciplinary research field that involves photonic devices, integrated circuit design, packaging, photonic device modeling, electronic-photonic co-simulation, applications, and standardization. This review aims to provide readers with a comprehensive overview of the state-of-the-art progress of CPO on the silicon platform, identify the key challenges, and point out potential solutions, hoping to encourage collaboration between different research fields to accelerate the development of CPO technology.
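The compounding described in this abstract can be made concrete with a short calculation. A minimal sketch (the unit traffic volume and horizons are hypothetical; only the ~30% CAGR comes from the abstract):

```python
def project_traffic(initial: float, cagr: float, years: int) -> float:
    """Traffic after `years` of growth at a fixed compound annual growth rate."""
    return initial * (1.0 + cagr) ** years

# At ~30% CAGR, traffic roughly triples every 4 years and grows ~14x in a
# decade, while pluggable-optics capacity compounds far more slowly.
four_year_factor = project_traffic(1.0, 0.30, 4)   # ≈ 2.86
ten_year_factor = project_traffic(1.0, 0.30, 10)   # ≈ 13.79
```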
Affiliation(s)
- Min Tan
- School of Optical and Electronic Information, Huazhong University of Science and Technology, Wuhan, 430074, China
- Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, 430074, China
- Jiang Xu
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- HKUST Fok Ying Tung Research Institute, Guangzhou, 511462, China
- The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, 511462, China
- Siyang Liu
- Chongqing United Micro-Electronics Center (CUMEC), Chongqing, 401332, China
- Junbo Feng
- Chongqing United Micro-Electronics Center (CUMEC), Chongqing, 401332, China
- Hua Zhang
- Hisense Broadband Multimedia Technologies Co., Ltd., Qingdao, 266000, China
- Chaonan Yao
- Hisense Broadband Multimedia Technologies Co., Ltd., Qingdao, 266000, China
- Shixi Chen
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- Hangyu Guo
- Institute of Microelectronics, Chinese Academy of Sciences, Beijing, 100029, China
- Gengshi Han
- Institute of Microelectronics, Chinese Academy of Sciences, Beijing, 100029, China
- Zhanhao Wen
- Institute of Microelectronics, Chinese Academy of Sciences, Beijing, 100029, China
- Bao Chen
- Institute of Microelectronics, Chinese Academy of Sciences, Beijing, 100029, China
- Yu He
- Institute of Microelectronics, Chinese Academy of Sciences, Beijing, 100029, China
- Xuqiang Zheng
- Institute of Microelectronics, Chinese Academy of Sciences, Beijing, 100029, China
- Da Ming
- School of Optical and Electronic Information, Huazhong University of Science and Technology, Wuhan, 430074, China
- Yaowen Tu
- School of Optical and Electronic Information, Huazhong University of Science and Technology, Wuhan, 430074, China
- Qiang Fu
- School of Optical and Electronic Information, Huazhong University of Science and Technology, Wuhan, 430074, China
- Nan Qi
- State Key Laboratory of Superlattices and Microstructures, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
- Dan Li
- School of Microelectronics, Xi'an Jiaotong University, Xi'an, 710049, China
- Li Geng
- School of Microelectronics, Xi'an Jiaotong University, Xi'an, 710049, China
- Song Wen
- Institute of Microelectronics, Chinese Academy of Sciences, Beijing, 100029, China
- Fenghe Yang
- Zhangjiang Laboratory, Shanghai, 201210, China
- Huimin He
- Institute of Microelectronics, Chinese Academy of Sciences, Beijing, 100029, China
- Fengman Liu
- Institute of Microelectronics, Chinese Academy of Sciences, Beijing, 100029, China
- Haiyun Xue
- Institute of Microelectronics, Chinese Academy of Sciences, Beijing, 100029, China
- Yuhang Wang
- School of Optical and Electronic Information, Huazhong University of Science and Technology, Wuhan, 430074, China
- Ciyuan Qiu
- The State Key Laboratory of Advanced Optical Communication Systems and Networks, Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
- Guangcan Mi
- Huawei Technologies Co., Ltd., Shenzhen, 440307, China
- Yanbo Li
- Huawei Technologies Co., Ltd., Shenzhen, 440307, China
- Tianhai Chang
- Huawei Technologies Co., Ltd., Shenzhen, 440307, China
- Mingche Lai
- College of Computer, National University of Defense Technology, Changsha, 410073, China
- Luo Zhang
- College of Computer, National University of Defense Technology, Changsha, 410073, China
- Qinfen Hao
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100086, China
- Mengyuan Qin
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100086, China
16
García JS, Puertas-Martín S, Redondo JL, Moreno JJ, Ortigosa PM. Improving drug discovery through parallelism. J Supercomput 2023; 79:9538-9557. [PMID: 36687309 PMCID: PMC9842220 DOI: 10.1007/s11227-022-05014-0] [Accepted: 12/14/2022] [Indexed: 06/17/2023]
Abstract
Compound identification in ligand-based virtual screening is limited by two key issues: the quality of the predictions and the time needed to obtain them. To address these, we previously designed OptiPharm, an algorithm that markedly improved on the sequential methods in the literature. In this work, we go a step further and propose its parallelization. Specifically, we propose a two-layer parallelization: first, an automated distribution of molecules across the available nodes of a cluster, and second, a parallelization of the internal methods (initialization, reproduction, selection and optimization). This new software, called pOptiPharm, aims to improve the quality of predictions and reduce experimentation time. As the results show, the performance of the proposed methods is good: pOptiPharm finds better solutions than the sequential OptiPharm while reducing its computation time almost proportionally to the number of processing units considered.
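The two-layer scheme described in this abstract can be illustrated with a toy sketch (this is not the pOptiPharm code; the scoring function and worker count are invented, and threads stand in for cluster nodes):

```python
from concurrent.futures import ThreadPoolExecutor

def score_molecule(mol_id: int) -> float:
    # Stand-in for the expensive similarity evaluation that the real
    # optimizer performs per candidate molecule.
    return 1.0 / (1 + mol_id % 7)

def screen(molecules: list[int], workers: int) -> list[tuple[int, float]]:
    # Layer 1: distribute molecules across workers (cluster nodes in the
    # real setup). Layer 2 (not shown): each worker would additionally
    # parallelize the optimizer's internal phases - initialization,
    # reproduction, selection and optimization.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(score_molecule, molecules))
    # Rank candidates by predicted similarity, best first.
    return sorted(zip(molecules, scores), key=lambda pair: pair[1], reverse=True)
```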
Affiliation(s)
- Jerónimo S. García
- Supercomputing - Algorithms Research Group (SAL), Agrifood Campus of International Excellence, University of Almería, Carretera Sacramento s/n, La Cañada de San Urbano, 04120 Almería, Spain
- Savíns Puertas-Martín
- Supercomputing - Algorithms Research Group (SAL), Agrifood Campus of International Excellence, University of Almería, Carretera Sacramento s/n, La Cañada de San Urbano, 04120 Almería, Spain
- Information School, University of Sheffield, 221 Portobello Street, Sheffield, S1 4DP, United Kingdom
- Juana L. Redondo
- Supercomputing - Algorithms Research Group (SAL), Agrifood Campus of International Excellence, University of Almería, Carretera Sacramento s/n, La Cañada de San Urbano, 04120 Almería, Spain
- Juan José Moreno
- Supercomputing - Algorithms Research Group (SAL), Agrifood Campus of International Excellence, University of Almería, Carretera Sacramento s/n, La Cañada de San Urbano, 04120 Almería, Spain
- Pilar M. Ortigosa
- Supercomputing - Algorithms Research Group (SAL), Agrifood Campus of International Excellence, University of Almería, Carretera Sacramento s/n, La Cañada de San Urbano, 04120 Almería, Spain
17
Ravikumar A, Sriraman H, Sai Saketh PM, Lokesh S, Karanam A. Effect of neural network structure in accelerating performance and accuracy of a convolutional neural network with GPU/TPU for image analytics. PeerJ Comput Sci 2022; 8:e909. [PMID: 35494877 PMCID: PMC9044238 DOI: 10.7717/peerj-cs.909] [Received: 10/19/2021] [Accepted: 02/09/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND In deep learning, the most significant breakthroughs in image recognition, object detection and language processing have come from the Convolutional Neural Network (CNN). With the rapid growth in data and network size, the performance of DNN algorithms depends on the computational power and storage capacity of the underlying devices. METHODS In this paper, convolutional neural networks used for various image applications were studied and accelerated on several platforms: CPU, GPU and TPU. The neural network structure and the computing power and characteristics of the GPU and TPU were analyzed and summarized, and their effect on accelerating the tasks is explained. A cross-platform comparison of the CNN was performed using three image applications: face mask detection (object detection/computer vision), virus detection in plants (image classification, agricultural sector), and pneumonia detection from X-ray images (image classification, medical field). RESULTS The CNNs were implemented and a comprehensive comparison was made across the platforms to identify performance, throughput, bottlenecks, and training time. The layer-wise execution of the CNN on GPU and TPU is explained with a layer-wise analysis, and the impact of the fully connected and convolutional layers on the network is analyzed. The challenges faced during the acceleration process are discussed and future work is identified.
Affiliation(s)
- Aswathy Ravikumar
- School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India
- Harini Sriraman
- School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India
- P. Maruthi Sai Saketh
- School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India
- Saddikuti Lokesh
- School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India
- Abhiram Karanam
- School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India
18
Christley S, Stervbo U, Cowell LG. Immune Repertoire Analysis on High-Performance Computing Using VDJServer V1: A Method by the AIRR Community. Methods Mol Biol 2022; 2453:439-446. [PMID: 35622338 PMCID: PMC9761903 DOI: 10.1007/978-1-0716-2115-8_22] [Indexed: 06/15/2023]
Abstract
AIRR-seq data sets are usually large and require specialized analysis methods and software tools. A typical Illumina MiSeq sequencing run generates 20-30 million 2 × 300 bp paired-end sequence reads, which corresponds to roughly 15 GB of sequence data to be processed. Other platforms like NextSeq, which is useful in projects where the full V gene is not needed, produce about 400 million 2 × 150 bp paired-end reads. Because of the size of the data sets, the analysis can be computationally expensive, particularly the early steps like preprocessing and gene annotation that process the majority of the sequence data. A standard desktop PC may take 3-5 days of constant processing for a single MiSeq run, so dedicated high-performance computational resources may be required. VDJServer provides free access to high-performance computing (HPC) at the Texas Advanced Computing Center (TACC) through a graphical user interface (Christley et al. Front Immunol 9:976, 2018). VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provides access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene assignment, repertoire characterization, and repertoire comparison. Furthermore, VDJServer has parallelized execution for tools such as IgBLAST, so more compute resources are utilized as the size of the input data grows. Analysis that takes days on a desktop PC might take only a few hours on VDJServer. VDJServer is a free, publicly available, and open-source licensed resource. Here, we describe the workflow for performing immune repertoire analysis on VDJServer's high-performance computing.
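The data-volume figures quoted in this abstract follow from simple arithmetic at roughly one byte per base call. A sketch (it counts bases only; FASTQ headers and quality strings roughly double the on-disk size):

```python
def run_size_gb(read_pairs: float, read_len_bp: int) -> float:
    """Approximate base-call volume of a paired-end run, at ~1 byte per base."""
    return read_pairs * 2 * read_len_bp / 1e9

miseq_gb = run_size_gb(25e6, 300)     # 20-30M 2x300 bp pairs → ≈ 15 GB
nextseq_gb = run_size_gb(400e6, 150)  # 400M 2x150 bp pairs → ≈ 120 GB
```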
Affiliation(s)
- Scott Christley
- Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, TX, USA
- Ulrik Stervbo
- Center for Translational Medicine, Immunology, and Transplantation, Immundiagnostik, Marien Hospital Herne, University Hospital of the Ruhr-University Bochum, Herne, Germany
- Lindsay G Cowell
- Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, TX, USA
- Department of Immunology, UT Southwestern Medical Center, Dallas, TX, USA
19
Waldmann M, Grosch A, Witzler C, Lehner M, Benda O, Koch W, Vogt K, Kohn C, Schröder W, Göbbert JH, Lintermann A. An effective simulation- and measurement-based workflow for enhanced diagnostics in rhinology. Med Biol Eng Comput 2021. [PMID: 34950998 DOI: 10.1007/s11517-021-02446-3] [Received: 11/09/2020] [Accepted: 08/20/2021] [Indexed: 11/02/2022]
Abstract
Physics-based analyses have the potential to consolidate and substantiate medical diagnoses in rhinology. Such methods are frequently the subject of intense investigation in research, but they are not yet used in clinical applications. One issue preventing their direct integration is that these methods are commonly developed as isolated solutions which do not consider the whole chain of data processing from initial medical data to higher-valued data. This manuscript presents a workflow that incorporates the whole data-processing pipeline based on a Jupyter environment. Medical image data are fully automatically pre-processed by machine learning algorithms. The resulting geometries employed for the simulations on high-performance computing systems reach an accuracy of up to 99.5% compared to manually segmented geometries. Additionally, the user is enabled to upload and visualize 4-phase rhinomanometry data. Subsequent analysis and visualization of the simulation outcome extend the results of standardized diagnostic methods with a physically sound interpretation. Along with a detailed presentation of the methodologies, the capabilities of the workflow are demonstrated by evaluating an exemplary medical case. The pipeline output is compared to 4-phase rhinomanometry data. The comparison underlines the functionality of the pipeline, but it also illustrates the influence of mucosa swelling on the simulation. Graphical Abstract: Workflow for enhanced diagnostics in rhinology.
20
Delmelle EM, Desjardins MR, Jung P, Owusu C, Lan Y, Hohl A, Dony C. Uncertainty in geospatial health: challenges and opportunities ahead. Ann Epidemiol 2021; 65:15-30. [PMID: 34656750 DOI: 10.1016/j.annepidem.2021.10.002] [Received: 07/11/2020] [Revised: 09/29/2021] [Accepted: 10/04/2021] [Indexed: 12/19/2022]
Abstract
PURPOSE Uncertainty is not always well captured, understood, or modeled properly, and can bias the robustness of complex relationships, such as the association between the environment and public health through exposure, estimates of geographic accessibility, and cluster detection, to name a few. METHODS We review current challenges and future opportunities as geospatial data and analyses are applied to the field of public health. We are particularly interested in the sources of uncertainty in geospatial data and how this uncertainty may propagate in spatial analysis. RESULTS We present opportunities to reduce the magnitude and impact of uncertainty. Specifically, we focus on (1) the use of multiple reference data sources to reduce geocoding errors, (2) the validity of online geocoders and how confidentiality (e.g., HIPAA) may be breached, (3) the impact of geoimputation techniques on travel estimates, (4) residential mobility and how it affects accessibility metrics and clustering, and (5) modeling errors in the American Community Survey. Our paper discusses how to communicate spatial and spatiotemporal uncertainty, and how high-performance computing can support the large numbers of simulations needed to increase statistical robustness in public health studies. CONCLUSIONS Our paper contributes to recent efforts to fill in knowledge gaps at the intersection of spatial uncertainty and public health.
21
Lueder MR, Cer RZ, Patrick M, Voegtly LJ, Long KA, Rice GK, Bishop-Lilly KA. Manual Annotation Studio (MAS): a collaborative platform for manual functional annotation of viral and microbial genomes. BMC Genomics 2021; 22:733. [PMID: 34627149 DOI: 10.1186/s12864-021-08029-8] [Received: 03/23/2021] [Accepted: 09/22/2021] [Indexed: 11/10/2022]
Abstract
Background Functional genome annotation is the process of labelling functional genomic regions with descriptive information. Manual curation can produce higher-quality genome annotations than fully automated methods, but manual annotation efforts are time-consuming and complex; software can help reduce these drawbacks. Results We created Manual Annotation Studio (MAS) to improve the efficiency of manual functional annotation of prokaryotic and viral genomes. MAS allows users to upload unannotated genomes, provides an interface to edit and upload annotations, tracks annotation history and progress, and saves data to a relational database. MAS provides users with pertinent information through a simple point-and-click interface to execute and visualize results for multiple homology search tools (blastp, rpsblast, and HHsearch) against multiple databases (Swiss-Prot, nr, CDD, PDB, and an internally generated database). MAS was designed to accept connections over the local area network (LAN) of a lab or organization so multiple users can access it simultaneously. MAS can take advantage of high-performance computing (HPC) clusters by interfacing with SGE or SLURM, and data can be exported from MAS in a variety of formats (FASTA, GenBank, GFF, and Excel). Conclusions MAS streamlines and provides structure to manual functional annotation projects. It enhances the ability of users to generate, interpret, and compare results from multiple tools, and the structure that MAS provides can improve project organization and reduce annotation errors. MAS is ideal for team-based annotation projects because it facilitates collaboration. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-08029-8.
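As a rough illustration of the kind of fan-out such a platform automates, the sketch below builds one blastp command line per target database using standard BLAST+ flags (the query file and database paths are hypothetical; MAS itself also wraps rpsblast and HHsearch and can submit the jobs through SGE or SLURM):

```python
def build_blastp_commands(query_fasta: str, databases: dict[str, str]) -> list[list[str]]:
    """One blastp invocation per database, ready for a job scheduler."""
    commands = []
    for name, db_path in databases.items():
        commands.append([
            "blastp",
            "-query", query_fasta,   # protein sequences to annotate
            "-db", db_path,          # formatted BLAST database
            "-outfmt", "6",          # tabular output, easy to compare across runs
            "-out", f"{name}.hits.tsv",
        ])
    return commands

cmds = build_blastp_commands("orfs.faa", {"swissprot": "db/swissprot", "nr": "db/nr"})
```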
22
Wilson E, Vant J, Layton J, Boyd R, Lee H, Turilli M, Hernández B, Wilkinson S, Jha S, Gupta C, Sarkar D, Singharoy A. Large-Scale Molecular Dynamics Simulations of Cellular Compartments. Methods Mol Biol 2021; 2302:335-356. [PMID: 33877636 DOI: 10.1007/978-1-0716-1394-8_18] [Indexed: 03/16/2023]
Abstract
Molecular dynamics or MD simulation is gradually maturing into a tool for constructing in vivo models of living cells in atomistic details. The feasibility of such models is bolstered by integrating the simulations with data from microscopic, tomographic and spectroscopic experiments on exascale supercomputers, facilitated by the use of deep learning technologies. Over time, MD simulation has evolved from tens of thousands of atoms to over 100 million atoms comprising an entire cell organelle, a photosynthetic chromatophore vesicle from a purple bacterium. In this chapter, we present a step-by-step outline for preparing, executing and analyzing such large-scale MD simulations of biological systems that are essential to life processes. All scripts are provided via GitHub.
Affiliation(s)
- Eric Wilson
- The School of Molecular Sciences, Arizona State University, Tempe, AZ, USA
- John Vant
- The School of Molecular Sciences, Arizona State University, Tempe, AZ, USA
- Jacob Layton
- The School of Molecular Sciences, Arizona State University, Tempe, AZ, USA
- Ryan Boyd
- The School of Molecular Sciences, Arizona State University, Tempe, AZ, USA
- Hyungro Lee
- RADICAL, ECE, Rutgers University, Piscataway, NJ, USA
- Shantenu Jha
- RADICAL, ECE, Rutgers University, Piscataway, NJ, USA
- Brookhaven National Laboratory, Upton, NY, USA
- Chitrak Gupta
- The School of Molecular Sciences, Arizona State University, Tempe, AZ, USA
- Daipayan Sarkar
- The School of Molecular Sciences, Arizona State University, Tempe, AZ, USA
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Abhishek Singharoy
- The School of Molecular Sciences, Arizona State University, Tempe, AZ, USA
23
Atzori M, Köpp W, Chien SWD, Massaro D, Mallor F, Peplinski A, Rezaei M, Jansson N, Markidis S, Vinuesa R, Laure E, Schlatter P, Weinkauf T. In situ visualization of large-scale turbulence simulations in Nek5000 with ParaView Catalyst. J Supercomput 2021; 78:3605-3620. [PMID: 35210696 PMCID: PMC8827385 DOI: 10.1007/s11227-021-03990-3] [Accepted: 07/07/2021] [Indexed: 06/14/2023]
Abstract
In situ visualization on high-performance computing systems allows us to analyze simulation results that would otherwise be impossible to handle, given the size of the simulation data sets and the execution time of offline post-processing. We develop an in situ adaptor for ParaView Catalyst and Nek5000, a massively parallel Fortran and C code for computational fluid dynamics. We perform a strong scalability test up to 2048 cores on KTH's Beskow Cray XC40 supercomputer and assess the impact of in situ visualization on Nek5000 performance. In our case study, a high-fidelity simulation of turbulent flow, we observe that in situ operations significantly limit the strong scalability of the code, reducing the relative parallel efficiency to only ≈21% on 2048 cores (the relative efficiency of Nek5000 without in situ operations is ≈99%). Through profiling with Arm MAP, we identified a bottleneck in the image composition step (which uses the Radix-kr algorithm), where the majority of the time is spent on MPI communication. We also identified an imbalance of in situ processing time between rank 0 and all other ranks. In our case, better scaling and load balancing in the parallel image composition would considerably improve the performance of Nek5000 with in situ capabilities. In general, this study highlights the technical challenges posed by the integration of high-performance simulation codes and data-analysis libraries, and their practical use in complex cases, even when efficient algorithms already exist for a given application scenario.
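The efficiency figures quoted in this abstract follow from the usual strong-scaling definition. A sketch with made-up timings chosen to reproduce a ≈21% relative efficiency (the actual Nek5000 run times are not given in the abstract):

```python
def relative_efficiency(t_ref: float, p_ref: int, t_p: float, p: int) -> float:
    """Relative parallel efficiency: measured speedup divided by ideal speedup."""
    speedup = t_ref / t_p
    ideal = p / p_ref
    return speedup / ideal

# Ideal strong scaling from 256 to 2048 cores would give an 8x speedup:
ideal_case = relative_efficiency(t_ref=800.0, p_ref=256, t_p=100.0, p=2048)    # 1.0
# If in situ overhead keeps the per-step time high, efficiency collapses:
in_situ_case = relative_efficiency(t_ref=800.0, p_ref=256, t_p=476.0, p=2048)  # ≈ 0.21
```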
Collapse
Affiliation(s)
- Marco Atzori
- SimEx/FLOW, Engineering Mechanics, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden
| | - Wiebke Köpp
- Division of Computational Science and Technology, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden
| | - Steven W. D. Chien
- Division of Computational Science and Technology, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden
| | - Daniele Massaro
- SimEx/FLOW, Engineering Mechanics, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden
| | - Fermín Mallor
- SimEx/FLOW, Engineering Mechanics, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden
| | - Adam Peplinski
- SimEx/FLOW, Engineering Mechanics, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden
| | - Mohamad Rezaei
- PDC Center for High Performance Computing, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden
| | - Niclas Jansson
- PDC Center for High Performance Computing, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden
| | - Stefano Markidis
- Division of Computational Science and Technology, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden
| | - Ricardo Vinuesa
- SimEx/FLOW, Engineering Mechanics, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden
| | - Erwin Laure
- PDC Center for High Performance Computing, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden
| | - Philipp Schlatter
- SimEx/FLOW, Engineering Mechanics, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden
| | - Tino Weinkauf
- Division of Computational Science and Technology, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden
| |
Collapse
|
24
|
Zhao L, Batta I, Matloff W, O'Driscoll C, Hobel S, Toga AW. Neuroimaging PheWAS (Phenome-Wide Association Study): A Free Cloud-Computing Platform for Big-Data, Brain-Wide Imaging Association Studies. Neuroinformatics 2021; 19:285-303. [PMID: 32822005 DOI: 10.1007/s12021-020-09486-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Large-scale, case-control genome-wide association studies (GWASs) have revealed genetic variations associated with diverse neurological and psychiatric disorders. Recent advances in neuroimaging and genomic databases of large healthy and diseased cohorts have empowered studies to characterize the effects of the discovered genetic factors on brain structure and function, implicating neural pathways and genetic mechanisms in the underlying biology. However, the unprecedented scale and complexity of the imaging and genomic data require new advanced biomedical data science tools to manage, process and analyze the data. In this work, we introduce Neuroimaging PheWAS (phenome-wide association study): a web-based system for searching over a wide variety of brain-wide imaging phenotypes to discover true system-level gene-brain relationships using a unified genotype-to-phenotype strategy. This design features a user-friendly graphical user interface (GUI) for anonymous data uploading, study definition and management, and interactive result visualizations, as well as a cloud-based computational infrastructure and multiple state-of-the-art methods for statistical association analysis and multiple comparison correction. We demonstrated the potential of Neuroimaging PheWAS with a case study analyzing the influence of the apolipoprotein E (APOE) gene on various morphological properties across the brain in the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Benchmark tests were performed to evaluate the system's performance using data from UK Biobank. The Neuroimaging PheWAS system is freely available. It simplifies the execution of PheWAS on neuroimaging data and gives imaging genetics studies an opportunity to elucidate the routes by which specific genetic variants act on diseases in the context of detailed imaging phenotypic data.
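Multiple comparison correction is central to any PheWAS, where one association test is run per imaging phenotype. A self-contained sketch of two standard procedures, Bonferroni and Benjamini-Hochberg FDR (the p-values are illustrative, and the system's actual choice of correction methods may differ):

```python
def bonferroni(pvals, alpha=0.05):
    """Flag tests significant under the Bonferroni family-wise correction."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, alpha=0.05):
    """Flag tests significant under Benjamini-Hochberg FDR control:
    with sorted p_(1) <= ... <= p_(m), reject all tests up to the largest
    rank k satisfying p_(k) <= (k/m) * alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            reject[i] = True
    return reject

# One p-value per brain-wide imaging phenotype (illustrative numbers only).
pvals = [0.0001, 0.004, 0.019, 0.04, 0.3, 0.8]
```

With these six p-values, Bonferroni retains two tests while the less conservative BH procedure retains three, illustrating why PheWAS platforms typically offer both.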
Collapse
Affiliation(s)
- Lu Zhao
- Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA
| | - Ishaan Batta
- Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA
| | - William Matloff
- Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA
| | - Caroline O'Driscoll
- Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA
| | - Samuel Hobel
- Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA
| | - Arthur W Toga
- Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA.
| |
Collapse
|
25
|
Gerlero GS, Márquez Damián S, Schaumburg F, Franck N, Kler PA. Numerical simulations of paper-based electrophoretic separations with open-source tools. Electrophoresis 2021; 42:1543-1551. [PMID: 33991437 DOI: 10.1002/elps.202000315] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2020] [Revised: 05/10/2021] [Accepted: 05/11/2021] [Indexed: 11/06/2022]
Abstract
A new tool for the solution of electromigrative separations in paper-based microfluidic devices is presented. The implementation is based on a recently published complete mathematical model describing these types of separations, and was developed on top of the open-source toolbox electroMicroTransport, based on OpenFOAM®, inheriting all of its features, such as native 3D problem handling, support for parallel computation, and a GNU GPL license. The presented tool includes full support for paper-based electromigrative separations (including EOF and the novel mechanical and electrical dispersion effects), compatibility with a well-recognized electrolyte database, and a novel algorithm for computing and controlling the electric current in arbitrary geometries. Additionally, the tool can be installed on any operating system thanks to a novel installation option in the form of a Docker image. A validation example with data from the literature is included, and two further application examples are provided, including a 2D free-flow IEF problem, which demonstrates the capabilities of the toolbox for dealing with computational and physicochemical modeling challenges simultaneously. This tool will enable efficient and reliable numerical prototyping of paper-based electrophoretic devices, accompanying the rapid contemporary growth of paper-based microfluidics.
Collapse
Affiliation(s)
- Gabriel S Gerlero
- Centro de Investigación en Métodos Computacionales (CIMEC), Universidad Nacional del Litoral-CONICET, Santa Fe, Argentina
| | - Santiago Márquez Damián
- Centro de Investigación en Métodos Computacionales (CIMEC), Universidad Nacional del Litoral-CONICET, Santa Fe, Argentina.,Departamento de Ingeniería Mecánica, FRSF-UTN, Santa Fe, Argentina
| | - Federico Schaumburg
- Instituto de Desarrollo Tecnológico para la Industria Química (INTEC), Universidad Nacional del Litoral - CONICET, Santa Fe, Argentina
| | - Nicolás Franck
- Centro de Investigación en Métodos Computacionales (CIMEC), Universidad Nacional del Litoral-CONICET, Santa Fe, Argentina
| | - Pablo A Kler
- Centro de Investigación en Métodos Computacionales (CIMEC), Universidad Nacional del Litoral-CONICET, Santa Fe, Argentina.,Departamento de Ingeniería en Sistemas de Información, FRSF-UTN, Santa Fe, Argentina
| |
Collapse
|
26
|
Pavlovikj N, Gomes-Neto JC, Deogun JS, Benson AK. ProkEvo: an automated, reproducible, and scalable framework for high-throughput bacterial population genomics analyses. PeerJ 2021; 9:e11376. [PMID: 34055480 PMCID: PMC8142932 DOI: 10.7717/peerj.11376] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2020] [Accepted: 04/08/2021] [Indexed: 12/28/2022] Open
Abstract
Whole Genome Sequence (WGS) data from bacterial species are used for a variety of applications ranging from basic microbiological research to diagnostics and epidemiological surveillance. The availability of WGS data from hundreds of thousands of individual isolates of individual microbial species presents a tremendous opportunity for discovery and hypothesis-generating research into the ecology and evolution of these microorganisms. However, the limited flexibility, scalability, and user-friendliness of existing pipelines restrict systematic, population-scale approaches. Here, we present ProkEvo, an automated, scalable, reproducible, and open-source framework for bacterial population genomics analyses using WGS data. ProkEvo was specifically developed to achieve the following goals: (1) automation and scaling of complex combinations of computational analyses for many thousands of bacterial genomes from inputs of raw Illumina paired-end sequence reads; (2) use of workflow management systems (WMS) such as Pegasus WMS to ensure reproducibility, scalability, modularity, fault tolerance, and robust file management throughout the process; (3) use of high-performance and high-throughput computational platforms; (4) generation of hierarchical population structure analyses based on combinations of multi-locus and Bayesian statistical approaches for classification, supporting ecological and epidemiological inquiries; (5) association of antimicrobial resistance (AMR) genes, putative virulence factors, and plasmids from curated databases with the hierarchically related genotypic classifications; and (6) production of pan-genome annotations and data compilations that can be utilized for downstream analyses such as the identification of population-specific genomic signatures. The scalability of ProkEvo was measured with two datasets comprising significantly different numbers of input genomes (one with ~2,400 genomes, the other with ~23,000 genomes).
Depending on the dataset and the computational platform used, the running time of ProkEvo varied from ~3 to ~26 days. ProkEvo can be used with virtually any bacterial species, and the Pegasus WMS uniquely facilitates the addition or removal of programs from the workflow and the modification of options within them. To demonstrate the versatility of the ProkEvo platform, we performed hierarchical population structure analyses on available genomes of three distinct pathogenic bacterial species as individual case studies. The specific case studies illustrate how hierarchical analyses of population structures, genotype frequencies, and the distribution of specific gene functions can be integrated into an analysis. Collectively, our study shows that ProkEvo is a practical, viable option for scalable, automated analyses of bacterial populations, with direct applications for basic microbiology research, clinical microbiological diagnostics, and epidemiological surveillance.
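The modular, dependency-driven structure that a workflow management system provides can be sketched with a toy scheduler. The step names below are hypothetical placeholders; a real ProkEvo run is driven by Pegasus WMS, which also handles fault tolerance and file management:

```python
# Hypothetical step names sketching a ProkEvo-like dependency graph.
steps = {
    "trim_reads": [],
    "assemble": ["trim_reads"],
    "mlst_typing": ["assemble"],
    "amr_virulence_plasmids": ["assemble"],
    "pan_genome": ["assemble"],
    "population_structure": ["mlst_typing"],
}

def topological_order(graph):
    """Return an execution order in which every step runs after its
    dependencies; a WMS would schedule independent branches concurrently."""
    done, order = set(), []
    def visit(step):
        if step in done:
            return
        done.add(step)
        for dep in graph[step]:
            visit(dep)
        order.append(step)
    for step in graph:
        visit(step)
    return order

order = topological_order(steps)
```

Because the graph, not the code, encodes the pipeline, adding or removing a program reduces to editing one dictionary entry, which is the modularity property the abstract attributes to the WMS.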
Collapse
Affiliation(s)
- Natasha Pavlovikj
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
| | - Joao Carlos Gomes-Neto
- Department of Food Science and Technology, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America.,Nebraska Food for Health Center, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
| | - Jitender S Deogun
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
| | - Andrew K Benson
- Department of Food Science and Technology, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America.,Nebraska Food for Health Center, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
| |
Collapse
|
27
|
Klippel H, Süssmaier S, Röthlin M, Afrasiabi M, Pala U, Wegener K. Simulation of the ductile machining mode of silicon. Int J Adv Manuf Technol 2021; 115:1565-1578. [PMID: 34776579 PMCID: PMC8550667 DOI: 10.1007/s00170-021-07167-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/02/2020] [Accepted: 04/27/2021] [Indexed: 06/13/2023]
Abstract
Diamond wire sawing has been developed to reduce the cutting loss when cutting silicon wafers from ingots. The surface of silicon solar cells must be flawless in order to achieve the highest possible efficiency. However, the surface is damaged during sawing. The extent of the damage depends primarily on the material removal mode. Under certain conditions, the generally brittle material can be machined in ductile mode, whereby considerably fewer cracks occur in the surface than with brittle material removal. In the presented paper, a numerical model is developed to support the optimisation of the machining process with regard to the transition between ductile and brittle material removal. The simulations are performed with a GPU-accelerated, in-house-developed code using mesh-free methods, which easily handle large deformations, whereas classic methods such as FEM would require intensive remeshing. The Johnson-Cook flow stress model is implemented and used to evaluate the applicability of a model of ductile material behaviour in the transition zone between the ductile and brittle removal modes. The simulation results are compared with results obtained from single-grain scratch experiments using a real, non-idealised grain geometry as present in the diamond wire sawing process.
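The Johnson-Cook flow stress model mentioned above has a standard closed form, sigma = (A + B*eps^n)(1 + C*ln(rate/rate_ref))(1 - T*^m). A direct transcription in Python (the parameter values used in the example are generic placeholders, not calibrated constants for silicon):

```python
import math

def johnson_cook_stress(eps_p, eps_rate, T,
                        A, B, n, C, m,
                        eps_rate_ref, T_room, T_melt):
    """Johnson-Cook flow stress:
    sigma = (A + B*eps_p^n) * (1 + C*ln(eps_rate/eps_rate_ref)) * (1 - T*^m),
    with homologous temperature T* = (T - T_room) / (T_melt - T_room)."""
    strain_hardening = A + B * eps_p ** n
    rate_sensitivity = 1.0 + C * math.log(eps_rate / eps_rate_ref)
    T_star = (T - T_room) / (T_melt - T_room)
    thermal_softening = 1.0 - T_star ** m
    return strain_hardening * rate_sensitivity * thermal_softening

# Placeholder parameters (NOT calibrated values for silicon), units MPa / K.
sigma = johnson_cook_stress(eps_p=0.05, eps_rate=1.0e3, T=400.0,
                            A=300.0, B=100.0, n=0.5, C=0.01, m=1.0,
                            eps_rate_ref=1.0, T_room=293.0, T_melt=1687.0)
```

The three factors separate strain hardening, strain-rate sensitivity, and thermal softening, which is what makes the model attractive for machining simulations where all three effects vary strongly across the cutting zone.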
Collapse
Affiliation(s)
- Hagen Klippel
- Department of Mechanical Engineering, Institute of Machine Tools and Manufacturing (IWF), ETH Zürich, Zürich, Switzerland
| | - Stefan Süssmaier
- Department of Mechanical Engineering, Institute of Machine Tools and Manufacturing (IWF), ETH Zürich, Zürich, Switzerland
| | - Matthias Röthlin
- Operation Center 1 at Federal Office of Meteorology & Climatology, MeteoSwiss, Switzerland
| | | | | | - Konrad Wegener
- Department of Mechanical Engineering, Institute of Machine Tools and Manufacturing (IWF), ETH Zürich, Zürich, Switzerland
| |
Collapse
|
28
|
Cortés U, Cortés A, Garcia-Gasulla D, Pérez-Arnal R, Álvarez-Napagao S, Àlvarez E. The ethical use of high-performance computing and artificial intelligence: fighting COVID-19 at Barcelona Supercomputing Center. AI Ethics 2021; 2:325-340. [PMID: 34790948 PMCID: PMC8101339 DOI: 10.1007/s43681-021-00056-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/22/2021] [Accepted: 04/15/2021] [Indexed: 10/24/2022]
Abstract
The COVID-19 pandemic has created an extraordinary medical, economic and humanitarian emergency. Artificial intelligence, in combination with other digital technologies, is being used as a tool to support the fight against the viral pandemic that has affected the entire world since the beginning of 2020. Barcelona Supercomputing Center collaborates in the battle against the coronavirus in different areas: the application of bioinformatics for the research on the virus and its possible treatments, the use of artificial intelligence, natural language processing and big data techniques to analyse the spread and impact of the pandemic, and the use of the MareNostrum 4 supercomputer to enable massive analysis on COVID-19 data. Many of these activities have included the use of personal and sensitive data of citizens, which, even during a pandemic, should be treated and handled with care. In this work we discuss our approach based on an ethical, transparent and fair use of this information, an approach aligned with the guidelines proposed by the European Union.
Collapse
Affiliation(s)
- Ulises Cortés
- Universitat Politècnica de Catalunya, Edifici Omega 205, Jordi Girona 29, 08034 Barcelona, Spain.,Barcelona Supercomputing Center, Edifici Omega 201, Jordi Girona 1 and 3, 08034 Barcelona, Spain
| | - Atia Cortés
- Barcelona Supercomputing Center, Edifici Omega 201, Jordi Girona 1 and 3, 08034 Barcelona, Spain
| | - Dario Garcia-Gasulla
- Barcelona Supercomputing Center, Edifici Omega 201, Jordi Girona 1 and 3, 08034 Barcelona, Spain
| | - Raquel Pérez-Arnal
- Barcelona Supercomputing Center, Edifici Omega 201, Jordi Girona 1 and 3, 08034 Barcelona, Spain
| | - Sergio Álvarez-Napagao
- Barcelona Supercomputing Center, Edifici Omega 201, Jordi Girona 1 and 3, 08034 Barcelona, Spain
| | - Enric Àlvarez
- Universitat Politècnica de Catalunya, Edifici Omega 205, Jordi Girona 29, 08034 Barcelona, Spain
| |
Collapse
|
29
|
Nobile MS, Coelho V, Pescini D, Damiani C. Accelerated global sensitivity analysis of genome-wide constraint-based metabolic models. BMC Bioinformatics 2021; 22:78. [PMID: 33902438 PMCID: PMC8074438 DOI: 10.1186/s12859-021-04002-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2021] [Accepted: 02/07/2021] [Indexed: 01/20/2023] Open
Abstract
Background Genome-wide reconstructions of metabolism opened the way to thorough investigations of cell metabolism for health care and industrial purposes. However, the predictions offered by Flux Balance Analysis (FBA) can be strongly affected by the choice of flux boundaries, with particular regard to the flux of reactions that sink nutrients into the system. To mitigate possible errors introduced by a poor selection of such boundaries, a rational approach suggests focusing the modeling effort on the pivotal ones. Methods In this work, we present a methodology for the automatic identification of the key fluxes in genome-wide constraint-based models, by means of variance-based sensitivity analysis. The goal is to identify the parameters for which a small perturbation entails a large variation of the model outcomes, also referred to as sensitive parameters. Due to the high number of FBA simulations necessary to assess sensitivity coefficients on genome-wide models, our method exploits a master-slave methodology that distributes the computation across massively multi-core architectures. We performed the following steps: (1) we determined the putative parameterizations of the genome-wide metabolic constraint-based model, using Saltelli's method; (2) we applied FBA to each parameterized model, distributing the massive amount of calculations over multiple nodes by means of MPI; (3) we then collected and exploited the results of all FBA runs to perform a global sensitivity analysis. Results We show a proof-of-concept of our approach on the latest genome-wide reconstructions of human metabolism, Recon2.2 and Recon3D. We report that the most sensitive parameters are mainly associated with the intake of essential amino acids in Recon2.2, whereas in Recon3D they are largely associated with phospholipids. We also illustrate that in most cases there is a significant contribution of higher-order effects.
Conclusion Our results indicate that interaction effects between different model parameters exist, which should be taken into account, especially at the stage of calibration of genome-wide models, supporting the importance of a global strategy for sensitivity analysis. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04002-0.
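Variance-based first-order sensitivity indices of the kind computed here can be estimated with Saltelli's sampling scheme. A small self-contained sketch on a toy model (not the FBA model, and without the MPI distribution the paper relies on; the inner estimator is the standard Saltelli/Jansen-style formula):

```python
import random

def first_order_sobol(model, k, N=20000, seed=1):
    """Saltelli-style Monte Carlo estimate of the first-order sensitivity
    indices S_i for a model with k inputs, each uniform on [0, 1]."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(k)] for _ in range(N)]
    B = [[rng.random() for _ in range(k)] for _ in range(N)]
    yA = [model(x) for x in A]
    yB = [model(x) for x in B]
    mean = sum(yA) / N
    var = sum((y - mean) ** 2 for y in yA) / N
    S = []
    for i in range(k):
        # "AB_i" matrix: A with column i taken from B.
        yABi = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # Estimate of the partial variance V_i attributable to input i.
        Vi = sum(yb * (yabi - ya) for yb, yabi, ya in zip(yB, yABi, yA)) / N
        S.append(Vi / var)
    return S

# Toy model y = 4*x0 + x1: x0 should account for ~16/17 of the variance.
S = first_order_sobol(lambda x: 4 * x[0] + x[1], k=2)
```

Each index requires N extra model evaluations, which is exactly why genome-wide FBA forces the massively parallel execution the abstract describes.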
Collapse
Affiliation(s)
- Marco S Nobile
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy.,SYSBIO/ISBE.IT Centre for Systems Biology, Milan, Italy.,Department of Industrial Engineering and Innovation Sciences, Eindhoven University of Technology, Eindhoven, The Netherlands
| | - Vasco Coelho
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
| | - Dario Pescini
- Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy.,SYSBIO/ISBE.IT Centre for Systems Biology, Milan, Italy
| | - Chiara Damiani
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy. .,SYSBIO/ISBE.IT Centre for Systems Biology, Milan, Italy.
| |
Collapse
|
30
|
Abstract
Due to the increasing availability of public bacterial genome data and the cost efficiency of sequencing novel bacterial strains, phylogenetic analyses based on more than a single marker gene or a few marker genes have become feasible. In this method protocol, we describe the complete bioinformatic workflow from raw genomic data to final phylogenetic analyses based on 107 conserved single-copy genes. This approach can be used to perform phylogenetic reconstructions with high resolution at strain level or across taxa spanning different clades of the bacterial tree of life.
Collapse
|
31
|
Abstract
Multiple sequence alignment (MSA) is a central step in many bioinformatics and computational biology analyses. Although there exist many methods to perform MSA, most of them fail when dealing with large datasets due to their high computational cost. MSAProbs-MPI is a publicly available tool ( http://msaprobs.sourceforge.net ) that provides highly accurate results in relatively short runtime thanks to exploiting the hardware resources of multicore clusters. In this chapter, I explain the statistical and biological concepts employed in MSAProbs-MPI to complete the alignments, as well as the high-performance computing techniques used to accelerate it. Moreover, I provide some hints about the configuration parameters that should be used to guarantee high-performance executions.
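Progressive MSA tools such as MSAProbs build on many pairwise dynamic-programming alignments, which is also where most of the parallelisable work lies. A minimal global-alignment (Needleman-Wunsch) score computation, shown only to illustrate the underlying recurrence (the scoring values are arbitrary, not MSAProbs' parameters):

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment score via the Needleman-Wunsch dynamic
    programme; progressive MSA builds on many such pairwise comparisons."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # a prefix of `a` aligned to nothing
        F[i][0] = i * gap
    for j in range(1, cols):          # a prefix of `b` aligned to nothing
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + score,   # (mis)match
                          F[i - 1][j] + gap,         # gap in b
                          F[i][j - 1] + gap)         # gap in a
    return F[-1][-1]
```

Since all pairwise comparisons are independent, they distribute naturally over the cores of a multicore cluster, which is the parallelism MSAProbs-MPI exploits.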
Collapse
|
32
|
Yoon HJ, Klasky HB, Gounley JP, Alawad M, Gao S, Durbin EB, Wu XC, Stroup A, Doherty J, Coyle L, Penberthy L, Blair Christian J, Tourassi GD. Accelerated training of bootstrap aggregation-based deep information extraction systems from cancer pathology reports. J Biomed Inform 2020; 110:103564. [PMID: 32919043 DOI: 10.1016/j.jbi.2020.103564] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2020] [Revised: 07/14/2020] [Accepted: 09/06/2020] [Indexed: 12/24/2022]
Abstract
OBJECTIVE In machine learning, classification task performance generally increases when bootstrap aggregation (bagging) is applied. However, bagging deep neural networks takes tremendous amounts of computational resources and training time. The research question we aimed to answer is whether we could achieve higher task performance scores and accelerate the training by dividing a problem into sub-problems. MATERIALS AND METHODS The data used in this study consist of free text from electronic cancer pathology reports. We applied bagging and partitioned data training using Multi-Task Convolutional Neural Network (MT-CNN) and Multi-Task Hierarchical Convolutional Attention Network (MT-HCAN) classifiers. We split a big problem into 20 sub-problems, resampled the training cases 2,000 times, and trained a deep learning model for each bootstrap sample and each sub-problem, thus generating up to 40,000 models. We trained the many models concurrently in a high-performance computing environment at Oak Ridge National Laboratory (ORNL). RESULTS We demonstrated that aggregation of the models improves task performance compared with the single-model approach, which is consistent with other research studies; and we demonstrated that the two proposed partitioned bagging methods achieved higher classification accuracy scores on four tasks. Notably, the improvements were significant for the extraction of cancer histology data, which had more than 500 class labels in the task; these results show that data partitioning may alleviate the complexity of the task. By contrast, the methods did not achieve superior scores for the tasks of site and subsite classification. Intrinsically, since data partitioning was based on the primary cancer site, the accuracy depended on the determination of the partitions, which needs further investigation and improvement. CONCLUSION The results of this research demonstrate that (1) the data partitioning and bagging strategy achieved higher performance scores, and (2) training was faster, leveraging the high-performance Summit supercomputer at ORNL.
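The bootstrap-and-vote structure of bagging can be illustrated with a toy ensemble. A trivial majority-label stub stands in for the MT-CNN/MT-HCAN models actually trained per bootstrap sample, and the report texts and topography-style labels are fabricated placeholders:

```python
import random
from collections import Counter

def bootstrap_sample(cases, rng):
    # One "bag": resample the training cases with replacement.
    return [rng.choice(cases) for _ in cases]

def train_stub(bag):
    # Stand-in model: predicts the bag's majority label. (The paper trains
    # an MT-CNN or MT-HCAN per bootstrap sample instead.)
    return Counter(label for _, label in bag).most_common(1)[0][0]

def bagged_predict(model_outputs):
    # Aggregate the ensemble by majority vote.
    return Counter(model_outputs).most_common(1)[0][0]

rng = random.Random(42)
# Fabricated (report text, label) pairs; labels mimic topography codes.
cases = [("report %d" % i, "C50" if i % 3 else "C34") for i in range(30)]
models = [train_stub(bootstrap_sample(cases, rng)) for _ in range(25)]
prediction = bagged_predict(models)
```

Because every bag trains independently, the thousands of real models map directly onto concurrent HPC jobs, which is the source of the training speedup reported above.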
Collapse
Affiliation(s)
- Hong-Jun Yoon
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States of America.
| | - Hilda B Klasky
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States of America.
| | - John P Gounley
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States of America.
| | - Mohammed Alawad
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States of America.
| | - Shang Gao
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States of America.
| | - Eric B Durbin
- College of Medicine, University of Kentucky, Lexington, KY 40536, United States of America.
| | - Xiao-Cheng Wu
- Louisiana Tumor Registry, Louisiana State University Health Sciences Center, School of Public Health, New Orleans, LA 70112, United States of America.
| | - Antoinette Stroup
- New Jersey State Cancer Registry, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, 08901, United States of America.
| | - Jennifer Doherty
- Utah Cancer Registry, University of Utah School of Medicine, Salt Lake City, UT 84132, United States of America.
| | - Linda Coyle
- Information Management Services Inc., Calverton, MD 20705, United States of America.
| | - Lynne Penberthy
- Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD 20814, United States of America.
| | - J Blair Christian
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States of America.
| | - Georgia D Tourassi
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States of America.
| |
Collapse
|
33
|
Levrero-Florencio F, Margara F, Zacur E, Bueno-Orovio A, Wang Z, Santiago A, Aguado-Sierra J, Houzeaux G, Grau V, Kay D, Vázquez M, Ruiz-Baier R, Rodriguez B. Sensitivity analysis of a strongly-coupled human-based electromechanical cardiac model: Effect of mechanical parameters on physiologically relevant biomarkers. Comput Methods Appl Mech Eng 2020; 361:112762. [PMID: 32565583 PMCID: PMC7299076 DOI: 10.1016/j.cma.2019.112762] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
The human heart beats as a result of multiscale nonlinear dynamics coupling subcellular to whole-organ processes, achieving electrophysiologically-driven mechanical contraction. Computational cardiac modelling and simulation have achieved a great degree of maturity, both in terms of mathematical models of the underlying biophysical processes and the development of simulation software. In this study, we present the detailed description of a human-based, physiologically-based, and fully-coupled ventricular electromechanical modelling and simulation framework, together with a sensitivity analysis focused on its mechanical properties. The biophysical detail of the model, from ionic to whole-organ scale, is crucial to enable future simulations of disease and drug action. Key novelties include the coupling of state-of-the-art human-based electrophysiology membrane kinetics, excitation-contraction and active contraction models, and the incorporation of a pre-stress model to allow for pre-stressing and pre-loading the ventricles in a dynamical regime. Through high-performance computing simulations, we demonstrate that variations of 50% to 200% (and in some cases up to 1000%) in key parameters result in changes in clinically relevant mechanical biomarkers ranging from diseased to healthy values in clinical studies. Furthermore, mechanical biomarkers are primarily affected by only one or two parameters. Specifically, ejection fraction is dominated by the scaling parameter of the active tension model and its scaling parameter in the normal direction (k_ort2); the end-systolic pressure is dominated by the pressure at which the ejection phase is triggered (P_ej) and the compliance of the Windkessel fluid model (C); and the longitudinal fractional shortening is dominated by the fibre angle (φ) and k_ort2. The wall thickening does not seem to be clearly dominated by any of the considered input parameters.
In summary, this study presents in detail the description and implementation of a human-based coupled electromechanical modelling and simulation framework, and a high performance computing study on the sensitivity of mechanical biomarkers to key model parameters. The tools and knowledge generated enable future investigations into disease and drug action on human ventricles.
Collapse
Affiliation(s)
- F. Levrero-Florencio
- Department of Computer Science, University of Oxford, Oxford OX1 3QD, United Kingdom
- Corresponding authors.
| | - F. Margara
- Department of Computer Science, University of Oxford, Oxford OX1 3QD, United Kingdom
| | - E. Zacur
- Department of Engineering Science, University of Oxford, Oxford OX3 7DQ, United Kingdom
| | - A. Bueno-Orovio
- Department of Computer Science, University of Oxford, Oxford OX1 3QD, United Kingdom
| | - Z.J. Wang
- Department of Computer Science, University of Oxford, Oxford OX1 3QD, United Kingdom
| | - A. Santiago
- Barcelona Supercomputing Center – Centro Nacional de Supercomputación, Barcelona 08034, Spain
| | - J. Aguado-Sierra
- Barcelona Supercomputing Center – Centro Nacional de Supercomputación, Barcelona 08034, Spain
| | - G. Houzeaux
- Barcelona Supercomputing Center – Centro Nacional de Supercomputación, Barcelona 08034, Spain
| | - V. Grau
- Department of Engineering Science, University of Oxford, Oxford OX3 7DQ, United Kingdom
| | - D. Kay
- Department of Computer Science, University of Oxford, Oxford OX1 3QD, United Kingdom
| | - M. Vázquez
- Barcelona Supercomputing Center – Centro Nacional de Supercomputación, Barcelona 08034, Spain
- ELEM Biotech, Spain
| | - R. Ruiz-Baier
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
- Universidad Adventista de Chile, Casilla 7-D, Chillan, Chile
| | - B. Rodriguez
- Department of Computer Science, University of Oxford, Oxford OX1 3QD, United Kingdom
- Corresponding authors.
| |
Collapse
|
34
|
Kuzniar A, Maassen J, Verhoeven S, Santuari L, Shneider C, Kloosterman WP, de Ridder J. sv-callers: a highly portable parallel workflow for structural variant detection in whole-genome sequence data. PeerJ 2020; 8:e8214. [PMID: 31934500 PMCID: PMC6951283 DOI: 10.7717/peerj.8214] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2019] [Accepted: 11/14/2019] [Indexed: 12/19/2022] Open
Abstract
Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases, including cancer. Despite the advances in whole genome sequencing, comprehensive and accurate detection of SVs in short-read data still poses some practical and computational challenges. We present sv-callers, a highly portable workflow that enables parallel execution of multiple SV detection tools and provides users with example analyses of detected SV callsets in a Jupyter Notebook. This workflow supports easy deployment of software dependencies, configuration, and addition of new analysis tools. Moreover, porting it to different computing systems requires minimal effort. Finally, we demonstrate the utility of the workflow by performing both somatic and germline SV analyses on different high-performance computing systems.
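The fan-out/collect pattern the workflow relies on (orchestrated by Snakemake in sv-callers itself) can be sketched with the standard library alone; the caller functions and their outputs below are hypothetical stand-ins, not the real command-line SV tools the workflow wraps:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real SV callers; in sv-callers these are
# separate command-line tools driven by a Snakemake workflow.
def run_caller_a(bam):
    return [("chr1", 1000, "DEL")]

def run_caller_b(bam):
    return [("chr1", 1000, "DEL"), ("chr2", 5000, "INV")]

CALLERS = {"caller_a": run_caller_a, "caller_b": run_caller_b}

def call_all(bam, callers=CALLERS):
    """Run every registered caller on the same input in parallel and
    return a dict mapping caller name -> its SV call set."""
    with ThreadPoolExecutor(max_workers=len(callers)) as pool:
        futures = {name: pool.submit(fn, bam) for name, fn in callers.items()}
        return {name: f.result() for name, f in futures.items()}

results = call_all("sample.bam")
```

Because each caller is independent, the same pattern scales from a multicore workstation to a cluster scheduler with no change to the merge step.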
Affiliation(s)
- Luca Santuari: Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
- Carl Shneider: Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
- Wigard P Kloosterman: Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
- Jeroen de Ridder: Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
|
35
|
Kabiri Chimeh M, Heywood P, Pennisi M, Pappalardo F, Richmond P. Parallelisation strategies for agent based simulation of immune systems. BMC Bioinformatics 2019; 20:579. [PMID: 31823716 DOI: 10.1186/s12859-019-3181-y]
Abstract
BACKGROUND In recent years, the study of immune response behaviour using a bottom-up approach, Agent Based Modeling (ABM), has attracted considerable effort. The ABM approach is very common in the biological domain due to the high demand for large-scale analysis tools for the collection and interpretation of information to solve biological problems. Simulating massive multi-agent systems (i.e. simulations containing a large number of agents/entities) requires major computational effort, which is only achievable through the use of parallel computing approaches. RESULTS This paper explores different approaches to parallelising the key component of biological and immune system models within an ABM model: pairwise interactions. The focus of this paper is on the performance and algorithmic design choices of cell interactions in continuous and discrete space, where agents/entities are competing to interact with one another within a parallel environment. CONCLUSIONS Our performance results demonstrate the applicability of these methods to a broader class of biological systems exhibiting typical cell-to-cell interactions. The advantages and disadvantages of each implementation are discussed, showing that each can be used as the basis for developing complete immune system models on parallel hardware.
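The pairwise-interaction bottleneck the authors parallelise is usually tamed by spatial binning: instead of testing all O(n²) agent pairs, agents are hashed into grid cells sized to the interaction radius, and only same-cell or adjacent-cell agents are compared. A minimal illustrative sketch (not code from the paper, which targets parallel hardware):

```python
from collections import defaultdict
from itertools import product

def build_bins(agents, cell):
    """Hash each 2D agent position into a uniform grid cell."""
    bins = defaultdict(list)
    for i, (x, y) in enumerate(agents):
        bins[(int(x // cell), int(y // cell))].append(i)
    return bins

def neighbour_pairs(agents, radius):
    """Interacting agent pairs: only agents in the same or adjacent
    cells are compared, then filtered by the true interaction radius."""
    bins = build_bins(agents, radius)
    pairs = set()
    for (cx, cy), members in bins.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in bins.get((cx + dx, cy + dy), []):
                for i in members:
                    if i < j:
                        xi, yi = agents[i]
                        xj, yj = agents[j]
                        if (xi - xj) ** 2 + (yi - yj) ** 2 <= radius ** 2:
                            pairs.add((i, j))
    return pairs
```

On a GPU the same idea becomes one thread per agent scanning its 3×3 cell neighbourhood, which is the data-parallel formulation such frameworks typically use.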
|
36
|
Chen S, He Z, Han X, He X, Li R, Zhu H, Zhao D, Dai C, Zhang Y, Lu Z, Chi X, Niu B. How Big Data and High-performance Computing Drive Brain Science. Genomics Proteomics Bioinformatics 2019; 17:381-392. [PMID: 31805369 PMCID: PMC6943776 DOI: 10.1016/j.gpb.2019.09.003]
Abstract
Brain science accelerates the study of intelligence and behavior, contributes fundamental insights into human cognition, and offers prospective treatments for brain disease. Faced with the challenges posed by imaging technologies and deep learning computational models, big data and high-performance computing (HPC) play essential roles in studying brain function, brain diseases, and large-scale brain models or connectomes. We review the driving forces behind big data and HPC methods applied to brain science, including deep learning, powerful data analysis capabilities, and computational performance solutions, each of which can be used to improve diagnostic accuracy and research output. This work reinforces predictions that big data and HPC will continue to improve brain science by making ultrahigh-performance analysis possible, by improving data standardization and sharing, and by providing new neuromorphic insights.
Affiliation(s)
- Shanyu Chen: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
- Zhipeng He: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
- Xinyin Han: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
- Xiaoyu He: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
- Ruilin Li: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
- Haidong Zhu: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
- Dan Zhao: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
- Chuangchuang Dai: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
- Yu Zhang: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
- Zhonghua Lu: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
- Xuebin Chi: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China; Center of Scientific Computing Applications & Research, Chinese Academy of Sciences, Beijing 100190, China
- Beifang Niu: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China; Guizhou University School of Medicine, Guiyang 550025, China
|
37
|
Mabvakure BM, Rott R, Dobrowsky L, Van Heusden P, Morris L, Scheepers C, Moore PL. Advancing HIV Vaccine Research With Low-Cost High-Performance Computing Infrastructure: An Alternative Approach for Resource-Limited Settings. Bioinform Biol Insights 2019; 13:1177932219882347. [PMID: 35173421 PMCID: PMC8842485 DOI: 10.1177/1177932219882347]
Abstract
Next-generation sequencing (NGS) technologies have revolutionized biological research by generating genomic data that were once unaffordable by traditional first-generation sequencing technologies. These sequencing methodologies provide an opportunity for in-depth analyses of host and pathogen genomes as they are able to sequence millions of templates at a time. However, these large datasets can only be efficiently explored using bioinformatics analyses requiring huge data storage and computational resources adapted for high-performance processing. High-performance computing allows for efficient handling of large data and tasks that may require multi-threading and prolonged computational times, which is not feasible with ordinary computers. However, high-performance computing resources are costly and therefore not always readily available in low-income settings. We describe the establishment of an affordable high-performance computing bioinformatics cluster consisting of 3 nodes, constructed using ordinary desktop computers and open-source software including Linux Fedora, SLURM Workload Manager, and the Conda package manager. For the analysis of large antibody sequence datasets and for complex viral phylodynamic analyses, the cluster outperformed desktop computers. This has demonstrated that it is possible to construct high-performance computing capacity capable of analyzing large NGS data from relatively low-cost hardware and entirely free (open-source) software, even in resource-limited settings. Such a cluster design has broad utility beyond bioinformatics to other studies that require high-performance computing.
Affiliation(s)
- Batsirai M Mabvakure: Center for HIV and STIs, National Institute for Communicable Diseases, National Health Laboratory Service (NHLS), Johannesburg, South Africa; Antibody Immunity Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa; Division of Transfusion Medicine, Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Peter Van Heusden: South African National Bioinformatics Institute, University of the Western Cape, Cape Town, South Africa
- Lynn Morris: Center for HIV and STIs, National Institute for Communicable Diseases, National Health Laboratory Service (NHLS), Johannesburg, South Africa; Antibody Immunity Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa; Centre for the AIDS Programme of Research in South Africa (CAPRISA), University of KwaZulu-Natal, Durban, South Africa
- Cathrine Scheepers: Center for HIV and STIs, National Institute for Communicable Diseases, National Health Laboratory Service (NHLS), Johannesburg, South Africa; Antibody Immunity Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Penny L Moore: Center for HIV and STIs, National Institute for Communicable Diseases, National Health Laboratory Service (NHLS), Johannesburg, South Africa; Antibody Immunity Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa; Centre for the AIDS Programme of Research in South Africa (CAPRISA), University of KwaZulu-Natal, Durban, South Africa
|
38
|
Abstract
BACKGROUND Chromatin immunoprecipitation coupled to next generation sequencing (ChIP-Seq) is a widely used molecular method to investigate the function of chromatin-related proteins by identifying their associated DNA sequences on a genomic scale. ChIP-Seq generates large quantities of data that are difficult to process and analyze, particularly for organisms with contig-based sequenced genomes, which typically have minimal annotation on their associated set of genes other than coordinates primarily predicted by gene-finding programs. Poorly annotated genome sequence makes comprehensive analysis of ChIP-Seq data difficult, and as such standardized analysis pipelines are lacking. RESULTS We present a one-stop computational pipeline, "Rapid Analysis of ChIP-Seq data" (RACS), that utilizes traditional High-Performance Computing (HPC) techniques in association with open source tools for processing and analyzing raw ChIP-Seq data. RACS is an open source computational pipeline available from either of the following repositories: https://bitbucket.org/mjponce/RACS or https://gitrepos.scinet.utoronto.ca/public/?a=summary&p=RACS. RACS is particularly useful for ChIP-Seq in organisms with contig-based genomes that have poor gene annotation to aid protein function discovery. To test the performance and efficiency of RACS, we analyzed ChIP-Seq data previously published in the model organism Tetrahymena thermophila, which has a contig-based genome. We assessed the generality of RACS by analyzing a previously published data set generated using the model organism Oxytricha trifallax, whose genome sequence is also contig-based with poor annotation. CONCLUSIONS The RACS computational pipeline presented in this report is an efficient and reliable tool to analyze genome-wide raw ChIP-Seq data generated in model organisms with poorly annotated contig-based genome sequences. Because RACS segregates the found read accumulations between genic and intergenic regions, it is particularly efficient for rapid downstream analyses of proteins involved in gene expression.
Affiliation(s)
- Alejandro Saettone: Department of Chemistry and Biology, Ryerson University, 350 Victoria St, Toronto, M5B 2K3 Canada
- Marcelo Ponce: SciNet High Performance Computing Consortium, University of Toronto, 661 University Ave, Toronto, M5G 1M1 Canada
- Syed Nabeel-Shah: Department of Molecular Genetics, University of Toronto, 1 King’s College Cir, Toronto, M5S 1A8 Canada
- Jeffrey Fillingham: Department of Chemistry and Biology, Ryerson University, 350 Victoria St, Toronto, M5B 2K3 Canada
|
39
|
Linderman MD, Chia D, Wallace F, Nothaft FA. DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark. BMC Bioinformatics 2019; 20:493. [PMID: 31604420 PMCID: PMC6787990 DOI: 10.1186/s12859-019-3108-7]
Abstract
Background XHMM is a widely used tool for copy-number variant (CNV) discovery from whole exome sequencing data but can require hours to days to run for large cohorts. A more scalable implementation would reduce the need for specialized computational resources and enable increased exploration of the configuration parameter space to obtain the best possible results. Results DECA is a horizontally scalable implementation of the XHMM algorithm using the ADAM framework and Apache Spark that incorporates novel algorithmic optimizations to eliminate unneeded computation. DECA parallelizes XHMM on both multi-core shared memory computers and large shared-nothing Spark clusters. We performed CNV discovery from the read-depth matrix in 2535 exomes in 9.3 min on a 16-core workstation (35.3× speedup vs. XHMM), 12.7 min using 10 executor cores on a Spark cluster (18.8× speedup vs. XHMM), and 9.8 min using 32 executor cores on Amazon AWS’ Elastic MapReduce. We performed CNV discovery from the original BAM files in 292 min using 640 executor cores on a Spark cluster. Conclusions We describe DECA’s performance, our algorithmic and implementation enhancements to XHMM to obtain that performance, and our lessons learned porting a complex genome analysis application to ADAM and Spark. ADAM and Apache Spark are a performant and productive platform for implementing large-scale genome analyses, but efficiently utilizing large clusters can require algorithmic optimizations and careful attention to Spark’s configuration parameters.
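Part of the XHMM pipeline that DECA re-implements is normalization of the sample-by-target read-depth matrix before HMM-based CNV discovery, including per-sample z-scoring. A stdlib-only sketch of that z-scoring step, purely illustrative (DECA's actual implementation distributes this over Spark partitions):

```python
from statistics import mean, pstdev

def zscore_rows(depth):
    """Z-score each sample's (row's) target read depths: the kind of
    per-sample normalization applied before HMM CNV discovery.
    Rows with zero variance are mapped to all zeros."""
    out = []
    for row in depth:
        m, s = mean(row), pstdev(row)
        out.append([(x - m) / s if s else 0.0 for x in row])
    return out
```

Because each row is independent, this step parallelizes trivially, which is one reason the whole pipeline maps well onto a shared-nothing cluster.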
Affiliation(s)
- Michael D Linderman: Department of Computer Science, Middlebury College, 75 Shannon St, Middlebury, VT, 05753, USA
- Davin Chia: Department of Computer Science, Middlebury College, 75 Shannon St, Middlebury, VT, 05753, USA
- Forrest Wallace: Department of Computer Science, Middlebury College, 75 Shannon St, Middlebury, VT, 05753, USA
- Frank A Nothaft: AMPLab, University of California, Berkeley, Berkeley, CA, USA; Databricks, Inc., San Francisco, CA, USA
|
40
|
Baele G, Ayres DL, Rambaut A, Suchard MA, Lemey P. High-Performance Computing in Bayesian Phylogenetics and Phylodynamics Using BEAGLE. Methods Mol Biol 2019; 1910:691-722. [PMID: 31278682 DOI: 10.1007/978-1-4939-9074-0_23]
Abstract
In this chapter, we focus on the computational challenges associated with statistical phylogenomics and how use of the broad-platform evolutionary analysis general likelihood evaluator (BEAGLE), a high-performance library for likelihood computation, can help to substantially reduce computation time in phylogenomic and phylodynamic analyses. We discuss computational improvements brought about by the BEAGLE library on a variety of state-of-the-art multicore hardware, and for a range of commonly used evolutionary models. For data sets of varying dimensions, we specifically focus on comparing performance in the Bayesian evolutionary analysis by sampling trees (BEAST) software between multicore central processing units (CPUs) and a wide range of graphics processing cards (GPUs). We put special emphasis on computational benchmarks from the field of phylodynamics, which combines the challenges of phylogenomics with those of modelling trait data associated with the observed sequence data. In conclusion, we show that for increasingly large molecular sequence data sets, GPUs can offer tremendous computational advancements through the use of the BEAGLE library, which is available for software packages for both Bayesian inference and maximum-likelihood frameworks.
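The per-branch transition probabilities that a likelihood library like BEAGLE evaluates across millions of alignment sites depend on the chosen evolutionary model; under the simplest nucleotide model (Jukes-Cantor 1969) they have a well-known closed form, sketched here for illustration only (real analyses typically use richer models):

```python
import math

def jc69(t):
    """Jukes-Cantor (JC69) nucleotide transition probabilities for a
    branch of length t (expected substitutions per site).
    Returns (P[same base], P[each particular different base])."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e
```

Evaluating such probabilities and the pruning recurrence independently per site is exactly the data parallelism that makes GPUs effective for these likelihood computations.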
|
41
|
Rucci E, Garcia C, Botella G, De Giusti A, Naiouf M, Prieto-Matias M. SWIFOLD: Smith-Waterman implementation on FPGA with OpenCL for long DNA sequences. BMC Syst Biol 2018; 12:96. [PMID: 30458766 PMCID: PMC6245597 DOI: 10.1186/s12918-018-0614-6]
Abstract
Background The Smith-Waterman (SW) algorithm is the best choice for searching for similar regions between two DNA or protein sequences. However, it may become impracticable in some contexts due to its high computational demands. Consequently, the computer science community has focused on the use of modern parallel architectures such as Graphics Processing Units (GPUs), Xeon Phi accelerators and Field Programmable Gate Arrays (FPGAs) to speed up large-scale workloads. Results This paper presents and evaluates SWIFOLD: a Smith-Waterman parallel Implementation on FPGA with OpenCL for Long DNA sequences. First, we evaluate its performance and resource usage for different kernel configurations. Next, we carry out a performance comparison between our tool and other state-of-the-art implementations considering three different datasets. SWIFOLD offers the best average performance for small and medium test sets, achieving a performance that is independent of input size and sequence similarity. In addition, SWIFOLD provides competitive performance rates in comparison with GPU-based implementations on the latest GPU generation for the large dataset. Conclusions The results suggest that SWIFOLD can be a serious contender for accelerating the SW alignment of DNA sequences of unrestricted size in an affordable way, reaching 125 GCUPS on average and a peak of almost 270 GCUPS.
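The quadratic dynamic-programming recurrence that the FPGA kernel accelerates can be stated compactly in a pure-Python reference scorer; this sketch uses a simple linear gap penalty for brevity (the published tool's exact gap model and scoring parameters may differ):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b
    (Smith-Waterman DP with a linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # DP matrix, first row/col are 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                      # local alignment can restart
                          H[i - 1][j - 1] + s,    # match/mismatch
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            best = max(best, H[i][j])
    return best
```

Each anti-diagonal of H depends only on the previous two, which is what lets FPGA and GPU implementations compute whole diagonals of cells in parallel, the basis of the GCUPS (giga cell updates per second) figures quoted above.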
Affiliation(s)
- Enzo Rucci: III-LIDI, CONICET, Facultad de Informática, Universidad Nacional de La Plata, La Plata (Buenos Aires), 1900, Argentina
- Carlos Garcia: Depto. Arquitectura de Computadores y Automática, Universidad Complutense de Madrid, Madrid, 28040, Spain
- Guillermo Botella: Depto. Arquitectura de Computadores y Automática, Universidad Complutense de Madrid, Madrid, 28040, Spain
- Armando De Giusti: III-LIDI, CONICET, Facultad de Informática, Universidad Nacional de La Plata, La Plata (Buenos Aires), 1900, Argentina
- Marcelo Naiouf: III-LIDI, Facultad de Informática, Universidad Nacional de La Plata, La Plata (Buenos Aires), 1900, Argentina
- Manuel Prieto-Matias: Depto. Arquitectura de Computadores y Automática, Universidad Complutense de Madrid, Madrid, 28040, Spain
|
42
|
Dong D, Xu Z, Zhong W, Peng S. Parallelization of Molecular Docking: A Review. Curr Top Med Chem 2018; 18:1015-1028. [PMID: 30129415 DOI: 10.2174/1568026618666180821145215]
Abstract
Molecular docking, as one of the widely used virtual screening methods, aims to predict the binding conformations of small-molecule ligands at the appropriate target binding site. Because of its computational complexity and the arrival of the big data era, molecular docking requires High-Performance Computing (HPC) to improve its performance and accuracy. We discuss, in detail, the advances in accelerating molecular docking software in parallel on the common HPC platforms. Not only have existing programs been optimized and ported to HPC platforms, but many novel parallel algorithms have also been designed and implemented. This review focuses on the techniques and methods adopted in parallelizing docking software. Where appropriate, we refer readers to exemplary case studies.
Affiliation(s)
- Dong Dong: School of Computer Science, National University of Defense Technology, Changsha, China
- Zhijian Xu: CAS Key Laboratory of Receptor Research, Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
- Wu Zhong: Beijing Institute of Pharmacology and Toxicology, Beijing, China
- Shaoliang Peng: School of Computer Science, National University of Defense Technology, Changsha, China; National Supercomputer Centre in Changsha & College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
|
43
|
Kim H, Brookes E, Cao W, Demeler B. Two-dimensional grid optimization for sedimentation velocity analysis in the analytical ultracentrifuge. Eur Biophys J 2018; 47:837-44. [PMID: 29777290 DOI: 10.1007/s00249-018-1309-z]
Abstract
Sedimentation velocity experiments performed in the analytical ultracentrifuge are modeled using finite-element solutions of the Lamm equation. During modeling, three fundamental parameters are optimized: the sedimentation coefficients, the diffusion coefficients, and the partial concentrations of all solutes present in a mixture. A general modeling approach consists of fitting the partial concentrations of solutes defined in a two-dimensional grid of sedimentation and diffusion coefficient combinations that cover the range of possible solutes for a given mixture. An increasing number of grid points increases the resolution of the model produced by the subsequent analysis, with denser grids giving rise to very large systems of equations. Here, we evaluate the efficiency and resolution of several regular grids and show that traditionally defined grids tend to provide inadequate coverage in one region of the grid while being computationally wasteful in other sections. We describe a rapid and systematic approach for generating efficient two-dimensional analysis grids that balance optimal information content and model resolution for a given signal-to-noise ratio with improved calculation efficiency. These findings are general and apply to one- and two-dimensional grids, although they no longer represent regular grids. We provide a recipe for an improved grid-point spacing in both directions which eliminates unnecessary points, while at the same time providing a more uniform resolution that can be scaled based on the stochastic noise in the experimental data.
|
44
|
Guo P, Zhu B, Niu H, Wang Z, Liang Y, Chen Y, Zhang L, Ni H, Guo Y, Hay EHA, Gao X, Gao H, Wu X, Xu L, Li J. Fast genomic prediction of breeding values using parallel Markov chain Monte Carlo with convergence diagnosis. BMC Bioinformatics 2018; 19:3. [PMID: 29298666 PMCID: PMC5751823 DOI: 10.1186/s12859-017-2003-3]
Abstract
Background Running multiple-chain Markov Chain Monte Carlo (MCMC) provides an efficient parallel computing method for complex Bayesian models, although the efficiency of the approach critically depends on the length of the non-parallelizable burn-in period, for which all simulated data are discarded. In practice, this burn-in period is set arbitrarily and often leads to the performance of far more iterations than required. In addition, the accuracy of genomic predictions does not improve after the MCMC reaches equilibrium. Results Automatic tuning of the burn-in length for running multiple-chain MCMC was proposed in the context of genomic predictions using BayesA and BayesCπ models. The performance of parallel computing versus sequential computing and of tunable burn-in MCMC versus fixed burn-in MCMC was assessed using simulated data sets as well as by applying these methods to genomic predictions in a Chinese Simmental beef cattle population. The results showed that tunable burn-in parallel MCMC had greater speedups than fixed burn-in parallel MCMC, and both had greater speedups relative to sequential (single-chain) MCMC. Nevertheless, genomic estimated breeding values (GEBVs) and genomic prediction accuracies were highly comparable between the various computing approaches. When applied to the genomic predictions of four quantitative traits in a Chinese Simmental population of 1217 beef cattle genotyped by an Illumina Bovine 770 K SNP BeadChip, tunable burn-in multiple-chain BayesCπ (TBM-BayesCπ) outperformed tunable burn-in multiple-chain BayesA (TBM-BayesA) and Genomic Best Linear Unbiased Prediction (GBLUP) in terms of prediction accuracy, although the differences were not necessarily caused by computational factors and could have been intrinsic to the statistical models per se.
Conclusions Automatically tunable burn-in multiple-chain MCMC provides an accurate and cost-effective tool for high-performance computing of Bayesian genomic prediction models, and this algorithm is generally applicable to high-performance computing of any complex Bayesian statistical model. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-2003-3) contains supplementary material, which is available to authorized users.
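Automatically tuning the burn-in requires a convergence diagnostic computed across the parallel chains; a standard choice for multiple-chain MCMC is the Gelman-Rubin potential scale reduction factor, sketched below for one parameter (the paper's exact stopping criterion may differ):

```python
from statistics import mean, variance

def gelman_rubin(chains):
    """Gelman-Rubin potential scale reduction factor across parallel
    MCMC chains (lists of equal length); values near 1 suggest the
    chains have mixed, i.e. burn-in can end."""
    n = len(chains[0])                       # samples per chain
    means = [mean(c) for c in chains]
    W = mean(variance(c) for c in chains)    # within-chain variance
    B = n * variance(means)                  # between-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled posterior variance estimate
    return (var_hat / W) ** 0.5
```

In a tunable-burn-in scheme, each chain periodically reports its running summaries, the diagnostic is evaluated, and burn-in ends as soon as the statistic falls below a threshold such as 1.1 instead of at a fixed, conservatively chosen iteration count.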
Affiliation(s)
- Peng Guo: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Yuanmingyuan West Road 2#, Haidian District, Beijing, 100193, China; College of Computer and Information Engineering, Tianjin Agricultural University, Tianjin, China
- Bo Zhu: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Yuanmingyuan West Road 2#, Haidian District, Beijing, 100193, China
- Hong Niu: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Yuanmingyuan West Road 2#, Haidian District, Beijing, 100193, China
- Zezhao Wang: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Yuanmingyuan West Road 2#, Haidian District, Beijing, 100193, China
- Yonghu Liang: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Yuanmingyuan West Road 2#, Haidian District, Beijing, 100193, China
- Yan Chen: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Yuanmingyuan West Road 2#, Haidian District, Beijing, 100193, China
- Lupei Zhang: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Yuanmingyuan West Road 2#, Haidian District, Beijing, 100193, China
- Hemin Ni: Animal Science and Technology College, Beijing University of Agriculture, Beijing, China
- Yong Guo: Animal Science and Technology College, Beijing University of Agriculture, Beijing, China
- El Hamidi A Hay: Livestock and Range Research Laboratory, ARS, USDA, Miles City, MT, USA
- Xue Gao: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Yuanmingyuan West Road 2#, Haidian District, Beijing, 100193, China
- Huijiang Gao: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Yuanmingyuan West Road 2#, Haidian District, Beijing, 100193, China
- Xiaolin Wu: Biostatistics and Bioinformatics, GeneSeek (A Neogen company), Lincoln, NE, 68504, USA; Department of Animal Sciences, University of Wisconsin, Madison, WI, 53706, USA
- Lingyang Xu: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Yuanmingyuan West Road 2#, Haidian District, Beijing, 100193, China
- Junya Li: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Yuanmingyuan West Road 2#, Haidian District, Beijing, 100193, China
|
45
|
Nikitina N, Ivashko E, Tchernykh A. Congestion game scheduling for virtual drug screening optimization. J Comput Aided Mol Des 2017; 32:363-374. [PMID: 29264790 DOI: 10.1007/s10822-017-0093-7]
Abstract
In virtual drug screening, the chemical diversity of hits is an important factor, along with their predicted activity. Moreover, interim results are of interest for directing further research, and their diversity is also desirable. In this paper, we consider the problem of obtaining a diverse set of virtual screening hits in a short time. To this end, we propose a mathematical model of task scheduling for virtual drug screening in high-performance computational systems as a congestion game between computational nodes, finding the equilibrium solutions that best balance the number of interim hits with their chemical diversity. The model considers a heterogeneous environment with workload uncertainty, processing time uncertainty, and limited knowledge about the input dataset structure. We perform computational experiments and evaluate the performance of the developed approach on the GDB-9 database of organic molecules. This set of molecules is rich enough to demonstrate the feasibility and practicability of the proposed solutions. We compare the algorithm with two known heuristics used in practice and observe that game-based scheduling outperforms them in hit discovery rate and chemical diversity at earlier steps. Based on these results, we use a social utility metric for assessing the efficiency of our equilibrium solutions and show that they reach the greatest values.
Affiliation(s)
- Natalia Nikitina
- Institute of Applied Mathematical Research, Karelian Research Center, Russian Academy of Sciences, 11 Pushkinskaya str., 185910, Petrozavodsk, Russia.
- Evgeny Ivashko
- Institute of Applied Mathematical Research, Karelian Research Center, Russian Academy of Sciences, 11 Pushkinskaya str., 185910, Petrozavodsk, Russia.
- Andrei Tchernykh
- Computer Science Department, CICESE Research Center, Carretera Ensenada-Tijuana No. 3918, Zona Playitas, Código Postal 22860, Apdo. Postal 360, Ensenada, Baja California, Mexico.
46. Avramidis E, Akman OE. Optimisation of an exemplar oculomotor model using multi-objective genetic algorithms executed on a GPU-CPU combination. BMC Syst Biol 2017; 11:40. [PMID: 28340582] [PMCID: PMC5364688] [DOI: 10.1186/s12918-017-0416-2]
Abstract
BACKGROUND Parameter optimisation is a critical step in the construction of computational biology models. In eye movement research, computational models are increasingly important for understanding the mechanistic basis of normal and abnormal behaviour. In this study, we considered an existing neurobiological model of fast eye movements (saccades), capable of generating realistic simulations of (i) normal horizontal saccades and (ii) infantile nystagmus, pathological ocular oscillations that can be subdivided into different waveform classes. By developing appropriate fitness functions, we optimised the model against existing experimental saccade and nystagmus data using a well-established multi-objective genetic algorithm. This algorithm required the model to be numerically integrated for very large numbers of parameter combinations. To address this computational bottleneck, we implemented a master-slave parallelisation in which the model integrations were distributed across the compute units of a GPU under the control of a CPU.
RESULTS While previous nystagmus fitting had been based on reproducing qualitative waveform characteristics, our optimisation protocol enabled us to perform the first direct fits of a model to experimental recordings. The fits to normal eye movements showed that although saccades of different amplitudes can be accurately simulated by individual parameter sets, a single set capable of fitting all amplitudes simultaneously cannot be determined. The fits to nystagmus oscillations systematically identified the parameter regimes in which the model can reproduce a number of canonical nystagmus waveforms to high accuracy, while also identifying some waveforms that the model cannot simulate. Using a GPU to perform the model integrations yielded a speedup of around 20 compared to a high-end CPU.
CONCLUSIONS The results of both optimisation problems enabled us to quantify the predictive capacity of the model, suggesting specific modifications that could expand its repertoire of simulated behaviours. In addition, the optimal parameter distributions we obtained were consistent with previous computational studies that had proposed the saccadic braking signal to be the origin of the instability preceding the development of infantile nystagmus oscillations. Finally, the master-slave parallelisation method we developed to accelerate the optimisation process can be readily adapted to fit other highly parametrised computational biology models to experimental data.
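The master-slave scheme described in this abstract can be sketched in miniature with Python's multiprocessing module. This is an illustrative sketch only: the `evaluate` function is a toy two-objective problem standing in for the paper's numerical integration of the oculomotor model, and the Pareto filter stands in for the full multi-objective genetic algorithm; the master process farms evaluations out to worker processes, mirroring how the CPU dispatched integrations to GPU compute units.

```python
from multiprocessing import Pool

def evaluate(params):
    """Toy stand-in for numerically integrating a model at one parameter set.
    Returns two objective values (smaller is better)."""
    a, b = params
    return ((a - 1.0) ** 2 + b ** 2, a ** 2 + (b - 1.0) ** 2)

def dominates(f, g):
    """True if fitness vector f Pareto-dominates g (minimisation)."""
    return all(x <= y for x, y in zip(f, g)) and any(x < y for x, y in zip(f, g))

def pareto_front(population):
    """Master: distribute evaluations across worker processes,
    then keep only non-dominated individuals."""
    with Pool() as pool:
        fitnesses = pool.map(evaluate, population)  # workers run in parallel
    return [p for p, f in zip(population, fitnesses)
            if not any(dominates(g, f) for g in fitnesses)]

if __name__ == "__main__":
    population = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5), (2.0, 2.0)]
    print(pareto_front(population))
```

Because each evaluation is independent, this pattern scales with the number of compute units; the real bottleneck the paper addresses is the cost of each `evaluate` call, not the selection step.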
Affiliation(s)
- Eleftherios Avramidis
- Centre for Systems, Dynamics and Control, College of Engineering, Mathematics and Physical Sciences, University of Exeter, North Park Road, Exeter, EX4 4QF, UK; Department of Electronic Engineering, National University of Ireland, Maynooth, Ireland.
- Ozgur E Akman
- Centre for Systems, Dynamics and Control, College of Engineering, Mathematics and Physical Sciences, University of Exeter, North Park Road, Exeter, EX4 4QF, UK.
47. Bracciali A, Aldinucci M, Patterson M, Marschall T, Pisanti N, Merelli I, Torquati M. PWHATSHAP: efficient haplotyping for future generation sequencing. BMC Bioinformatics 2016; 17:342. [PMID: 28185544] [PMCID: PMC5046197] [DOI: 10.1186/s12859-016-1170-y]
Abstract
Background Haplotype phasing is an important problem in the analysis of genomic information. Given a set of DNA fragments from an individual, it consists of determining which of the possible alleles (alternative forms of a gene) each fragment comes from. Haplotype information is relevant to gene regulation, epigenetics, genome-wide association studies, evolutionary and population studies, and the study of mutations. Haplotyping is currently addressed as an optimisation problem aiming at solutions that minimise, for instance, error correction costs, where costs measure the confidence in the accuracy of the information acquired from DNA sequencing. Solutions typically have exponential computational complexity. WhatsHap is a recent optimal approach that moves the computational complexity from DNA fragment length to fragment overlap, i.e. coverage, and is hence of particular interest given the current trend in sequencing technologies towards longer fragments.
Results Given the potential relevance of efficient haplotyping in several analysis pipelines, we have designed and engineered pWhatsHap, a parallel, high-performance version of WhatsHap. pWhatsHap is embedded in a toolkit developed in Python and supports genomics datasets in standard file formats. Building on WhatsHap, pWhatsHap exhibits the same complexity, exploring a number of possible solutions that is exponential in the coverage of the dataset. The parallel implementation on multi-core architectures allows for a substantial reduction of the execution time for haplotyping, while the results enjoy the same high accuracy as those provided by WhatsHap, which increases with coverage.
Conclusions Due to its structure and its management of large datasets, the parallelisation of WhatsHap posed demanding technical challenges, which were addressed by exploiting a high-level parallel programming framework. The result, pWhatsHap, is a freely available toolkit that improves the efficiency of the analysis of genomics information.
Affiliation(s)
- Andrea Bracciali
- Computer Science and Mathematics, School of Natural Sciences, Stirling University, Stirling, FK9 4LA, UK.
- Marco Aldinucci
- Department of Computer Science, University of Torino, Torino, Italy.
- Murray Patterson
- Laboratoire de Biométrie et Biologie Evolutive, University Claude Bernard, Lyon, France.
- Tobias Marschall
- Center for Bioinformatics, Saarland University, Saarland, Germany; Max Planck Institute for Informatics, Saarbrücken, Germany.
- Nadia Pisanti
- Department of Computer Science, University of Pisa, Pisa, Italy; Erable Team, INRIA, Grenoble, France.
- Ivan Merelli
- Institute of Biomedical Technologies, National Research Council, Milan, Italy.
48. Spjuth O, Bongcam-Rudloff E, Dahlberg J, Dahlö M, Kallio A, Pireddu L, Vezzi F, Korpelainen E. Recommendations on e-infrastructures for next-generation sequencing. Gigascience 2016; 5:26. [PMID: 27267963] [PMCID: PMC4897895] [DOI: 10.1186/s13742-016-0132-7]
Abstract
With ever-increasing amounts of data being produced by next-generation sequencing (NGS) experiments, the requirements placed on supporting e-infrastructures have grown. In this work, we provide recommendations, based on the collective experiences of participants in the EU COST Action SeqAhead, for the tasks of data preprocessing, upstream processing, data delivery, and downstream analysis, as well as long-term storage and archiving. We cover demands on computational and storage resources, networks, software stacks, automation of analysis, and education, and discuss emerging trends in the field. E-infrastructures for NGS require substantial effort to set up and maintain over time, and with sequencing technologies and best practices for data analysis evolving rapidly, it is important to prioritize both processing capacity and e-infrastructure flexibility when making strategic decisions to support the data analysis demands of tomorrow. Due to increasingly demanding technical requirements, we recommend that e-infrastructure development and maintenance be handled by a professional service unit, whether internal or external to the organization, and that emphasis be placed on collaboration between researchers and IT professionals.
Affiliation(s)
- Ola Spjuth
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, P.O. Box 591, SE-75124, Sweden.
- Erik Bongcam-Rudloff
- SLU-Global Bioinformatics Centre, Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden.
- Johan Dahlberg
- National Genomics Infrastructure, Science for Life Laboratory, Uppsala University, Stockholm, P.O. Box 1031, SE-17121, Sweden.
- Martin Dahlö
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, P.O. Box 591, SE-75124, Sweden; Science for Life Laboratory, Uppsala University, Husargatan 3, Uppsala, SE-75123, Sweden.
- Aleksi Kallio
- CSC - IT Center for Science Ltd., Espoo, P.O. Box 405, FI-02101, Finland.
- Luca Pireddu
- CRS4, Polaris, Loc. Piscina Manna Ed. 1, Pula, 09010, Italy; University of Cagliari, Cagliari, 09124, Italy.
- Francesco Vezzi
- Science for Life Laboratory, Stockholm University, Stockholm, SE-17121, Sweden.
- Eija Korpelainen
- CSC - IT Center for Science Ltd., Espoo, P.O. Box 405, FI-02101, Finland.
49. Stone JE, Hallock MJ, Phillips JC, Peterson JR, Luthey-Schulten Z, Schulten K. Evaluation of Emerging Energy-Efficient Heterogeneous Computing Platforms for Biomolecular and Cellular Simulation Workloads. IEEE Int Symp Parallel Distrib Process Workshops Phd Forum 2016; 2016:89-100. [PMID: 27516922] [DOI: 10.1109/ipdpsw.2016.130]
Abstract
Many of the continuing scientific advances achieved through computational biology are predicated on the availability of ongoing increases in computational power required for detailed simulation and analysis of cellular processes on biologically-relevant timescales. A critical challenge facing the development of future exascale supercomputer systems is the development of new computing hardware and associated scientific applications that dramatically improve upon the energy efficiency of existing solutions, while providing increased simulation, analysis, and visualization performance. Mobile computing platforms have recently become powerful enough to support interactive molecular visualization tasks that were previously only possible on laptops and workstations, creating future opportunities for their convenient use for meetings, remote collaboration, and as head mounted displays for immersive stereoscopic viewing. We describe early experiences adapting several biomolecular simulation and analysis applications for emerging heterogeneous computing platforms that combine power-efficient system-on-chip multi-core CPUs with high-performance massively parallel GPUs. We present low-cost power monitoring instrumentation that provides sufficient temporal resolution to evaluate the power consumption of individual CPU algorithms and GPU kernels. We compare the performance and energy efficiency of scientific applications running on emerging platforms with results obtained on traditional platforms, identify hardware and algorithmic performance bottlenecks that affect the usability of these platforms, and describe avenues for improving both the hardware and applications in pursuit of the needs of molecular modeling tasks on mobile devices and future exascale computers.
Affiliation(s)
- John E Stone
- Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
- Michael J Hallock
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
- James C Phillips
- Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
- Joseph R Peterson
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
- Zaida Luthey-Schulten
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
- Klaus Schulten
- Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
50. Jaghoori MM, Bleijlevens B, Olabarriaga SD. 1001 Ways to run AutoDock Vina for virtual screening. J Comput Aided Mol Des 2016.
Abstract
Large-scale computing technologies have enabled high-throughput virtual screening involving thousands to millions of drug candidates. It is not trivial, however, for biochemical scientists to evaluate the technical alternatives and their implications for running such large experiments. Besides experience with the molecular docking tool itself, the scientist needs to learn how to run it on high-performance computing (HPC) infrastructures and to understand the impact of the choices made. Here, we review such considerations for a specific tool, AutoDock Vina, and use experimental data to illustrate the following points: (1) an additional level of parallelization increases virtual screening throughput on a multi-core machine; (2) capturing the random seed is necessary but not sufficient for reproducibility on heterogeneous distributed computing systems; (3) the overall time spent screening a ligand library can be reduced by analysing the factors that affect execution time per ligand, including the number of active torsions, the number of heavy atoms, and the exhaustiveness setting. We also illustrate differences among four common HPC infrastructures: grid, Hadoop, small cluster, and multi-core (a virtual machine on the cloud). Our analysis shows that these platforms are suitable for screening experiments of different sizes. These considerations can guide scientists in choosing the best computing platform and set-up for their future large virtual screening experiments.
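The "additional level of parallelization" point can be illustrated with a short Python sketch: a process pool farms independent ligand dockings out across cores, on top of whatever thread parallelism the docking tool itself uses. The `dock_ligand` function below is a hypothetical stand-in for invoking the docking tool on one ligand (a real implementation would shell out to it with a fixed seed and parse the predicted affinity); recording the seed with each result is what supports reproducibility.

```python
from multiprocessing import Pool

SEED = 42  # fixed random seed, recorded with every result for reproducibility

def dock_ligand(ligand):
    """Hypothetical stand-in for one docking run on a single ligand.
    Computes a dummy affinity from the ligand's torsion count."""
    name, n_torsions = ligand
    score = -6.0 - 0.1 * n_torsions  # dummy affinity in kcal/mol
    return {"ligand": name, "seed": SEED, "score": score}

def screen(library, workers=4):
    """Dock the whole library with a pool of worker processes; return
    results sorted by predicted affinity (most negative first)."""
    with Pool(workers) as pool:
        results = pool.map(dock_ligand, library)
    return sorted(results, key=lambda r: r["score"])

if __name__ == "__main__":
    library = [("lig_a", 3), ("lig_b", 7), ("lig_c", 1)]
    for hit in screen(library):
        print(hit["ligand"], hit["score"])
```

Because each ligand is an independent task, this pattern maps naturally onto all four infrastructures the paper compares; only the task distribution layer changes.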
Affiliation(s)
- Mohammad Mahdi Jaghoori
- Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, University of Amsterdam, Amsterdam, Netherlands.
- Boris Bleijlevens
- Department of Medical Biochemistry, Academic Medical Center, University of Amsterdam, Amsterdam, Netherlands.
- Silvia D. Olabarriaga
- Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, University of Amsterdam, Amsterdam, Netherlands.