1
Mc Cartney AM, Mahmoud M, Jochum M, Agustinho DP, Zorman B, Al Khleifat A, Dabbaghie F, Kesharwani RK, Smolka M, Dawood M, Albin D, Aliyev E, Almabrazi H, Arslan A, Balaji A, Behera S, Billingsley K, Cameron DL, Daw J, Dawson ET, De Coster W, Du H, Dunn C, Esteban R, Jolly A, Kalra D, Liao C, Liu Y, Lu TY, Havrilla JM, Khayat MM, Marin M, Monlong J, Price S, Rafael Gener A, Ren J, Sagayaradj S, Sapoval N, Sinner C, Soto DC, Soylev A, Subramaniyan A, Syed N, Tadimeti N, Tater P, Vats P, Vaughn J, Walker K, Wang G, Zeng Q, Zhang S, Zhao T, Kille B, Biederstedt E, Chaisson M, English A, Kronenberg Z, Treangen TJ, Hefferon T, Chin CS, Busby B, Sedlazeck FJ. An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates. F1000Res 2021; 10:246. [PMID: 34621504 PMCID: PMC8479851 DOI: 10.12688/f1000research.51477.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/23/2021] [Indexed: 11/20/2022]
Abstract
In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on related topics in structural variation (SV), pan-genomes, and SARS-CoV-2 research. The overarching goal was to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variation among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotation, and filtering, together with advances in benchmarking existing methods. Other groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.
Affiliation(s)
- Fawaz Dabbaghie
- Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
- Ahmed Arslan
- Stanford University School of Medicine, California, USA
- Daniel L Cameron
- Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Joyjit Daw
- NVIDIA Corporation, Santa Clara, California, USA
- Haowei Du
- Baylor College of Medicine, Houston, USA
- Jean Monlong
- UC Santa Cruz Genomics Institute, Santa Cruz, USA
- Arda Soylev
- Konya Food and Agriculture University, Konya, Turkey
- Pankaj Vats
- NVIDIA Corporation, Santa Clara, California, USA
- Qiandong Zeng
- Laboratory Corporation of America Holdings, Westborough, USA
2
Mc Cartney AM, Mahmoud M, Jochum M, Agustinho DP, Zorman B, Al Khleifat A, Dabbaghie F, Kesharwani RK, Smolka M, Dawood M, Albin D, Aliyev E, Almabrazi H, Arslan A, Balaji A, Behera S, Billingsley K, Cameron DL, Daw J, Dawson ET, De Coster W, Du H, Dunn C, Esteban R, Jolly A, Kalra D, Liao C, Liu Y, Lu TY, Havrilla JM, Khayat MM, Marin M, Monlong J, Price S, Rafael Gener A, Ren J, Sagayaradj S, Sapoval N, Sinner C, Soto DC, Soylev A, Subramaniyan A, Syed N, Tadimeti N, Tater P, Vats P, Vaughn J, Walker K, Wang G, Zeng Q, Zhang S, Zhao T, Kille B, Biederstedt E, Chaisson M, English A, Kronenberg Z, Treangen TJ, Hefferon T, Chin CS, Busby B, Sedlazeck FJ. An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates. F1000Res 2021; 10:246. [PMID: 34621504 PMCID: PMC8479851 DOI: 10.12688/f1000research.51477.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 03/04/2021] [Indexed: 11/08/2023]
3
Al-Khawaga S, Mohammed I, Saraswathi S, Haris B, Hasnah R, Saeed A, Almabrazi H, Syed N, Jithesh P, El Awwa A, Khalifa A, AlKhalaf F, Petrovski G, Abdelalim EM, Hussain K. The clinical and genetic characteristics of permanent neonatal diabetes (PNDM) in the state of Qatar. Mol Genet Genomic Med 2019; 7:e00753. [PMID: 31441606 PMCID: PMC6785445 DOI: 10.1002/mgg3.753] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/30/2019] [Revised: 04/04/2019] [Accepted: 04/27/2019] [Indexed: 02/06/2023]
Abstract
Background: Neonatal diabetes mellitus (NDM) is a rare condition that occurs within the first six months of life. Permanent NDM (PNDM) is caused by mutations in specific genes that are expressed at early and/or late stages of pancreatic beta‐cell development and are involved in beta‐cell survival or in insulin processing, regulation, and release. The native population in Qatar continues to practice consanguineous marriage, which leads to a high level of homozygosity. To our knowledge, there is no previous report on the genomics of NDM among the Qatari population. The aims of the current study were to identify patients with NDM diagnosed between 2001 and 2016 and to examine their clinical and genetic characteristics.
Methods: To calculate the incidence of PNDM, the number of patients with PNDM diagnosed between 2001 and 2016 was compared with the total number of live births over the same 16‐year period. Whole genome sequencing (WGS) was used to investigate the genetic etiology in the PNDM cohort.
Results: PNDM was diagnosed in nine patients, giving an estimated incidence of 1 in 22,938 live births among the indigenous Qatari population. Seven different mutations in six genes (PTF1A, GCK, SLC2A2, EIF2AK3, INS, and HNF1B) were identified. In the majority of cases, the genetic etiology was part of a previously identified autosomal recessive disorder. Two novel de novo mutations were identified in INS and HNF1B.
Conclusion: Qatar has the second highest reported incidence of PNDM worldwide. Most PNDM cases present as rare familial autosomal recessive disorders. Pancreas‐associated transcription factor 1a (PTF1A) enhancer deletions are the most common cause of PNDM in Qatar, although only a few previous cases have been reported in the literature.
Affiliation(s)
- Sara Al-Khawaga
- College of Health & Life Sciences, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar; Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar; Diabetes Research Center, Qatar Biomedical Research Institute, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
- Idris Mohammed
- College of Health & Life Sciences, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar; Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
- Saras Saraswathi
- Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
- Basma Haris
- Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
- Reem Hasnah
- Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
- Amira Saeed
- Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
- Najeeb Syed
- Biomedical Informatics Division, Sidra Medicine, Doha, Qatar
- Puthen Jithesh
- Biomedical Informatics Division, Sidra Medicine, Doha, Qatar
- Ahmed El Awwa
- Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar; Faculty of Medicine, Alexandria University, Alexandria, Egypt
- Amal Khalifa
- Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
- Fawziya AlKhalaf
- Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
- Goran Petrovski
- Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
- Essam M Abdelalim
- College of Health & Life Sciences, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar; Diabetes Research Center, Qatar Biomedical Research Institute, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
- Khalid Hussain
- Division of Endocrinology, Department of Pediatric Medicine, Sidra Medicine, Doha, Qatar
4
Kathiresan N, Temanni R, Almabrazi H, Syed N, Jithesh PV, Al-Ali R. Accelerating next generation sequencing data analysis with system level optimizations. Sci Rep 2017; 7:9058. [PMID: 28831090 PMCID: PMC5567265 DOI: 10.1038/s41598-017-09089-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 04/11/2017] [Accepted: 07/20/2017] [Indexed: 11/09/2022]
Abstract
Next generation sequencing (NGS) data analysis is highly compute intensive. In-memory computing, vectorization, bulk data transfer, and CPU frequency scaling are among the hardware features of modern computing architectures. To achieve the best execution time and make use of these features, system-level parameters must be tuned before running the application. We studied GATK HaplotypeCaller, a component of common NGS workflows that consumes more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked, and the execution time of HaplotypeCaller was optimized through several system-level parameters: (i) tuning parallel garbage collection and kernel shared memory to simulate in-memory computing; (ii) architecture-specific tuning of the PairHMM library for vectorization; (iii) enabling Java 1.8 features by compiling GATK from source and building a runtime environment for parallel sorting and bulk data transfer; and (iv) switching CPU frequency scaling from the default 'on-demand' mode to 'performance' mode to accelerate Java multi-threading. As a result, HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and by 42.61% in GATK 3.7. Overall, the execution time of the NGS pipeline was reduced to 70.60% and 34.14% for GATK 3.3 and GATK 3.7, respectively.
Affiliation(s)
- Nagarajan Kathiresan
- Biomedical Informatics, Research Branch, Sidra Medical and Research Center, Post Box No. 26999, Doha, Qatar
- Ramzi Temanni
- Biomedical Informatics, Research Branch, Sidra Medical and Research Center, Post Box No. 26999, Doha, Qatar
- Hakeem Almabrazi
- Biomedical Informatics, Research Branch, Sidra Medical and Research Center, Post Box No. 26999, Doha, Qatar
- Najeeb Syed
- Biomedical Informatics, Research Branch, Sidra Medical and Research Center, Post Box No. 26999, Doha, Qatar
- Puthen V Jithesh
- Biomedical Informatics, Research Branch, Sidra Medical and Research Center, Post Box No. 26999, Doha, Qatar
- Rashid Al-Ali
- Biomedical Informatics, Research Branch, Sidra Medical and Research Center, Post Box No. 26999, Doha, Qatar
5
O'Leary BM, Davis SG, Smith MF, Brown B, Kemp MB, Almabrazi H, Grundstad JA, Burns T, Leontiev V, Andorf J, Clark AF, Sheffield VC, Casavant TL, Scheetz TE, Stone EM, Braun TA. Transcript annotation prioritization and screening system (TrAPSS) for mutation screening. J Bioinform Comput Biol 2008; 5:1155-72. [PMID: 18172923 DOI: 10.1142/s0219720007003132] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/25/2007] [Revised: 07/31/2007] [Accepted: 08/13/2007] [Indexed: 11/18/2022]
Abstract
When searching for disease-causing mutations with polymerase chain reaction (PCR)-based methods, candidate genes are usually screened in their entirety, exon by exon. Genomic resources (e.g. www.ncbi.nih.gov, www.ensembl.org, and genome.ucsc.edu) largely support this paradigm by making it easy to view and access sequence data associated with genes in their genomic context. However, the administrative burden of screening potentially hundreds of genes and thousands of exons in thousands of patients is significant, even with public genome resources. For example, manually designing oligonucleotide primers for all 149 exons of the 10 Leber's congenital amaurosis (LCA) genes represents a significant information-management challenge. The Transcript Annotation Prioritization and Screening System (TrAPSS) is designed to accelerate mutation screening by (1) providing a gene-based local cache of candidate disease genes in their genomic context, (2) automating tasks associated with optimizing candidate-gene screening and information management, and (3) implementing an algorithmic technique that uses large amounts of heterogeneous genome annotation (e.g. conserved protein functional domains) to prioritize candidate genes.
Affiliation(s)
- Brian M O'Leary
- Coordinated Laboratory for Computational Genomics, University of Iowa, Iowa City, IA 52242, USA.