1
Malamon JS, Farrell JJ, Xia LC, Dombroski BA, Das RG, Way J, Kuzma AB, Valladares O, Leung YY, Scanlon AJ, Lopez IAB, Brehony J, Worley KC, Zhang NR, Wang LS, Farrer LA, Schellenberg GD, Lee WP, Vardarajan BN. A comparative study of structural variant calling in WGS from Alzheimer's disease families. Life Sci Alliance 2024; 7:e202302181. [PMID: 38418088 PMCID: PMC10902710 DOI: 10.26508/lsa.202302181] [Received: 05/24/2023] [Revised: 02/07/2024] [Accepted: 02/08/2024] [Indexed: 03/01/2024]
Abstract
Detecting structural variants (SVs) in whole-genome sequencing poses significant challenges. We present a protocol for variant calling, merging, genotyping, sensitivity analysis, and laboratory validation for generating a high-quality SV call set in whole-genome sequencing from the Alzheimer's Disease Sequencing Project comprising 578 individuals from 111 families. Employing two complementary pipelines, Scalpel and Parliament, for SV/indel calling, we assessed sensitivity through sample replicates (N = 9) with in silico variant spike-ins. We developed a novel metric, D-score, to evaluate caller specificity for deletions. The accuracy of deletions was evaluated by Sanger sequencing. We generated a high-quality call set of 152,301 deletions of diverse sizes. Sanger sequencing validated 114 of 146 detected deletions (78.1%). Scalpel excelled in accuracy for deletions ≤100 bp, whereas Parliament was optimal for deletions >900 bp. Overall, 83.0% and 72.5% of calls by Scalpel and Parliament were validated, respectively, including all 11 deletions called by both Parliament and Scalpel between 101 and 900 bp. Our flexible protocol successfully generated a high-quality deletion call set and a truth set of Sanger sequencing-validated deletions with precise breakpoints spanning 1-17,000 bp.
Affiliation(s)
- John S Malamon
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- John J Farrell
- Biomedical Genetics Section, Department of Medicine, Boston University School of Medicine, Boston University, Boston, MA, USA
- Li Charlie Xia
- https://ror.org/03mtd9a03 Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
- Beth A Dombroski
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Rueben G Das
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Jessica Way
- Broad Institute, Massachusetts Institute of Technology, Cambridge, MA, USA
- Amanda B Kuzma
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Otto Valladares
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Yuk Yee Leung
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Allison J Scanlon
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Irving Antonio Barrera Lopez
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Jack Brehony
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Kim C Worley
- https://ror.org/02pttbw34 Human Genome Sequencing Center, and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Nancy R Zhang
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
- Li-San Wang
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Lindsay A Farrer
- Biomedical Genetics Section, Department of Medicine, Boston University School of Medicine, Boston University, Boston, MA, USA
- Departments of Neurology and Ophthalmology, Boston University School of Medicine, Boston University, Boston, MA, USA
- Departments of Epidemiology and Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Wan-Ping Lee
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Badri N Vardarajan
- https://ror.org/01esghr10 Gertrude H. Sergievsky Center and Taub Institute of Aging Brain, Department of Neurology, Columbia University Medical Center, New York, NY, USA
2
Li D, Farrell JJ, Mez J, Martin ER, Bush WS, Ruiz A, Boada M, de Rojas I, Mayeux R, Haines JL, Pericak-Vance MA, Wang LS, Schellenberg GD, Lunetta KL, Farrer LA. Novel loci for Alzheimer's disease identified by a genome-wide association study in Ashkenazi Jews. Alzheimers Dement 2023; 19:5550-5562. [PMID: 37260021 PMCID: PMC10689571 DOI: 10.1002/alz.13117] [Received: 12/28/2022] [Revised: 03/29/2023] [Accepted: 04/11/2023] [Indexed: 06/02/2023]
Abstract
INTRODUCTION Most Alzheimer's disease (AD) loci have been discovered in individuals with European ancestry (EA). METHODS We applied principal component analysis using Gaussian mixture models and an Ashkenazi Jewish (AJ) reference genome-wide association study (GWAS) data set to identify Ashkenazi Jews ascertained in GWAS (n = 42,682), whole genome sequencing (WGS, n = 16,815), and whole exome sequencing (WES, n = 20,504) data sets. The association of AD was tested genome wide (GW) in the GWAS and WGS data sets and exome wide (EW) in all three data sets. Gene-based analyses were performed using aggregated rare variants. RESULTS In addition to apolipoprotein E (APOE), GW analyses (1355 cases and 1661 controls) revealed associations with TREM2 R47H (p = 9.66 × 10⁻⁹), rs541586606 near RAB3B (p = 5.01 × 10⁻⁸), and rs760573036 between SPOCK3 and ANXA10 (p = 6.32 × 10⁻⁸). In EW analyses (1504 cases and 2047 controls), study-wide significant association was observed with rs1003710 near SMAP2 (p = 1.91 × 10⁻⁷). A significant gene-based association was identified with GIPR (p = 7.34 × 10⁻⁷). DISCUSSION Our results highlight the efficacy of founder populations for AD genetic studies.
Affiliation(s)
- Donghe Li
- Department of Medicine (Biomedical Genetics), Boston University Chobanian & Avedisian School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- John J Farrell
- Department of Medicine (Biomedical Genetics), Boston University Chobanian & Avedisian School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Jesse Mez
- Department of Neurology, Boston University Chobanian & Avedisian School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Eden R. Martin
- Dr. John T. Macdonald Foundation, University of Miami, Miami, FL 33136, USA
- Department of Human Genetics, University of Miami, Miami, FL 33136, USA
- William S. Bush
- Department of Population & Quantitative Health Science and Cleveland Institute for Computational Biology, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 44106, USA
- Agustin Ruiz
- Research Center and Memory Clinic, ACE Alzheimer Center Barcelona, Universitat Internacional de Catalunya, Barcelona, Spain
- CIBERNED, Network Center for Biomedical Research in Neurodegenerative Diseases, National Institute of Health Carlos III, Madrid, Spain
- Mercè Boada
- Research Center and Memory Clinic, ACE Alzheimer Center Barcelona, Universitat Internacional de Catalunya, Barcelona, Spain
- CIBERNED, Network Center for Biomedical Research in Neurodegenerative Diseases, National Institute of Health Carlos III, Madrid, Spain
- Itziar de Rojas
- Research Center and Memory Clinic, ACE Alzheimer Center Barcelona, Universitat Internacional de Catalunya, Barcelona, Spain
- CIBERNED, Network Center for Biomedical Research in Neurodegenerative Diseases, National Institute of Health Carlos III, Madrid, Spain
- Richard Mayeux
- Taub Institute on Alzheimer's Disease and the Aging Brain, Gertrude H. Sergievsky Center Department of Neurology, Columbia University, 710 West 168th Street, New York, NY 10032, USA
- Jonathan L. Haines
- Department of Population & Quantitative Health Science and Cleveland Institute for Computational Biology, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 44106, USA
- Margaret A. Pericak-Vance
- Dr. John T. Macdonald Foundation, University of Miami, Miami, FL 33136, USA
- Department of Human Genetics, University of Miami, Miami, FL 33136, USA
- Department of Neurology, University of Miami, Miami, FL 33136, USA
- Li-San Wang
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA 19104, USA
- Gerard D. Schellenberg
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA 19104, USA
- Kathryn L. Lunetta
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
- Lindsay A. Farrer
- Department of Medicine (Biomedical Genetics), Boston University Chobanian & Avedisian School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Department of Neurology, Boston University Chobanian & Avedisian School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Department of Ophthalmology, Boston University Chobanian & Avedisian School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
- Department of Epidemiology, Boston University School of Public Health, Boston, MA 02118, USA
3
Gorla A, Jew B, Zhang L, Sul JH. xGAP: A python based efficient, modular, extensible and fault tolerant genomic analysis pipeline for variant discovery. Bioinformatics 2021; 37:9-16. [PMID: 33416856 PMCID: PMC8034531 DOI: 10.1093/bioinformatics/btaa1097] [Received: 06/25/2020] [Revised: 12/22/2020] [Accepted: 01/04/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION Since the first human genome was sequenced in 2001, there has been a rapid growth in the number of bioinformatic methods to process and analyze next generation sequencing (NGS) data for research and clinical studies that aim to identify genetic variants influencing diseases and traits. To achieve this goal, one first needs to call genetic variants from NGS data which requires multiple computationally intensive analysis steps. Unfortunately, there is a lack of an open source pipeline that can perform all these steps on NGS data in a manner which is fully automated, efficient, rapid, scalable, modular, user-friendly and fault tolerant. To address this, we introduce xGAP, an extensible Genome Analysis Pipeline, which implements modified GATK best practice to analyze DNA-seq data with aforementioned functionalities. RESULTS xGAP implements massive parallelization of the modified GATK best practice pipeline by splitting a genome into many smaller regions with efficient load-balancing to achieve high scalability. It can process 30x coverage whole-genome sequencing (WGS) data in approximately 90 minutes. In terms of accuracy of discovered variants, xGAP achieves average F1 scores of 99.37% for SNVs and 99.20% for Indels across seven benchmark WGS datasets. We achieve highly consistent results across multiple on-premises (SGE & SLURM) high performance clusters. Compared to the Churchill pipeline, with similar parallelization, xGAP is 20% faster when analyzing 50X coverage WGS in AWS. Finally, xGAP is user-friendly and fault tolerant where it can automatically re-initiate failed processes to minimize required user intervention. AVAILABILITY xGAP is available at https://github.com/Adigorla/xgap. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aditya Gorla
- Department of Bioengineering, University of California, Los Angeles, CA 90095, U.S.A.
- Brandon Jew
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA 90095, U.S.A.
- Luke Zhang
- Undergraduate Neuroscience Interdepartmental Program, University of California, Los Angeles, CA 90095, U.S.A.
- Jae Hoon Sul
- Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, CA 90095, U.S.A.
4
Avram S, Mernea M, Limban C, Borcan F, Chifiriuc C. Potential Therapeutic Approaches to Alzheimer's Disease By Bioinformatics, Cheminformatics And Predicted Adme-Tox Tools. Curr Neuropharmacol 2020; 18:696-719. [PMID: 31885353 PMCID: PMC7536829 DOI: 10.2174/1570159x18666191230120053] [Received: 10/02/2019] [Revised: 12/24/2019] [Accepted: 12/28/2019] [Indexed: 12/20/2022]
Abstract
BACKGROUND Alzheimer's disease (AD) is considered a severe, irreversible and progressive neurodegenerative disorder. Currently, the pharmacological management of AD is based on a few clinically approved acetylcholinesterase (AChE) and N-methyl-D-aspartate (NMDA) receptor ligands, with unclear molecular mechanisms and severe side effects. METHODS Here, we reviewed the most recent bioinformatics, cheminformatics (SAR, drug design, molecular docking, friendly databases, ADME-Tox) and experimental data on relevant structure-biological activity relationships and molecular mechanisms of some natural and synthetic compounds with possible anti-AD effects (inhibitors of AChE, NMDA receptors, beta-secretase, amyloid beta (Aβ), redox metals) or acting on multiple AD targets at once. We considered: (i) in silico supported by experimental studies regarding the pharmacological potential of natural compounds such as resveratrol, natural alkaloids, flavonoids isolated from various plants and donepezil, galantamine, rivastigmine and memantine derivatives, (ii) the most important pharmacokinetic descriptors of natural compounds in comparison with donepezil, memantine and galantamine. RESULTS In silico and experimental methods applied to synthetic compounds led to the identification of new AChE inhibitors, NMDA antagonists, multipotent hybrids targeting different AD processes and metal-organic compounds acting as Aβ inhibitors. Natural compounds appear as multipotent agents, acting on several AD pathways: cholinesterases, NMDA receptors, secretases or Aβ, but their efficiency in vivo and their correct dosage should be determined. CONCLUSION Bioinformatics, cheminformatics and ADME-Tox methods can be very helpful in the quest for an effective anti-AD treatment, allowing the identification of novel drugs, enhancing the druggability of molecular targets and providing a deeper understanding of AD pathological mechanisms.
Affiliation(s)
- Maria Mernea
- Address correspondence to this author at the Department of Anatomy, Animal Physiology and Biophysics, Faculty of Biology, University of Bucharest, 91-95th Spl. Independentei, Bucharest, Romania; Tel/Fax: ++4-021-318-1573; E-mail: