1
Dhindsa RS, Weido BA, Dhindsa JS, Shetty AJ, Sands CF, Petrovski S, Vitsios D, Zoghbi AW. Genome-wide prediction of dominant and recessive neurodevelopmental disorder-associated genes. Am J Hum Genet 2025; 112:693-708. [PMID: 40015282 PMCID: PMC11947176 DOI: 10.1016/j.ajhg.2025.02.001] [Received: 08/28/2024] [Revised: 01/31/2025] [Accepted: 02/03/2025] [Indexed: 03/01/2025]
Abstract
Despite great progress, thousands of neurodevelopmental disorder (NDD) risk genes remain to be discovered. We present a computational approach that accelerates NDD risk gene identification using machine learning. First, we demonstrate that models trained solely on single-cell RNA sequencing data can robustly predict genes implicated in autism spectrum disorder (ASD), developmental and epileptic encephalopathy (DEE), and developmental delay (DD). Notably, we find differences in the expression patterns of genes with monoallelic versus bi-allelic inheritance in the developing human cortex. We then integrate expression data with 300 orthogonal features, including intolerance metrics, protein-protein interaction data, and others, in a semi-supervised machine learning framework (mantis-ml) to train inheritance-specific models for these disorders. The models have high predictive power (areas under the receiver operating characteristic curves [AUCs]: 0.84-0.95), and the top-ranked genes were up to 2-fold (monoallelic models) and 6-fold (bi-allelic models) more enriched for high-confidence NDD risk genes than genic intolerance metrics alone. Additionally, genes ranking in the top decile were 45 to 180 times more likely to have literature support than those in the bottom decile. Collectively, this work provides robust NDD risk gene predictions that can complement large-scale gene discovery efforts and underscores the importance of considering inheritance in gene risk prediction.
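The top-versus-bottom decile comparison reported above reduces to a ratio of hit rates. The sketch below is a minimal illustration, not the mantis-ml code: it assumes a list of genes already sorted from highest to lowest predicted risk and a set of genes with literature support. The function name and the toy data are hypothetical.

```python
from typing import Sequence, Set


def decile_fold_enrichment(ranked_genes: Sequence[str], supported: Set[str]) -> float:
    """Ratio of literature-supported genes in the top vs. bottom decile.

    Hypothetical helper: ranked_genes must be sorted from highest to lowest
    predicted risk. Both deciles have the same size, so the ratio of hit
    counts equals the ratio of hit rates.
    """
    k = len(ranked_genes) // 10
    if k == 0:
        raise ValueError("need at least 10 genes to form deciles")
    top_hits = sum(gene in supported for gene in ranked_genes[:k])
    bottom_hits = sum(gene in supported for gene in ranked_genes[-k:])
    if bottom_hits == 0:
        return float("inf")  # no supported genes in the bottom decile
    return top_hits / bottom_hits


# Toy example: 100 ranked genes, 9 supported in the top decile, 1 in the bottom.
genes = [f"GENE{i}" for i in range(100)]
support = {f"GENE{i}" for i in range(9)} | {"GENE99"}
print(decile_fold_enrichment(genes, support))  # 9.0
```

If the gene count is not a multiple of ten, the leftover middle genes are simply ignored, which does not affect a top-versus-bottom comparison.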
Affiliation(s)
- Ryan S Dhindsa
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, USA.
- Blake A Weido
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX, USA
- Justin S Dhindsa
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Medical Scientist Training Program, Baylor College of Medicine, Houston, TX, USA
- Arya J Shetty
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Chloe F Sands
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Slavé Petrovski
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK; Department of Medicine, University of Melbourne, Austin Health, Melbourne, VIC, Australia
- Dimitrios Vitsios
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
- Anthony W Zoghbi
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX, USA; Menninger Department of Psychiatry and Behavioral Sciences, Baylor College of Medicine, Houston, TX, USA.
2
Petrazzini BO, Balick DJ, Forrest IS, Cho J, Rocheleau G, Jordan DM, Do R. Ensemble and consensus approaches to prediction of recessive inheritance for missense variants in human disease. Cell Rep Methods 2024; 4:100914. [PMID: 39657681 PMCID: PMC11704621 DOI: 10.1016/j.crmeth.2024.100914] [Received: 06/16/2023] [Revised: 09/19/2024] [Accepted: 11/13/2024] [Indexed: 12/12/2024]
Abstract
Mode of inheritance (MOI) is necessary for clinical interpretation of pathogenic variants; however, the majority of variants lack this information. Furthermore, variant effect predictors are fundamentally insensitive to recessive-acting diseases. Here, we present MOI-Pred, a variant pathogenicity prediction tool that accounts for MOI, and ConMOI, a consensus method that integrates variant MOI predictions from three independent tools. MOI-Pred integrates evolutionary and functional annotations to produce variant-level predictions that are sensitive to both dominant-acting and recessive-acting pathogenic variants. Both MOI-Pred and ConMOI show state-of-the-art performance on standard benchmarks. Importantly, dominant and recessive predictions from both tools are enriched in individuals with pathogenic variants for dominant- and recessive-acting diseases, respectively, in a real-world electronic health record (EHR)-based validation approach of 29,981 individuals. ConMOI outperforms its component methods in benchmarking and validation, demonstrating the value of consensus among multiple prediction methods. Predictions for all possible missense variants are provided in the "Data and code availability" section.
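The abstract does not spell out how ConMOI combines its three component predictions. As a hedged illustration of consensus in general, the sketch below applies a simple majority vote to per-tool labels. The label names, the `min_agree` threshold and the function itself are assumptions, not ConMOI's actual rule.

```python
from collections import Counter
from typing import Sequence


def majority_consensus(calls: Sequence[str], min_agree: int = 2) -> str:
    """Return the label reported by at least `min_agree` tools, else 'uncertain'.

    Hypothetical consensus rule: `calls` holds one mode-of-inheritance label
    per tool, e.g. 'dominant', 'recessive' or 'benign'.
    """
    if not calls:
        raise ValueError("at least one prediction is required")
    label, count = Counter(calls).most_common(1)[0]
    return label if count >= min_agree else "uncertain"


print(majority_consensus(["recessive", "recessive", "dominant"]))  # recessive
print(majority_consensus(["dominant", "recessive", "benign"]))     # uncertain
```

With three tools and `min_agree=2`, two labels can never tie at the top. With an even number of tools, an explicit tie-breaking rule would be needed.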
Affiliation(s)
- Ben O Petrazzini
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Daniel J Balick
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Iain S Forrest
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Medical Scientist Training Program, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Judy Cho
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Ghislain Rocheleau
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Daniel M Jordan
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ron Do
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
3
Versbraegen N, Gravel B, Nachtegael C, Renaux A, Verkinderen E, Nowé A, Lenaerts T, Papadimitriou S. Faster and more accurate pathogenic combination predictions with VarCoPP2.0. BMC Bioinformatics 2023; 24:179. [PMID: 37127601 PMCID: PMC10152795 DOI: 10.1186/s12859-023-05291-3] [Received: 12/06/2022] [Accepted: 04/14/2023] [Indexed: 05/03/2023]
Abstract
BACKGROUND The prediction of potentially pathogenic variant combinations in patients remains a key task in medical genetics for the understanding and detection of oligogenic/multilocus diseases. Models tailored towards such cases can help narrow the gap of missing diagnoses and can aid researchers in dealing with the high complexity of the derived data. The predictor VarCoPP (Variant Combinations Pathogenicity Predictor), published in 2019 to identify potentially pathogenic variant combinations in gene pairs (bilocus variant combinations), was the first important step in this direction. Despite its usefulness and applicability, several issues still hindered better performance, such as its False Positive (FP) rate, the quality of its training set and its complex architecture. RESULTS We present VarCoPP2.0, the successor of VarCoPP: a simplified, faster and more accurate predictive model for identifying potentially pathogenic bilocus variant combinations. Results from cross-validation and on independent data sets reveal that VarCoPP2.0 has improved in terms of both sensitivity (95% in cross-validation and 98% during testing) and specificity (5% FP rate). At the same time, its running time shows a significant 150-fold decrease due to the selection of a simpler Balanced Random Forest model. Its positive training set now consists of variant combinations that are more confidently linked with evidence of pathogenicity, based on the confidence scores present in OLIDA, the Oligogenic Diseases Database (https://olida.ibsquare.be). The improvement of its performance is also attributed to a more careful selection of up-to-date features identified via an original wrapper method. We show that combining different variant and gene pair features is important for predictions, highlighting the usefulness of integrating biological information at different levels.
CONCLUSIONS Through its improved performance and faster execution time, VarCoPP2.0 enables a more accurate analysis of larger data sets linked to oligogenic diseases. Users can access the ORVAL platform (https://orval.ibsquare.be) to apply VarCoPP2.0 on their data.
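VarCoPP2.0 relies on a Balanced Random Forest. In the usual formulation of that technique, each tree is grown on a bootstrap sample of the minority class plus an equally sized sample drawn with replacement from the majority class. The sketch below shows only that sampling step in plain Python. Tree growing is omitted, the binary label convention is an assumption, and this is not VarCoPP2.0's code. For real use, the imbalanced-learn package provides a `BalancedRandomForestClassifier`.

```python
import random
from typing import List, Sequence


def balanced_bootstrap(labels: Sequence[int], rng: random.Random) -> List[int]:
    """Row indices for one tree of a balanced random forest.

    Draws a bootstrap sample (with replacement) of the minority class and an
    equally sized sample, also with replacement, from the majority class.
    Labels are assumed to be binary: 1 = pathogenic, 0 = neutral.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    if not minority:
        raise ValueError("both classes must be present")
    n = len(minority)
    sample = [rng.choice(minority) for _ in range(n)]
    sample += [rng.choice(majority) for _ in range(n)]
    return sample


# Toy data: 5 positive and 95 negative variant combinations.
labels = [1] * 5 + [0] * 95
rng = random.Random(0)
for tree in range(3):
    idx = balanced_bootstrap(labels, rng)
    print(tree, len(idx), sum(labels[i] for i in idx))  # 10 rows, 5 positives each
```

Because every tree sees both classes in equal numbers, the ensemble is not dominated by the much larger set of neutral combinations.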
Affiliation(s)
- Nassim Versbraegen
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium.
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium.
- Barbara Gravel
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Charlotte Nachtegael
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Alexandre Renaux
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Emma Verkinderen
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Ann Nowé
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Tom Lenaerts
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Sofia Papadimitriou
- Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050, Brussels, Belgium
4
Salnikova LE, Kolobkov DS, Sviridova DA, Abilev SK. An overview of germline variations in genes of primary immunodeficiences through integrative analysis of ClinVar, HGMD® and dbSNP databases. Hum Genet 2021; 140:1379-1393. [PMID: 34272616 DOI: 10.1007/s00439-021-02316-w] [Received: 03/23/2021] [Accepted: 07/10/2021] [Indexed: 12/20/2022]
Abstract
Primary immunodeficiencies (PID) are a diverse group of genetic disorders caused by inadequate development and function of the immune system. Identifying the genetic etiology is important for genetic counselling and treatment decisions. The clinical relevance of genetic variants is a complex problem that depends on gene-specific and variant-specific genotype-phenotype interactions. To address this challenge, we aimed to characterize the pathogenic landscape of PID genes by combining the analysis of germline variations reported in ClinVar and HGMD® with the identification of damaging variations available in dbSNP. We generated a joint ClinVar/HGMD database of 111,940 variants, of which 32,452 were classified as pathogenic/likely pathogenic. From a total of 5,415,794 bi- or multiallelic variants in PID genes recorded in dbSNP, we retrieved 38,291 high-impact (HI) biallelic variants with a presumably disruptive impact on the protein; of these, 25,500 were not present in ClinVar/HGMD. Using a functional prediction algorithm, we additionally identified 28,507 deleterious and 56,016 neutral missense variants among dbSNP variants and created a collection of damaging and neutral variations in PID genes, not currently present in ClinVar/HGMD, with their allele frequencies and mappings to protein domains. The distribution of pathogenic variants from ClinVar/HGMD, HI variants and deleterious missense variants from dbSNP was analyzed in the context of inheritance pattern and gene-specific metrics, such as pLI and haploinsufficiency. Our report summarizes data on complex gene-specific variability in PID genes and might be useful for identifying the most promising variants and gene regions for further study.
Affiliation(s)
- Lyubov E Salnikova
- The Laboratory of Ecological Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, 3 Gubkin Street, Moscow, 117971, Russia; The Laboratory of Molecular Immunology, Rogachev National Research Center of Pediatric Hematology, Oncology and Immunology, Moscow, Russia; The Laboratory of Clinical Pathophysiology of Critical Conditions, Federal Research and Clinical Center of Intensive Care Medicine and Rehabilitology, Moscow, Russia.
- Dmitry S Kolobkov
- The Laboratory of Ecological Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, 3 Gubkin Street, Moscow, 117971, Russia; Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Darya A Sviridova
- The Laboratory of Ecological Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, 3 Gubkin Street, Moscow, 117971, Russia
- Serikbai K Abilev
- The Laboratory of Ecological Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, 3 Gubkin Street, Moscow, 117971, Russia
5
Alyousfi D, Baralle D, Collins A. Essentiality-specific pathogenicity prioritization gene score to improve filtering of disease sequence data. Brief Bioinform 2020; 22:1782-1789. [PMID: 32186701 DOI: 10.1093/bib/bbaa029] [Received: 09/07/2019] [Revised: 02/17/2020] [Accepted: 02/18/2020] [Indexed: 11/12/2022]
Abstract
The causal genetic variants underlying more than 50% of single gene (monogenic) disorders are yet to be discovered. Many patients with conditions likely to have a monogenic basis do not receive a confirmed molecular diagnosis, which can affect clinical management. We have developed a gene-specific score, essentiality-specific pathogenicity prioritization (ESPP), to guide the recognition of genes likely to underlie monogenic disease variation and to assist in filtering genome sequence data. When a patient genome is sequenced, several plausibly pathogenic variants are frequently identified in different genes. Recognizing the single gene most likely to include pathogenic variation can guide the identification of a causal variant. The ESPP score integrates gene-level scores that are broadly related to gene essentiality. Previous work towards the recognition of monogenic disease genes proposed a model with increasing gene essentiality from 'non-essential' to 'essential' genes (for which pathogenic variation may be incompatible with survival), with genes liable to contain disease variation positioned between these two extremes. We demonstrate that the ESPP score is useful for recognizing genes with high potential for pathogenic disease-related variation. Genes classed as essential have particularly high scores, as do genes recently recognized as strong candidates for developmental disorders. Through the integration of individual gene-specific scores, which have different properties and assumptions, we demonstrate the utility of an essentiality-based gene score for improving the filtering of genome sequence data.
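The ESPP abstract says the score integrates several gene-level scores, but it does not give the integration formula. One common way to combine scores measured on different scales is to convert each score to a percentile rank and then average the ranks. The sketch below does exactly that. The averaging scheme, the function names and the toy scores are assumptions for illustration, not the published ESPP method. Each input score is assumed to be oriented so that higher means more essential.

```python
import bisect
from statistics import mean
from typing import Dict, List


def percentile_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Map each gene to a rank in [0, 1]; tied scores share the lower rank."""
    values = sorted(scores.values())
    n = len(values)
    if n == 1:
        return {gene: 0.0 for gene in scores}
    return {gene: bisect.bisect_left(values, v) / (n - 1) for gene, v in scores.items()}


def combined_rank_score(score_sets: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the percentile ranks of genes present in every score set."""
    shared = set.intersection(*(set(s) for s in score_sets))
    ranked = [percentile_ranks({g: s[g] for g in shared}) for s in score_sets]
    return {g: mean(r[g] for r in ranked) for g in shared}


# Hypothetical gene-level scores (higher = more essential) from two sources.
score_a = {"GENE_A": 0.9, "GENE_B": 0.5, "GENE_C": 0.1}
score_b = {"GENE_A": 3.0, "GENE_B": 1.0, "GENE_C": 2.0}
print(sorted(combined_rank_score([score_a, score_b]).items()))
# [('GENE_A', 1.0), ('GENE_B', 0.25), ('GENE_C', 0.25)]
```

Rank averaging discards each score's scale, so no single metric dominates. The trade-off is that differences in effect size within each metric are lost.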
6
Hsu JS, Zhang R, Yeung F, Tang CSM, Wong JKL, So MT, Xia H, Sham P, Tam PK, Li M, Wong KKY, Garcia-Barcelo MM. Cancer gene mutations in congenital pulmonary airway malformation patients. ERJ Open Res 2019; 5:00196-2018. [PMID: 30740464 PMCID: PMC6360213 DOI: 10.1183/23120541.00196-2018] [Received: 10/24/2018] [Accepted: 11/29/2018] [Indexed: 12/24/2022]
Abstract
Background Newborns affected with congenital pulmonary airway malformations (CPAMs) may present with severe respiratory distress or remain asymptomatic. While surgical resection is the definitive treatment for symptomatic CPAMs, prophylactic elective surgery may be recommended for asymptomatic CPAMs owing to the risk of tumour development. However, prophylactic surgery is controversial because more evidence linking CPAMs and cancer is needed. The large gap in knowledge of CPAM pathogenesis results in uncertainties and controversies in disease management. As developmental genes control postnatal cell growth and contribute to cancer development, we hypothesised that CPAMs may be underlain by germline mutations in genes governing airway development. Methods We sequenced the exomes of 19 patients and their unaffected parents. Results A greater than expected number of mutations in cancer genes (false discovery rate q-value <5.01×10⁻⁵) was observed. The co-occurrence, in the same patient, of damaging variants in genes encoding interacting proteins is intriguing, the most striking being thyroglobulin (TG) and its receptor, megalin (LRP2). Both genes are highly relevant in lung development and cancer. Conclusions The overall excess of mutations in cancer genes may account for the reported association of CPAMs with carcinomas and provides some support for the prophylactic surgery advocated by some surgeons. Congenital pulmonary airway malformation (CPAM) patients have more than expected numbers of damaging variants in genes involved in lung carcinoma; this may provide evidence for clinicians choosing to adopt prophylactic excision in CPAM. http://ow.ly/h1AE30n4DIe
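This abstract reports a false discovery rate q-value, but it does not state which FDR procedure was used. As a hedged illustration, the sketch below computes Benjamini-Hochberg adjusted p-values, the most common way to obtain such q-values. The input p-values are made up.

```python
from typing import List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    For the i-th smallest of m p-values the adjusted value is
    min over j >= i of p_(j) * m / j, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


print([round(q, 4) for q in bh_qvalues([0.01, 0.04, 0.03, 0.5])])
# [0.04, 0.0533, 0.0533, 0.5]
```

The running minimum enforces monotonicity: a smaller p-value never receives a larger q-value than a bigger one.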
Affiliation(s)
- Jacob Shujui Hsu
- Dept of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; Centre for Genomics Science, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Ruizhong Zhang
- Dept of Pediatric Surgery, Guangzhou Women and Children's Medical Center, Guangzhou, China
- Fanny Yeung
- Dept of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Clara S M Tang
- Dept of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- John K L Wong
- Dept of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Man-Ting So
- Dept of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Huimin Xia
- Dept of Pediatric Surgery, Guangzhou Women and Children's Medical Center, Guangzhou, China
- Pak Sham
- Dept of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; Centre for Genomics Science, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Paul K Tam
- Dept of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Miaoxin Li
- Dept of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; Centre for Genomics Science, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Kenneth K Y Wong
- Dept of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
7
Hsu JSJ, So M, Tang CSM, Karim A, Porsch RM, Wong C, Yu M, Yeung F, Xia H, Zhang R, Cherny SS, Chung PHY, Wong KKY, Sham PC, Ngo ND, Li M, Tam PKH, Lui VCH, Garcia-Barcelo MM. De novo mutations in Caudal Type Homeo Box transcription Factor 2 (CDX2) in patients with persistent cloaca. Hum Mol Genet 2018; 27:351-358. [PMID: 29177441 DOI: 10.1093/hmg/ddx406] [Received: 09/18/2017] [Accepted: 10/27/2017] [Indexed: 12/24/2022]
Abstract
The cloaca is an embryonic cavity that is divided into the urogenital sinus and rectum upon differentiation of the cloacal epithelium, triggered by tissue-specific transcription factors including CDX2. Defective differentiation leads to persistent cloaca (PC) in humans, a phenotype recapitulated in Cdx2 mutant mice. PC is linked to hypo/hyper-vitaminosis A. Although no gene has ever been identified, there is strong evidence for a genetic contribution to PC. We applied whole-exome sequencing and copy-number-variant analyses to 21 PC patients and their unaffected parents. The damaging p.Cys132* and p.Arg237His de novo CDX2 variants were identified in two patients. These variants altered the expression of CYP26A1, a direct CDX2 target encoding the major retinoic acid (RA)-degrading enzyme. Other RA genes, including the RA-receptor alpha, were also mutated. Genes governing the development of cloaca-derived structures were recurrently mutated and over-represented in the basement-membrane components set (q-value < 1.65 × 10⁻⁶). Joint analysis of the patients' profiles highlighted the extracellular matrix-receptor interaction pathway (MsigDBID: M7098, FDR: q-value < 7.16 × 10⁻⁹). This is the first evidence that PC is genetic, with genes involved in RA metabolism at the lead. Given the CDX2 de novo variants and the role of RA, our observations could potentiate preventive measures. For the first time, a gene recapitulating PC in mouse models is found mutated in humans.
Affiliation(s)
- Jacob S J Hsu
- Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Manting So
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Clara S M Tang
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Anwarul Karim
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Robert M Porsch
- Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Carol Wong
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Michelle Yu
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Fanny Yeung
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Huimin Xia
- Department of Pediatric Surgery, Guangzhou Women and Children's Medical Center, Guangzhou, Guangdong, China
- Ruizhong Zhang
- Department of Pediatric Surgery, Guangzhou Women and Children's Medical Center, Guangzhou, Guangdong, China
- Stacey S Cherny
- Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Patrick H Y Chung
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Kenneth K Y Wong
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Pak C Sham
- Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China; Centre for Genomic Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Ngoc Diem Ngo
- Department of Human Genetics, National Hospital of Pediatrics, Hà Nội, Vietnam
- Miaoxin Li
- Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Paul K H Tam
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Vincent C H Lui
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
8
Li M, Li J, Li MJ, Pan Z, Hsu JS, Liu DJ, Zhan X, Wang J, Song Y, Sham PC. Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework. Nucleic Acids Res 2017; 45:e75. [PMID: 28115622 PMCID: PMC5435951 DOI: 10.1093/nar/gkx019] [Received: 10/09/2016] [Accepted: 01/06/2017] [Indexed: 12/21/2022]
Abstract
Whole genome sequencing (WGS) is a promising strategy to unravel variants or genes responsible for human diseases and traits. However, there is a lack of robust platforms for comprehensive downstream analysis. In the present study, we first proposed three novel algorithms (sequence gap-filled gene feature annotation, bit-block encoded genotypes and sectional fast access to text lines) to address three fundamental problems. The three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, for integrating downstream analysis functions for whole genome sequencing data. KGGSeq has been equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In tests with whole genome sequencing data from the 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SnpEff). It took only around half an hour on a small server with 10 CPUs to access genotypes of ∼60 million variants of 2504 subjects, while a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less space to flexibly represent phased or unphased genotypes with multiple alleles and was over 1000 times faster at calculating genotypic correlation.
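The abstract does not describe KGGSeq's bit-block layout in detail. The sketch below shows only the general idea of bit-packed genotypes, under assumed conventions. Biallelic, unphased genotypes for one variant are split into two bit planes (a high bit and a low bit per sample), so counting over many samples becomes a few bitwise operations plus population counts. The 2-bit code table and the function names are hypothetical and do not describe KGGSeq's actual format.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical 2-bit code per sample: (high bit, low bit).
# 0 = hom-ref, 1 = het, 2 = hom-alt, None = missing.
CODES = {0: (0, 0), 1: (0, 1), 2: (1, 0), None: (1, 1)}


def encode(genotypes: Sequence[Optional[int]]) -> Tuple[int, int]:
    """Pack one variant's genotypes into two integer bit planes (bit i = sample i)."""
    hi = lo = 0
    for i, g in enumerate(genotypes):
        high, low = CODES[g]
        hi |= high << i
        lo |= low << i
    return hi, lo


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def alt_allele_count(hi: int, lo: int) -> int:
    """Alternate alleles across samples: 1 per het, 2 per hom-alt, 0 for missing."""
    het = lo & ~hi
    hom_alt = hi & ~lo
    return popcount(het) + 2 * popcount(hom_alt)


def missing_count(hi: int, lo: int) -> int:
    """Samples whose genotype is missing (both bits set)."""
    return popcount(hi & lo)


hi, lo = encode([0, 1, 2, None, 1])
print(alt_allele_count(hi, lo), missing_count(hi, lo))  # 4 1
```

Each sample costs two bits rather than a text field. Whole-cohort summaries then become word-wide AND/NOT operations, which is the kind of saving the abstract's space and speed figures point to.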
Affiliation(s)
- Miaoxin Li
- Department of Medical Genetics, Center for Genome Research, Center for Precision Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China; The Centre for Genomic Sciences, the University of Hong Kong, Pokfulam, Hong Kong; Department of Psychiatry, the University of Hong Kong, Pokfulam, Hong Kong; State Key Laboratory for Cognitive and Brain Sciences, the University of Hong Kong, Pokfulam, Hong Kong
- Jiang Li
- School of Biomedical Sciences, the University of Hong Kong, Pokfulam, Hong Kong
- Mulin Jun Li
- The Centre for Genomic Sciences, the University of Hong Kong, Pokfulam, Hong Kong
- Zhicheng Pan
- Department of Psychiatry, the University of Hong Kong, Pokfulam, Hong Kong
- Jacob Shujui Hsu
- Department of Psychiatry, the University of Hong Kong, Pokfulam, Hong Kong
- Dajiang J Liu
- Division of Biostatistics and Bioinformatics, Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033, USA; Institute for Personalized Medicine, Penn State College of Medicine, Hershey, PA 17033, USA
- Xiaowei Zhan
- Quantitative Biomedical Research Center, Department of Clinical Science, Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Junwen Wang
- The Centre for Genomic Sciences, the University of Hong Kong, Pokfulam, Hong Kong; Department of Health Sciences Research and Center for Individualized Medicine, Mayo Clinic, Scottsdale, AZ 85259, USA; Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA
- Youqiang Song
- The Centre for Genomic Sciences, the University of Hong Kong, Pokfulam, Hong Kong; School of Biomedical Sciences, the University of Hong Kong, Pokfulam, Hong Kong
- Pak Chung Sham
- The Centre for Genomic Sciences, the University of Hong Kong, Pokfulam, Hong Kong; Department of Psychiatry, the University of Hong Kong, Pokfulam, Hong Kong; State Key Laboratory for Cognitive and Brain Sciences, the University of Hong Kong, Pokfulam, Hong Kong
9
Pang SYY, Hsu JS, Teo KC, Li Y, Kung MHW, Cheah KSE, Chan D, Cheung KMC, Li M, Sham PC, Ho SL. Burden of rare variants in ALS genes influences survival in familial and sporadic ALS. Neurobiol Aging 2017; 58:238.e9-238.e15. [PMID: 28709720 DOI: 10.1016/j.neurobiolaging.2017.06.007] [Received: 03/10/2017] [Revised: 05/29/2017] [Accepted: 06/11/2017] [Indexed: 01/25/2023]
Abstract
Genetic variants are implicated in the development of amyotrophic lateral sclerosis (ALS), but it is unclear whether the burden of rare variants in ALS genes has an effect on survival. We performed whole genome sequencing on 8 familial ALS (FALS) patients with superoxide dismutase 1 (SOD1) mutation and whole exome sequencing on 46 sporadic ALS (SALS) patients living in Hong Kong and found that 67% had at least 1 rare variant in the exons of 40 ALS genes; 22% had 2 or more. Patients with 2 or more rare variants had lower probability of survival than patients with 0 or 1 variant (p = 0.001). After adjusting for other factors, each additional rare variant increased the risk of respiratory failure or death by 60% (p = 0.0098). The presence of the rare variant was associated with the risk of ALS (Odds ratio 1.91, 95% confidence interval 1.03-3.61, p = 0.03), and ALS patients had higher rare variant burden than controls (MB, p = 0.004). Our findings support an oligogenic basis with the burden of rare variants affecting the development and survival of ALS.
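The finding that each additional rare variant raised the risk of respiratory failure or death by 60% corresponds to a hazard ratio of about 1.6 per variant. Suppose the effect is log-linear, as it is when burden enters a proportional-hazards model as a linear covariate. Then the relative hazard for k variants versus none is 1.6^k, so two variants give about 2.56. The abstract does not state the exact model. The sketch below counts rare variants in a gene list and applies that multiplicative assumption. The MAF cutoff, the gene list and the toy variants are hypothetical.

```python
from typing import Iterable, Set, Tuple


def rare_variant_burden(
    variants: Iterable[Tuple[str, float]],
    genes: Set[str],
    maf_cutoff: float = 0.01,
) -> int:
    """Count variants in `genes` whose minor allele frequency is below `maf_cutoff`.

    The cutoff and the (gene, MAF) input format are illustrative assumptions.
    """
    return sum(1 for gene, maf in variants if gene in genes and maf < maf_cutoff)


def relative_hazard(burden: int, hr_per_variant: float = 1.6) -> float:
    """Hazard relative to zero variants, assuming a log-linear (multiplicative) effect."""
    return hr_per_variant ** burden


patient = [("SOD1", 0.0001), ("FUS", 0.20), ("TARDBP", 0.005), ("TTN", 0.001)]
als_genes = {"SOD1", "FUS", "TARDBP"}  # toy subset, not the study's 40-gene list
k = rare_variant_burden(patient, als_genes)
print(k, round(relative_hazard(k), 2))  # 2 2.56
```

The multiplicative form is what makes "60% per variant" compound: the third variant adds 60% on top of an already elevated hazard rather than another fixed increment.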
Affiliation(s)
- Shirley Yin-Yu Pang
- Division of Neurology, Department of Medicine, University of Hong Kong, Hong Kong, P.R. China
- Jacob Shujui Hsu
- Department of Psychiatry, University of Hong Kong, Hong Kong, P.R. China; Centre for Genomic Sciences, Li Ka Shing Faculty of Medicine, University of Hong Kong, Hong Kong, P.R. China
- Kay-Cheong Teo
- Division of Neurology, Department of Medicine, University of Hong Kong, Hong Kong, P.R. China
- Yan Li
- Department of Psychiatry, University of Hong Kong, Hong Kong, P.R. China; Centre for Genomic Sciences, Li Ka Shing Faculty of Medicine, University of Hong Kong, Hong Kong, P.R. China
- Michelle H W Kung
- Division of Neurology, Department of Medicine, University of Hong Kong, Hong Kong, P.R. China
- Kathryn S E Cheah
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, University of Hong Kong, Hong Kong, P.R. China
- Danny Chan
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, University of Hong Kong, Hong Kong, P.R. China
- Kenneth M C Cheung
- Department of Orthopaedics & Traumatology, University of Hong Kong, Hong Kong, P.R. China
- Miaoxin Li
- Department of Psychiatry, University of Hong Kong, Hong Kong, P.R. China; Centre for Genomic Sciences, Li Ka Shing Faculty of Medicine, University of Hong Kong, Hong Kong, P.R. China; Department of Medical Genetics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, P.R. China; Key Laboratory of Tropical Disease Control (SYSU), Ministry of Education, Guangzhou, P.R. China
- Pak-Chung Sham
- Department of Psychiatry, University of Hong Kong, Hong Kong, P.R. China; Centre for Genomic Sciences, Li Ka Shing Faculty of Medicine, University of Hong Kong, Hong Kong, P.R. China
- Shu-Leong Ho
- Division of Neurology, Department of Medicine, University of Hong Kong, Hong Kong, P.R. China
10
Li MJ, Li M, Liu Z, Yan B, Pan Z, Huang D, Liang Q, Ying D, Xu F, Yao H, Wang P, Kocher JPA, Xia Z, Sham PC, Liu JS, Wang J. cepip: context-dependent epigenomic weighting for prioritization of regulatory variants and disease-associated genes. Genome Biol 2017; 18:52. [PMID: 28302177 PMCID: PMC5356314 DOI: 10.1186/s13059-017-1177-3] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 07/31/2016] [Accepted: 02/21/2017] [Indexed: 02/06/2023]
Abstract
It remains challenging to predict regulatory variants in particular tissues or cell types because gene regulation is highly context-specific. By connecting large-scale epigenomic profiles to expression quantitative trait loci (eQTLs) in a wide range of human tissues and cell types, we identify critical chromatin features that predict a variant’s regulatory potential. We present cepip, a joint likelihood framework for estimating a variant’s regulatory probability in a context-dependent manner. Our method shows significant enrichment of GWAS signals and outperforms existing cell type-specific methods. Furthermore, by using phenotypically relevant epigenomes to weight GWAS single-nucleotide polymorphisms, we improve the statistical power of the gene-based association test.
Affiliation(s)
- Mulin Jun Li
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China; Centre for Genomic Sciences, The University of Hong Kong, Hong Kong SAR, China; Department of Statistics, Harvard University, Cambridge, Boston, MA, 02138-2901, USA
- Miaoxin Li
- Department of Medical Genetics, Center for Genome Research, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China; Centre for Genomic Sciences, The University of Hong Kong, Hong Kong SAR, China; Department of Psychiatry, The University of Hong Kong, Hong Kong SAR, China; Centre for Reproduction, Development and Growth, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Zipeng Liu
- Centre for Genomic Sciences, The University of Hong Kong, Hong Kong SAR, China; Department of Anaesthesiology, The University of Hong Kong, Hong Kong SAR, China
- Bin Yan
- Centre for Genomic Sciences, The University of Hong Kong, Hong Kong SAR, China; School of Biomedical Sciences, The University of Hong Kong, Hong Kong SAR, China
- Zhicheng Pan
- Centre for Genomic Sciences, The University of Hong Kong, Hong Kong SAR, China; Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, CA, 90095, USA
- Dandan Huang
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Qian Liang
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Dingge Ying
- Centre for Genomic Sciences, The University of Hong Kong, Hong Kong SAR, China
- Feng Xu
- Centre for Genomic Sciences, The University of Hong Kong, Hong Kong SAR, China; School of Biomedical Sciences, The University of Hong Kong, Hong Kong SAR, China
- Hongcheng Yao
- Centre for Genomic Sciences, The University of Hong Kong, Hong Kong SAR, China; School of Biomedical Sciences, The University of Hong Kong, Hong Kong SAR, China
- Panwen Wang
- Department of Health Sciences Research & Center for Individualized Medicine, Mayo Clinic, Scottsdale, AZ, 85259, USA
- Jean-Pierre A Kocher
- Department of Health Sciences Research & Center for Individualized Medicine, Mayo Clinic, Scottsdale, AZ, 85259, USA
- Zhengyuan Xia
- Department of Anaesthesiology, The University of Hong Kong, Hong Kong SAR, China
- Pak Chung Sham
- Centre for Genomic Sciences, The University of Hong Kong, Hong Kong SAR, China; Department of Psychiatry, The University of Hong Kong, Hong Kong SAR, China
- Jun S Liu
- Department of Statistics, Harvard University, Cambridge, Boston, MA, 02138-2901, USA
- Junwen Wang
- Department of Health Sciences Research & Center for Individualized Medicine, Mayo Clinic, Scottsdale, AZ, 85259, USA; Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, 85259, USA