1
Shadrina M, Kalay Ö, Demirkaya-Budak S, LeDuc CA, Chung WK, Turgut D, Budak G, Arslan E, Semenyuk V, Davis-Dusenbery B, Seidman CE, Yost HJ, Jain A, Gelb BD. Efficient identification of de novo mutations in family trios: a consensus-based informatic approach. Life Sci Alliance 2025; 8:e202403039. [PMID: 40155050 PMCID: PMC11953573 DOI: 10.26508/lsa.202403039] [Received: 09/11/2024] [Revised: 03/19/2025] [Accepted: 03/20/2025] [Indexed: 04/01/2025]
Abstract
Accurate identification of de novo variants (DNVs) remains challenging despite advances in sequencing technologies, often requiring ad hoc filters and manual inspection. Here, we explored a purely informatic, consensus-based approach for identifying DNVs in proband-parent trios using short-read genome sequencing data. We evaluated variant calls generated by three sequence analysis pipelines (GATK HaplotypeCaller, DeepTrio, and Velsera GRAF) and examined the assumption that requiring consensus can serve as an effective filter for high-quality DNVs. Comparison with a highly accurate DNV set, previously validated by manual inspection and Sanger sequencing, demonstrated that consensus filtering followed by a force-calling procedure effectively removed false-positive calls, achieving 98.0-99.4% precision. At the same time, the sensitivity of the workflow, measured against the previously established DNVs, reached 99.4%. Validation in the HG002-3-4 Genome-in-a-Bottle trio confirmed its robustness, with precision reaching 99.2% and sensitivity up to 96.6%. We believe that this consensus approach can be widely implemented as an automated bioinformatics workflow suitable for large-scale analyses without the need for manual intervention, especially when very high precision is valued over sensitivity.
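The consensus requirement described in this abstract can be illustrated as a simple set operation over per-caller DNV calls. The following is a minimal sketch, not the authors' workflow. It assumes each pipeline's candidate DNVs have already been reduced to hypothetical (chromosome, position, ref, alt) keys. The function names `consensus_calls` and `precision_sensitivity` are illustrative, and the toy variants are made up. The force-calling step and any caller-specific filtering are not modeled.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def consensus_calls(call_sets: Iterable[Set[Variant]],
                    min_support: Optional[int] = None) -> Set[Variant]:
    """Keep variants reported by at least `min_support` callers.

    By default every caller must agree (strict consensus, i.e. intersection).
    """
    sets = [set(s) for s in call_sets]
    if min_support is None:
        min_support = len(sets)
    # Count in how many callers' sets each variant appears.
    support = Counter(v for s in sets for v in s)
    return {v for v, n in support.items() if n >= min_support}


def precision_sensitivity(predicted: Set[Variant],
                          truth: Set[Variant]) -> Tuple[float, float]:
    """Precision = TP / |predicted|; sensitivity = TP / |truth|."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    sensitivity = tp / len(truth) if truth else 0.0
    return precision, sensitivity


if __name__ == "__main__":
    # Toy, made-up calls standing in for three different callers.
    caller_a = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T"), ("chr3", 42, "G", "A")}
    caller_b = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}
    caller_c = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T"), ("chr5", 7, "T", "C")}
    # Toy "validated" set used only to demonstrate the metrics.
    truth = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T"), ("chr4", 9, "A", "C")}

    consensus = consensus_calls([caller_a, caller_b, caller_c])
    p, s = precision_sensitivity(consensus, truth)
    print(sorted(consensus))
    print(f"precision={p:.3f} sensitivity={s:.3f}")  # prints: precision=1.000 sensitivity=0.667
```

Requiring agreement from all callers discards singleton calls such as the chr3 and chr5 toy variants, trading some sensitivity for precision. This mirrors the abstract's point that the approach suits settings where precision is valued over sensitivity. Lowering `min_support` relaxes the filter.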
Affiliation(s)
- Mariya Shadrina
- Mindich Child Health and Development Institute and the Department of Genetics and Genomic Sciences, Icahn School of Medicine, New York, NY, USA
- Charles A LeDuc
- Department of Pediatrics, Columbia University, New York, NY, USA
- Wendy K Chung
- Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Christine E Seidman
- Division of Cardiovascular Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- H Joseph Yost
- Molecular Medicine Program, University of Utah, Salt Lake City, UT, USA
- Bruce D Gelb
- Mindich Child Health and Development Institute and the Department of Genetics and Genomic Sciences, Icahn School of Medicine, New York, NY, USA
- Department of Pediatrics, Icahn School of Medicine, New York, NY, USA
2
Watanabe Y, Nishioka M, Morikawa R, Takano-Isozaki S, Igeta H, Mori K, Kato T, Someya T. Rare nonsynonymous germline and mosaic de novo variants in Japanese patients with schizophrenia. Psychiatry Clin Neurosci 2025; 79:37-44. [PMID: 39439118 DOI: 10.1111/pcn.13758] [Received: 07/10/2024] [Revised: 09/23/2024] [Accepted: 10/07/2024] [Indexed: 10/25/2024]
Abstract
AIM Whole-exome sequencing (WES) studies have revealed that germline de novo variants (gDNVs) contribute to the genetic etiology of schizophrenia. However, the contribution of mosaic DNVs (mDNVs) to the risk of schizophrenia remains to be elucidated. In the present study, we systematically investigated the gDNVs and mDNVs that contribute to the genetic etiology of schizophrenia in a Japanese population. METHODS We performed deep WES (depth: 460×) of 73 affected offspring and WES (depth: 116×) of 134 parents from 67 families with schizophrenia. Prioritized rare nonsynonymous gDNV and mDNV candidates were validated using Sanger sequencing and ultra-deep targeted amplicon sequencing (depth: 71,375×), respectively. Subsequently, we performed a Gene Ontology analysis of the gDNVs and mDNVs to obtain biological insights. Lastly, we selected DNVs in known risk genes for psychiatric and neurodevelopmental disorders. RESULTS We identified 62 gDNVs and 98 mDNVs. The Gene Ontology analysis of mDNVs implicated actin filament and actin cytoskeleton as candidate biological pathways. There were eight DNVs in known risk genes: splice region gDNVs in AKAP11 and CUL1; a frameshift gDNV in SHANK1; a missense gDNV in SRCAP; missense mDNVs in CTNNB1, GRIN2A, and TSC2; and a nonsense mDNV in ZFHX4. CONCLUSION Our results suggest potential contributions of rare nonsynonymous gDNVs and mDNVs to the genetic etiology of schizophrenia. This is the first report of mDNVs in schizophrenia trios, demonstrating their potential relevance to schizophrenia pathology.
Affiliation(s)
- Yuichiro Watanabe
- Department of Psychiatry, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Department of Psychiatry, Uonuma Kikan Hospital, Niigata, Japan
- Masaki Nishioka
- Department of Psychiatry and Behavioral Science, Juntendo University Graduate School of Medicine, Tokyo, Japan
- Ryo Morikawa
- Department of Psychiatry, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Satoko Takano-Isozaki
- Department of Psychiatry, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Hirofumi Igeta
- Department of Psychiatry, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Kanako Mori
- Department of Psychiatry and Behavioral Science, Juntendo University Graduate School of Medicine, Tokyo, Japan
- Tadafumi Kato
- Department of Psychiatry and Behavioral Science, Juntendo University Graduate School of Medicine, Tokyo, Japan
- Toshiyuki Someya
- Department of Psychiatry, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
3
Li C, Meng X. Effective analysis of job satisfaction among medical staff in Chinese public hospitals: a random forest model. Front Public Health 2024; 12:1357709. [PMID: 38699429 PMCID: PMC11063264 DOI: 10.3389/fpubh.2024.1357709] [Received: 12/18/2023] [Accepted: 04/05/2024] [Indexed: 05/05/2024]
Abstract
Objective This study explored the factors influencing job satisfaction among medical staff in Chinese public hospitals, and the degree of their influence, by constructing an optimal discriminant model. Methods The participant sample was based on the service volume of 12,405 officially appointed medical staff from different departments of 16 public hospitals over three consecutive years, from 2017 to 2019. Medical staff (doctors, nurses, and administrative personnel) who were invited to participate in one year's survey did not participate again in later years. The importance of all associated factors was calculated, and the optimal evaluation model was identified. Results The overall job satisfaction of medical staff was 25.62%. The most important factors affecting medical staff satisfaction were: Value staff opinions (Q10), Get recognition for your work (Q11), Democracy (Q9), and Performance Evaluation Satisfaction (Q5). The random forest model was the best evaluation model for medical staff satisfaction, with higher prediction accuracy than other similar models. Conclusion Improvement in medical staff job satisfaction is significantly related to improvements in democracy, recognition of work, and employee performance. The results show that improving these five key variables can maximize the job satisfaction and motivation of medical staff. The random forest model can maximize the accuracy and effectiveness of similar research.
Affiliation(s)
- Xuehui Meng
- Department of Health Service Management, Humanities and Management School, Zhejiang Chinese Medical University, Hangzhou, China
4
Shadrina M, Kalay Ö, Demirkaya-Budak S, LeDuc CA, Chung WK, Turgut D, Budak G, Arslan E, Semenyuk V, Davis-Dusenbery B, Seidman CE, Yost HJ, Jain A, Gelb BD. Automated Identification of Germline de novo Mutations in Family Trios: A Consensus-Based Informatic Approach. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.08.584100. [PMID: 38559260 PMCID: PMC10979888 DOI: 10.1101/2024.03.08.584100] [Indexed: 04/04/2024]
Abstract
Accurate identification of germline de novo variants (DNVs) remains a challenging problem despite rapid advances in sequencing technologies and in methods for analyzing the data they generate; putative solutions often involve ad hoc filters and visual inspection of identified variants. Here, we present a purely informatic method for identifying DNVs by analyzing short-read genome sequencing data from proband-parent trios. Our method evaluates variant calls generated by three genome sequence analysis pipelines that use different algorithms (GATK HaplotypeCaller, DeepTrio, and Velsera GRAF), exploring the assumption that a requirement of consensus can serve as an effective filter for high-quality DNVs. We assessed the efficacy of our method by testing DNVs identified using a previously established, highly accurate classification procedure that partially relied on manual inspection and used Sanger sequencing to validate a DNV subset comprising less confident calls. The results show that our method is highly precise and that applying a force-calling procedure to putative variants further removes false-positive calls, increasing the precision of the workflow to 99.6%. Our method also identified novel DNVs, 87% of which were validated, indicating that it offers a higher recall rate without compromising accuracy. We have implemented this method as an automated bioinformatics workflow suitable for large-scale analyses without the need for manual intervention.
Affiliation(s)
- Mariya Shadrina
- Mindich Child Health and Development Institute and the Department of Genetics and Genomic Sciences, Icahn School of Medicine, New York, NY, USA
- Özem Kalay
- Velsera Inc, 529 Main St, Suite 6610, Charlestown, MA, USA
- Charles A. LeDuc
- Department of Pediatrics, Columbia University, New York, NY, USA
- Wendy K. Chung
- Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Deniz Turgut
- Velsera Inc, 529 Main St, Suite 6610, Charlestown, MA, USA
- Gungor Budak
- Velsera Inc, 529 Main St, Suite 6610, Charlestown, MA, USA
- Elif Arslan
- Velsera Inc, 529 Main St, Suite 6610, Charlestown, MA, USA
- Christine E. Seidman
- Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- H. Joseph Yost
- Molecular Medicine Program, University of Utah, Salt Lake City, UT, USA
- Amit Jain
- Velsera Inc, 529 Main St, Suite 6610, Charlestown, MA, USA
- Bruce D. Gelb
- Mindich Child Health and Development Institute and the Department of Genetics and Genomic Sciences, Icahn School of Medicine, New York, NY, USA
- Department of Pediatrics, Icahn School of Medicine, New York, NY, USA
5
Burda K, Konczal M. Validation of machine learning approach for direct mutation rate estimation. Mol Ecol Resour 2023; 23:1757-1771. [PMID: 37486035 DOI: 10.1111/1755-0998.13841] [Received: 02/25/2023] [Revised: 06/16/2023] [Accepted: 07/05/2023] [Indexed: 07/25/2023]
Abstract
Mutations are the primary source of all genetic variation. Knowledge about their rates is critical for any evolutionary genetic analysis, but for a long time that knowledge remained elusive and was only indirectly inferred. In recent years, parent-offspring comparisons have yielded the first direct mutation rate estimates. These analyses are, however, challenging due to a high rate of false positives and a lack of consensus regarding standardized filtering of candidate de novo mutations. Here, we validate the application of a machine learning (ML) approach to this task and estimate the mutation rate for the guppy (Poecilia reticulata), a model species in eco-evolutionary studies. We sequenced 4 parents and 20 offspring and then screened their genomes for de novo mutations. The initial large number of candidate de novo mutations was hard-filtered to remove false-positive results. These results were compared with the mutation rate estimated using a supervised ML approach. Both approaches were followed by molecular validation of all candidate de novo mutations and yielded similar results. The ML method uniquely identified three mutations, but overall it required more hands-on curation and had higher rates of false positives and false negatives. Both methods concordantly showed no difference in mutation rates between families. The guppy mutation rate estimated here is among the lowest directly estimated mutation rates in vertebrates; however, previous research has also found low estimated rates in other teleost fishes. We discuss potential explanations for this pattern, as well as the future utility and limitations of ML approaches.
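For reference, a direct parent-offspring mutation rate is commonly expressed per site per generation in the general form below. This is a generic sketch. It assumes a diploid species with $N$ sequenced offspring, $m_i$ validated de novo mutations in offspring $i$, and $C_i$ callable sites in that offspring. It omits the false-negative correction that many studies apply, and it is not necessarily the estimator used by Burda and Konczal.

```latex
\hat{\mu} \;=\; \frac{\sum_{i=1}^{N} m_i}{2 \sum_{i=1}^{N} C_i}
```

The factor of 2 reflects that each offspring carries two newly transmitted haploid genomes, one from each parent, in which new mutations can arise.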
Affiliation(s)
- Katarzyna Burda
- Evolutionary Biology Group, Faculty of Biology, Adam Mickiewicz University, Poznań, Poland
- Mateusz Konczal
- Evolutionary Biology Group, Faculty of Biology, Adam Mickiewicz University, Poznań, Poland
6
Nishioka M, Takayama J, Sakai N, Kazuno AA, Ishiwata M, Ueda J, Hayama T, Fujii K, Someya T, Kuriyama S, Tamiya G, Takata A, Kato T. Deep exome sequencing identifies enrichment of deleterious mosaic variants in neurodevelopmental disorder genes and mitochondrial tRNA regions in bipolar disorder. Mol Psychiatry 2023; 28:4294-4306. [PMID: 37248276 PMCID: PMC10827672 DOI: 10.1038/s41380-023-02096-x] [Received: 08/22/2022] [Revised: 04/24/2023] [Accepted: 04/25/2023] [Indexed: 05/31/2023]
Abstract
Bipolar disorder (BD) is a global medical issue, afflicting around 1% of the population with manic and depressive episodes. Despite various genetic studies, the genetic architecture and pathogenesis of BD have not been fully resolved. Besides germline variants, postzygotic mosaic variants have been proposed as a new candidate mechanism contributing to BD. Here, we performed extensive deep exome sequencing (DES, ~300×) and validation experiments to investigate the roles of mosaic variants in BD, with 235 BD cases (194 probands of trios and 41 single cases) and 39 controls. We found an enrichment of developmental disorder (DD) genes among the genes hit by deleterious mosaic variants in BD (P = 0.000552), including a ClinVar-registered pathogenic variant in ARID2. An enrichment of deleterious mosaic variants was also observed for autism spectrum disorder (ASD) genes (P = 0.000428). The proteins encoded by the DD/ASD genes with non-synonymous mosaic variants in BD form more protein-protein interactions than expected, suggesting molecular mechanisms shared with DD/ASD but restricted to a subset of cells in BD. We also found significant enrichment of mitochondrial heteroplasmic variants, another class of mosaic variants, in mitochondrial tRNA genes in BD (P = 0.0102). Among them, recurrent m.3243A>G variants, known to cause mitochondrial diseases, were found in two unrelated BD probands with allele fractions of 5-12%, lower than those seen in mitochondrial diseases. Despite the limitation of using peripheral tissues, our DES investigation supports the possible contribution of deleterious mosaic variants in nuclear genes responsible for more severe phenotypes, such as DD/ASD, to the risk of BD, and further demonstrates that the same paradigm can be applied to the mitochondrial genome.
These results, as well as the enrichment of heteroplasmic mitochondrial tRNA variants in BD, add a new piece to the understanding of the genetic architecture of BD and provide general insights into the pathological roles of mosaic variants in human diseases.
Affiliation(s)
- Masaki Nishioka
- Department of Psychiatry and Behavioral Science, Juntendo University Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-Ku, Tokyo, 113-8421, Japan.
- Department of Molecular Pathology of Mood Disorders, Juntendo University Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-Ku, Tokyo, 113-8421, Japan.
- Laboratory for Molecular Dynamics of Mental Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan.
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan.
- Jun Takayama
- Department of AI and Innovative Medicine, Tohoku University School of Medicine, 2-1 Seiryo-machi, Aoba-Ku, Sendai, Miyagi, 980-8575, Japan
- Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-Ku, Sendai, Miyagi, 980-8573, Japan
- Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan
- Naomi Sakai
- Department of Psychiatry and Behavioral Science, Juntendo University Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-Ku, Tokyo, 113-8421, Japan
- Laboratory for Molecular Dynamics of Mental Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- An-A Kazuno
- Laboratory for Molecular Dynamics of Mental Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Mizuho Ishiwata
- Department of Psychiatry and Behavioral Science, Juntendo University Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-Ku, Tokyo, 113-8421, Japan
- Laboratory for Molecular Dynamics of Mental Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Junko Ueda
- Laboratory for Molecular Dynamics of Mental Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Takashi Hayama
- Yokohama Mental Clinic Totsuka, 494-8 Kamikurata-cho, Totsuka-ku, Yokohama, 244-0816, Japan
- Kumiko Fujii
- Department of Psychiatry, Shiga University of Medical Science, Seta Tsukinowa-Cho, Otsu, Shiga, 520-2192, Japan
- Toshiyuki Someya
- Department of Psychiatry, Niigata University Graduate School of Medical and Dental Sciences, 757 Asahimachidori-ichibancho, Chuo-ku, Niigata, 951-8510, Japan
- Shinichi Kuriyama
- Department of Preventive Medicine and Epidemiology, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-Ku, Sendai, Miyagi, 980-8573, Japan
- Department of Molecular Epidemiology, Tohoku University School of Medicine, 2-1 Seiryo-machi, Aoba-Ku, Sendai, Miyagi, 980-8575, Japan
- Gen Tamiya
- Department of AI and Innovative Medicine, Tohoku University School of Medicine, 2-1 Seiryo-machi, Aoba-Ku, Sendai, Miyagi, 980-8575, Japan
- Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-Ku, Sendai, Miyagi, 980-8573, Japan
- Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan
- Atsushi Takata
- Laboratory for Molecular Dynamics of Mental Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan.
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan.
- Research Institute for Diseases of Old Age, Juntendo University Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-Ku, Tokyo, 113-8421, Japan.
- Tadafumi Kato
- Department of Psychiatry and Behavioral Science, Juntendo University Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-Ku, Tokyo, 113-8421, Japan.
- Department of Molecular Pathology of Mood Disorders, Juntendo University Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-Ku, Tokyo, 113-8421, Japan.
- Laboratory for Molecular Dynamics of Mental Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan.
7
Lian Q, Chen Y, Chang F, Fu Y, Qi J. inGAP-family: Accurate Detection of Meiotic Recombination Loci and Causal Mutations by Filtering Out Artificial Variants due to Genome Complexities. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022; 20:524-535. [PMID: 33711466 PMCID: PMC9801030 DOI: 10.1016/j.gpb.2019.11.014] [Received: 04/12/2019] [Revised: 09/04/2019] [Accepted: 11/08/2019] [Indexed: 01/26/2023]
Abstract
Accurately identifying DNA polymorphisms can bridge the gap between phenotypes and genotypes and is essential for molecular marker-assisted genetic studies. Genome complexities, including large-scale structural variations, pose great challenges to bioinformatic analyses aimed at obtaining high-confidence genomic variants, because sequence differences between non-allelic loci of two or more genomes can be misinterpreted as polymorphisms. It is important to correctly filter out such artificial variants to avoid false genotyping or erroneous estimation of allele frequencies. Here, we present an efficient and effective framework, inGAP-family, to discover, filter, and visualize DNA polymorphisms and structural variants (SVs) from alignments of short reads. Applying this method to polymorphism detection on real datasets shows that eliminating artificial variants greatly facilitates the precise identification of meiotic recombination points as well as causal mutations in mutant genomes or quantitative trait loci. In addition, inGAP-family provides a user-friendly graphical interface for detecting polymorphisms and SVs, further evaluating predicted variants, and identifying mutations related to genotypes. It is accessible at https://sourceforge.net/projects/ingap-family/.
8
Wang Q, Duan M, Fan Y, Liu S, Ren Y, Huang L, Zhou F. Transforming OMIC features for classification using Siamese convolutional networks. J Bioinform Comput Biol 2022; 20:2250013. [DOI: 10.1142/s0219720022500135] [Indexed: 11/18/2022]
9
Penkl A, Reunert J, Debus OM, Homann A, Och U, Rust S, Marquardt T. A mutation in the neonatal isoform of SCN2A causes neonatal-onset epilepsy. Am J Med Genet A 2021; 188:941-947. [PMID: 34874093 DOI: 10.1002/ajmg.a.62581] [Received: 02/13/2021] [Revised: 10/11/2021] [Accepted: 10/22/2021] [Indexed: 11/08/2022]
Abstract
SCN2A (sodium channel 2A) encodes the Nav1.2 channel protein in excitatory neurons of the brain. Nav1.2 is a critical voltage-gated sodium channel of the central nervous system. Mutations in SCN2A are responsible for a broad phenotypic spectrum ranging from autism and developmental delay to severe encephalopathy with neonatal or early infantile onset. SCN2A can be spliced into two different isoforms, a neonatal (6N) and an adult (6A) form. After birth, the 6N isoform is present in an equal or higher amount, protecting the brain from the increased neuronal excitability of the infantile brain. During postnatal development, 6N is gradually replaced by 6A. In an infant carrying the novel SCN2A mutation c.643G>A (p.Ala215Thr) only in the neonatal transcript, seizures started immediately after birth. The clinical presentation evolved from a burst-suppression pattern with 30-50 tonic seizures per day to hypsarrhythmia. The first exome analysis, which focused only on common transcripts, missed the diagnosis and delayed early therapy. A reevaluation including all transcripts revealed the SCN2A variant.
Affiliation(s)
- Anja Penkl
- Department of Pediatrics, University Hospital of Münster, Münster, Germany
- Janine Reunert
- Department of Pediatrics, University Hospital of Münster, Münster, Germany
- Otfried M Debus
- Department of Pediatrics, Clemenshospital Münster, Münster, Germany
- Anna Homann
- Department of Neurology, Hospital Ludmillenstift, Meppen, Germany
- Ulrike Och
- Department of Pediatrics, University Hospital of Münster, Münster, Germany
- Stephan Rust
- Department of Pediatrics, University Hospital of Münster, Münster, Germany
- Thorsten Marquardt
- Department of Pediatrics, University Hospital of Münster, Münster, Germany
10
Liu Y, Wu X, Wang Y. An integrated approach for copy number variation discovery in parent-offspring trios. Brief Bioinform 2021; 22:6306464. [PMID: 34151932 DOI: 10.1093/bib/bbab230] [Received: 10/16/2020] [Revised: 04/27/2021] [Accepted: 05/25/2021] [Indexed: 11/14/2022]
Abstract
Whole-genome sequencing (WGS) of parent-offspring trios has become widely used to identify causal copy number variations (CNVs) in rare and complex diseases. Existing CNV detection approaches usually do not make effective use of Mendelian inheritance in parent-offspring trios and yield low accuracy. In this study, we propose a novel integrated approach, TrioCNV2, for jointly detecting CNVs from WGS data of parent-offspring trios. TrioCNV2 first uses read depth and discordant read pairs to infer the approximate locations of CNVs and then employs split-read and local de novo assembly approaches to refine the breakpoints. We use real WGS data from two parent-offspring trios to demonstrate TrioCNV2's performance and compare it with other CNV detection approaches. The software TrioCNV2 is implemented in a combination of Java and R and is freely available at https://github.com/yongzhuang/TrioCNV2.
Affiliation(s)
- Yongzhuang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Xiaoliang Wu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
11
Systematic analysis of exonic germline and postzygotic de novo mutations in bipolar disorder. Nat Commun 2021; 12:3750. [PMID: 34145229 PMCID: PMC8213845 DOI: 10.1038/s41467-021-23453-w] [Received: 11/14/2020] [Accepted: 04/29/2021] [Indexed: 12/30/2022]
Abstract
Bipolar disorder is a severe mental illness characterized by recurrent manic and depressive episodes. To better understand its genetic architecture, we analyze ultra-rare de novo mutations in 354 trios with bipolar disorder. For germline de novo mutations, we find significant enrichment of loss-of-function mutations in constrained genes (corrected-P = 0.0410) and deleterious mutations in presynaptic active zone genes (FDR = 0.0415). An analysis integrating single-cell RNA-sequencing data identifies a subset of excitatory neurons preferentially expressing the genes hit by deleterious mutations, which are also characterized by high expression of developmental disorder genes. In the analysis of postzygotic mutations, we observe significant enrichment of deleterious ones in developmental disorder genes (P = 0.00135), including the SRCAP gene mutated in two unrelated probands. These data collectively indicate the contributions of both germline and postzygotic mutations to the risk of bipolar disorder, supporting the hypothesis that postzygotic mutations of developmental disorder genes may contribute to bipolar disorder.
The significance of rare and de novo variants in bipolar disorder is not well understood. Here, the authors have analyzed whole exome/genome data from trios to identify deleterious de novo variants associated with bipolar disorder.
12
Li C, Liao C, Meng X, Chen H, Chen W, Wei B, Zhu P. Effective Analysis of Inpatient Satisfaction: The Random Forest Algorithm. Patient Prefer Adherence 2021; 15:691-703. [PMID: 33854303 PMCID: PMC8039189 DOI: 10.2147/ppa.s294402] [Received: 11/27/2020] [Accepted: 03/10/2021] [Indexed: 12/17/2022]
Abstract
PURPOSE To identify the factors influencing inpatient satisfaction by fitting the optimal discriminant model. PATIENTS AND METHODS A cross-sectional survey of inpatient satisfaction was conducted with 3888 patients in 16 large public hospitals in Zhejiang Province. Independent variables were screened by single-factor analysis, and the importance of all variables was comprehensively evaluated. The relationship between patients' overall satisfaction and the influencing factors was established, the relative risk was evaluated by marginal benefit, and the optimal model was fitted using the receiver operating characteristic curve. RESULTS Patients' overall satisfaction was 79.73%. The five factors with the greatest influence on inpatient satisfaction, in descending order, were: patients' right to know, timely nursing response, satisfaction with medical staff service, integrity of medical staff, and accuracy of diagnosis. The prediction accuracy of the random forest model was higher than that of the multiple logistic regression and naive Bayesian models. CONCLUSION Inpatient satisfaction is related to healthcare quality and to the diagnosis and treatment process. Rapid identification and active improvement of the factors affecting patient satisfaction can reduce public hospital operating costs and improve patient experiences and the efficiency of health resource allocation. Public hospitals should strengthen the exchange of medical information between doctors and patients, shorten waiting times, and improve the level of medical technology, service attitude, and transparency of information disclosure.
Affiliation(s)
- Chengcheng Li
- School of Humanities and Social Sciences, Guangxi Medical University, Nanning, 530021, People’s Republic of China
- Conghui Liao
- School of Public Health, Sun Yat-Sen University, Guangzhou, 510080, People’s Republic of China
| | - Xuehui Meng
- Department of Health Service Management, Humanities and Management School, Zhejiang Chinese Medical University, Hangzhou, 310000, People’s Republic of China
| | - Honghua Chen
- School of Basic Medicine, Guangxi Medical University, Nanning, 530021, People’s Republic of China
| | - Weiling Chen
- School of Basic Medicine, Guangxi Medical University, Nanning, 530021, People’s Republic of China
| | - Bo Wei
- School of Information and Management, Guangxi Medical University, Nanning, 530021, People’s Republic of China
| | - Pinghua Zhu
- School of Humanities and Social Sciences, Guangxi Medical University, Nanning, 530021, People’s Republic of China
- Correspondence: Pinghua Zhu Email
13
Liu Y, Liu J, Wang Y. Filtering de novo indels in parent-offspring trios. BMC Bioinformatics 2020; 21:547. [PMID: 33323105 PMCID: PMC7739476 DOI: 10.1186/s12859-020-03900-z] [Received: 09/29/2020] [Accepted: 11/19/2020] [Indexed: 12/02/2022]
Abstract
Background Identification of de novo indels from whole-genome or exome sequencing data of parent-offspring trios is a challenging task in human disease studies and clinical practice. Existing computational approaches usually yield a high false-positive rate. Results In this study, we developed a gradient boosting approach for filtering de novo indels obtained by any computational approach. Applied to real genome sequencing data, our approach significantly reduced the false-positive rate of de novo indels without a significant compromise in sensitivity. Conclusions The software DNMFilter_Indel was written in a combination of Java and R and is freely available at https://github.com/yongzhuang/DNMFilter_Indel.
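Filters like this are usually judged by how many false calls they remove and how many validated de novo indels they keep. A minimal sketch, assuming each call is keyed by a hypothetical `(chrom, pos, ref, alt)` tuple and that a validated truth set is available. DNMFilter_Indel itself uses a gradient boosting model, which is not reproduced here; this only shows how such a before/after comparison might be scored:

```python
def call_set_metrics(calls, truth):
    """Compare de novo calls with a validated truth set.

    calls, truth: iterables of hashable variant keys,
    e.g. (chrom, pos, ref, alt) tuples (a hypothetical key format).
    Returns counts plus precision (TP / calls), sensitivity (TP / truth)
    and the false-positive fraction of the call set (FP / calls).
    """
    calls, truth = set(calls), set(truth)
    tp = len(calls & truth)  # calls confirmed by the truth set
    fp = len(calls - truth)  # calls absent from the truth set
    precision = tp / len(calls) if calls else 0.0
    sensitivity = tp / len(truth) if truth else 0.0
    fp_fraction = fp / len(calls) if calls else 0.0
    return {"tp": tp, "fp": fp, "precision": precision,
            "sensitivity": sensitivity, "fp_fraction": fp_fraction}


if __name__ == "__main__":
    # Hypothetical toy data: three validated de novo indels.
    truth = {("chr1", 100, "A", "AT"), ("chr2", 200, "GC", "G"), ("chr3", 300, "T", "TA")}
    raw = truth | {("chr4", 400, "C", "CA"), ("chr5", 500, "AG", "A"), ("chr6", 600, "T", "TTT")}
    filtered = truth | {("chr4", 400, "C", "CA")}
    print(call_set_metrics(raw, truth))       # precision 0.5, sensitivity 1.0
    print(call_set_metrics(filtered, truth))  # precision 0.75, sensitivity 1.0
```

In this toy case, the filter removes two of the three false calls while keeping every validated indel. That is the kind of trade-off the abstract describes.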
Affiliation(s)
- Yongzhuang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, 92 West Dazhi Street, Harbin, 150001, China
- Jian Liu
- School of Computer Science and Technology, Harbin Institute of Technology, 92 West Dazhi Street, Harbin, 150001, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, 92 West Dazhi Street, Harbin, 150001, China
14
Ajayi A, Oyedele L, Owolabi H, Akinade O, Bilal M, Davila Delgado JM, Akanbi L. Deep Learning Models for Health and Safety Risk Prediction in Power Infrastructure Projects. Risk Anal 2020; 40:2019-2039. [PMID: 31755999 DOI: 10.1111/risa.13425] [Received: 12/12/2017] [Revised: 06/16/2019] [Accepted: 09/16/2019] [Indexed: 05/23/2023]
Abstract
Inappropriate management of health and safety (H&S) risk in power infrastructure projects can result in occupational accidents and equipment damage. Accidents at work have detrimental effects on workers, companies, and the general public. Despite the availability of H&S incident data, using them to mitigate accident occurrence effectively is challenging because of inherent limitations of existing data logging methods. In this study, we used a text-mining approach to retrieve meaningful terms from the data and developed six deep learning (DL) models for H&S risk management in power infrastructure. The DL models include DNNclassify (risk or no risk), DNNreg1 (loss time), DNNreg2 (body injury), DNNreg3 (plant and fleet), DNNreg4 (equipment), and DNNreg5 (environment). An H&S risk database obtained from a leading UK power infrastructure construction company was used to develop the models with the H2O framework in R. The performance of the DL models was assessed and benchmarked against existing models using test data and appropriate performance metrics. The overall accuracy of the classification model was 0.93. The average R² value for the five regression models was 0.92, with mean absolute error between 0.91 and 0.94. These results, together with the developed user-interface module, will help practitioners better understand H&S challenges, minimize project costs (such as third-party insurance and equipment repairs), and adopt effective strategies to mitigate H&S risk.
Affiliation(s)
- Anuoluwapo Ajayi
- Big Data, Enterprise and Artificial Intelligence Laboratory, University of the West of England, Bristol, UK
- Lukumon Oyedele
- Big Data, Enterprise and Artificial Intelligence Laboratory, University of the West of England, Bristol, UK
- Hakeem Owolabi
- Big Data, Enterprise and Artificial Intelligence Laboratory, University of the West of England, Bristol, UK
- Olugbenga Akinade
- Big Data, Enterprise and Artificial Intelligence Laboratory, University of the West of England, Bristol, UK
- Muhammad Bilal
- Big Data, Enterprise and Artificial Intelligence Laboratory, University of the West of England, Bristol, UK
- Juan Manuel Davila Delgado
- Big Data, Enterprise and Artificial Intelligence Laboratory, University of the West of England, Bristol, UK
- Lukman Akanbi
- Big Data, Enterprise and Artificial Intelligence Laboratory, University of the West of England, Bristol, UK
15
Bhuyan MSI, Pe'er I, Rahman MS. SICaRiO: short indel call filtering with boosting. Brief Bioinform 2020; 22:5917082. [PMID: 33003198 DOI: 10.1093/bib/bbaa238] [Received: 06/05/2020] [Revised: 08/26/2020] [Accepted: 08/27/2020] [Indexed: 11/14/2022]
Abstract
Despite impressive improvements in next-generation sequencing technology, reliable detection of indels is still a difficult endeavour. Recognition of true indels is of prime importance in many applications, such as personalized health care, disease genomics and population genetics. Recently, advanced machine learning techniques have been successfully applied to classification problems with large-scale data. In this paper, we present SICaRiO, a gradient boosting classifier for the reliable detection of true indels, trained with the gold-standard dataset from the Genome in a Bottle (GIAB) consortium. Our filtering scheme significantly improves the performance of each variant calling pipeline used in GIAB and beyond. SICaRiO uses genomic features that can be computed from publicly available resources; that is, it does not require sequencing pipeline-specific information (e.g. read depth). This study also sheds light on the prior genomic contexts responsible for erroneous indel calls made by sequencing pipelines. We compared prediction difficulty for three categories of indels across different sequencing pipelines. We also ranked genomic features according to their predictive power in identifying false positives.
Affiliation(s)
- Md Shariful Islam Bhuyan
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Itsik Pe'er
- Department of Computer Science, Fu Foundation School of Engineering, and the Chair at the Center for Health Analytics, Data Science Institute, Columbia University, New York, USA
- M Sohel Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
16
Almlöf JC, Nystedt S, Mechtidou A, Leonard D, Eloranta ML, Grosso G, Sjöwall C, Bengtsson AA, Jönsen A, Gunnarsson I, Svenungsson E, Rönnblom L, Sandling JK, Syvänen AC. Contributions of de novo variants to systemic lupus erythematosus. Eur J Hum Genet 2020; 29:184-193. [PMID: 32724065 PMCID: PMC7852530 DOI: 10.1038/s41431-020-0698-5] [Received: 01/16/2020] [Revised: 06/04/2020] [Accepted: 07/14/2020] [Indexed: 12/21/2022]
Abstract
By performing whole-genome sequencing in a Swedish cohort of 71 parent-offspring trios, in which the child in each family is affected by systemic lupus erythematosus (SLE, OMIM 152700), we investigated the contribution of de novo variants to the risk of SLE. We found de novo single nucleotide variants (SNVs) to be significantly enriched in gene promoters in SLE patients compared with healthy controls, at a level corresponding to 26 more de novo promoter SNVs in each patient than expected. We identified 12 de novo SNVs in promoter regions of genes that have previously been implicated in SLE or that have functions that could be relevant to SLE. Furthermore, we detected three missense de novo SNVs, five de novo insertion-deletions, and three de novo structural variants with the potential to affect the expression of genes relevant to SLE. Based on enrichment analysis, disease-affecting de novo SNVs are expected to occur in one-third of SLE patients. This study shows that de novo variants in promoters commonly contribute to the genetic risk of SLE. The enrichment of de novo SNVs in promoter regions in SLE highlights the importance of using whole-genome sequencing to identify de novo variants.
Affiliation(s)
- Jonas Carlsson Almlöf
- Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, 751 23, Uppsala, Sweden
- Sara Nystedt
- Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, 751 23, Uppsala, Sweden
- Aikaterini Mechtidou
- Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, 751 23, Uppsala, Sweden
- Dag Leonard
- Department of Medical Sciences, Rheumatology and Science for Life Laboratory, Uppsala University, 751 85, Uppsala, Sweden
- Maija-Leena Eloranta
- Department of Medical Sciences, Rheumatology and Science for Life Laboratory, Uppsala University, 751 85, Uppsala, Sweden
- Giorgia Grosso
- Department of Medicine, Karolinska Institutet, Rheumatology, Karolinska University Hospital, 171 77, Stockholm, Sweden
- Christopher Sjöwall
- Department of Clinical and Experimental Medicine, Rheumatology/Division of Neuro and Inflammation Sciences, Linköping University, 581 83, Linköping, Sweden
- Anders A Bengtsson
- Department of Clinical Sciences, Rheumatology, Lund University, Skåne University Hospital, 222 42, Lund, Sweden
- Andreas Jönsen
- Department of Clinical Sciences, Rheumatology, Lund University, Skåne University Hospital, 222 42, Lund, Sweden
- Iva Gunnarsson
- Department of Medicine, Karolinska Institutet, Rheumatology, Karolinska University Hospital, 171 77, Stockholm, Sweden
- Elisabet Svenungsson
- Department of Medicine, Karolinska Institutet, Rheumatology, Karolinska University Hospital, 171 77, Stockholm, Sweden
- Lars Rönnblom
- Department of Medical Sciences, Rheumatology and Science for Life Laboratory, Uppsala University, 751 85, Uppsala, Sweden
- Johanna K Sandling
- Department of Medical Sciences, Rheumatology and Science for Life Laboratory, Uppsala University, 751 85, Uppsala, Sweden
- Ann-Christine Syvänen
- Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, 751 23, Uppsala, Sweden
17
Paragh G, Harangi M, Karányi Z, Daróczy B, Németh Á, Fülöp P. Identifying patients with familial hypercholesterolemia using data mining methods in the Northern Great Plain region of Hungary. Atherosclerosis 2019; 277:262-266. [PMID: 30270056 DOI: 10.1016/j.atherosclerosis.2018.05.039] [Received: 03/30/2018] [Revised: 05/04/2018] [Accepted: 05/22/2018] [Indexed: 12/20/2022]
Abstract
BACKGROUND AND AIMS Familial hypercholesterolemia (FH) is one of the most frequent diseases with monogenic inheritance. Previous data indicated that the heterozygous form occurs in 1 in 250 people. Based on these reports, around 36,000-40,000 people in Hungary are estimated to have FH; however, there are no exact data on the frequency of the disease in our country. We therefore initiated a collaboration with a clinical-site partner company that provides modern data mining methods based on medical and statistical records, and we applied these methods in two major hospitals in the Northern Great Plain region of Hungary to find patients with a possible diagnosis of FH. METHODS Medical records of 1,342,124 patients were included in our study. From the mined data, we calculated Dutch Lipid Clinic Network (DLCN) scores for each patient and grouped patients according to the criteria to assess the likelihood of an FH diagnosis. We also calculated mean lipid levels before diagnosis and treatment. RESULTS We identified 225 patients with a DLCN score of 6-8 (mean total cholesterol: 9.38 ± 3.0 mmol/L, mean LDL-C: 7.61 ± 2.4 mmol/L), and 11,706 patients with a DLCN score of 3-5 (mean total cholesterol: 7.34 ± 1.2 mmol/L, mean LDL-C: 5.26 ± 0.8 mmol/L). CONCLUSIONS Analysis of more regional and country-wide data and more frequent measurement of total cholesterol and LDL-C levels would increase the number of FH cases discovered. Data mining seems to be ideal for filtering and screening for FH in Hungary.
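The grouping step can be sketched as follows. This sketch assumes each patient's total DLCN score has already been computed from the mined records (that step is not shown). It also uses the commonly cited DLCN cut-offs: above 8 definite, 6-8 probable, 3-5 possible, and below 3 unlikely FH. The function names are hypothetical and not taken from the study:

```python
from collections import Counter


def dlcn_category(score: int) -> str:
    """Map a total Dutch Lipid Clinic Network score to an FH likelihood category."""
    if score < 0:
        raise ValueError("DLCN scores are non-negative")
    if score > 8:
        return "definite"
    if score >= 6:
        return "probable"  # the 6-8 band reported in the abstract
    if score >= 3:
        return "possible"  # the 3-5 band reported in the abstract
    return "unlikely"


def count_by_category(scores):
    """Count patients per DLCN category from an iterable of total scores."""
    return Counter(dlcn_category(s) for s in scores)


if __name__ == "__main__":
    hypothetical_scores = [2, 4, 5, 7, 9, 3, 0]
    print(count_by_category(hypothetical_scores))
```

Keeping the score-to-category mapping in one small function makes it easy to audit the cut-offs separately from the record-mining step.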
Affiliation(s)
- György Paragh
- Department of Internal Medicine, University of Debrecen Faculty of Medicine, Debrecen, Hungary
- Mariann Harangi
- Department of Internal Medicine, University of Debrecen Faculty of Medicine, Debrecen, Hungary
- Zsolt Karányi
- Department of Internal Medicine, University of Debrecen Faculty of Medicine, Debrecen, Hungary
- Bálint Daróczy
- Institute for Computer Science and Control, Hungarian Academy of Sciences, (MTA SZTAKI), Budapest, Hungary
- Ákos Németh
- Aesculab Medical Solutions, Black Horse Group Ltd., Debrecen, Hungary
- Péter Fülöp
- Department of Internal Medicine, University of Debrecen Faculty of Medicine, Debrecen, Hungary
18
Feliciano P, Zhou X, Astrovskaya I, Turner TN, Wang T, Brueggeman L, Barnard R, Hsieh A, Snyder LG, Muzny DM, Sabo A, Gibbs RA, Eichler EE, O’Roak BJ, Michaelson JJ, Volfovsky N, Shen Y, Chung WK. Exome sequencing of 457 autism families recruited online provides evidence for autism risk genes. NPJ Genom Med 2019; 4:19. [PMID: 31452935 PMCID: PMC6707204 DOI: 10.1038/s41525-019-0093-8] [Received: 12/20/2018] [Accepted: 07/11/2019] [Indexed: 12/30/2022]
Abstract
Autism spectrum disorder (ASD) is a genetically heterogeneous condition, caused by a combination of rare de novo and inherited variants as well as common variants in at least several hundred genes. However, significantly larger sample sizes are needed to identify the complete set of genetic risk factors. We conducted a pilot study for SPARK (SPARKForAutism.org) of 457 families with ASD, all consented online. Whole exome sequencing (WES) and genotyping data were generated for each family using DNA from saliva. We identified variants in genes and loci that are clinically recognized causes of, or significant contributors to, ASD in 10.4% of families without previous genetic findings. In addition, we identified variants that are possibly associated with ASD in an additional 3.4% of families. A meta-analysis using the TADA framework at a false discovery rate (FDR) of 0.1 provides statistical support for 26 ASD risk genes. While most of these genes are already known ASD risk genes, BRSK2 has the strongest statistical support and reaches genome-wide significance as a risk gene for ASD (p-value = 2.3e-06). Future studies leveraging the thousands of individuals with ASD who have enrolled in SPARK are likely to further clarify the genetic risk factors associated with ASD as well as accelerate ASD research that incorporates genetic etiology.
Affiliation(s)
- Xueya Zhou
- Department of Systems Biology, Columbia University, New York, NY 10032 USA
- Tychele N. Turner
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195 USA
- Tianyun Wang
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195 USA
- Leo Brueggeman
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA 52242 USA
- Rebecca Barnard
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR 97239 USA
- Alexander Hsieh
- Department of Systems Biology, Columbia University, New York, NY 10032 USA
- Donna M. Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Aniko Sabo
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Richard A. Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195 USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195 USA
- Brian J. O’Roak
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR 97239 USA
- Jacob J. Michaelson
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA 52242 USA
- Yufeng Shen
- Department of Systems Biology, Columbia University, New York, NY 10032 USA
- Wendy K. Chung
- Simons Foundation, New York, NY 10010 USA
- Department of Pediatrics, Columbia University Medical Center, New York, NY 10032 USA
19
Zhou J, Park CY, Theesfeld CL, Wong AK, Yuan Y, Scheckel C, Fak JJ, Funk J, Yao K, Tajima Y, Packer A, Darnell RB, Troyanskaya OG. Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk. Nat Genet 2019; 51:973-980. [PMID: 31133750 PMCID: PMC6758908 DOI: 10.1038/s41588-019-0420-0] [Received: 08/03/2018] [Accepted: 04/12/2019] [Indexed: 12/19/2022]
Abstract
We address the challenge of detecting the contribution of noncoding mutations to disease with a deep-learning-based framework that predicts the specific regulatory effects and the deleterious impact of genetic variants. Applying this framework to 1,790 autism spectrum disorder (ASD) simplex families reveals a role in disease for noncoding mutations: ASD probands harbor both transcriptional- and post-transcriptional-regulation-disrupting de novo mutations of significantly higher functional impact than those in unaffected siblings. Further analysis suggests involvement of noncoding mutations in synaptic transmission and neuronal development and, taken together with previous studies, reveals a convergent genetic landscape of coding and noncoding mutations in ASD. We demonstrate that sequences carrying prioritized mutations identified in probands possess allele-specific regulatory activity, and we highlight a link between noncoding mutations and heterogeneity in the IQ of ASD probands. Our predictive genomics framework illuminates the role of noncoding mutations in ASD, prioritizes mutations with high impact for further study, and is broadly applicable to complex human diseases.
Affiliation(s)
- Jian Zhou
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Graduate Program in Quantitative and Computational Biology, Princeton University, Princeton, NJ, USA
- Flatiron Institute, Simons Foundation, New York, NY, USA
- Christopher Y Park
- Flatiron Institute, Simons Foundation, New York, NY, USA
- Laboratory of Molecular Neuro-Oncology and Howard Hughes Medical Institute, The Rockefeller University, New York, NY, USA
- Chandra L Theesfeld
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Aaron K Wong
- Flatiron Institute, Simons Foundation, New York, NY, USA
- Yuan Yuan
- Laboratory of Molecular Neuro-Oncology and Howard Hughes Medical Institute, The Rockefeller University, New York, NY, USA
- Gene Therapy Program, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Claudia Scheckel
- Laboratory of Molecular Neuro-Oncology and Howard Hughes Medical Institute, The Rockefeller University, New York, NY, USA
- Institute of Neuropathology, University of Zurich, Zurich, Switzerland
- John J Fak
- Laboratory of Molecular Neuro-Oncology and Howard Hughes Medical Institute, The Rockefeller University, New York, NY, USA
- Julien Funk
- Flatiron Institute, Simons Foundation, New York, NY, USA
- Kevin Yao
- Flatiron Institute, Simons Foundation, New York, NY, USA
- Yoko Tajima
- Laboratory of Molecular Neuro-Oncology and Howard Hughes Medical Institute, The Rockefeller University, New York, NY, USA
- Robert B Darnell
- Laboratory of Molecular Neuro-Oncology and Howard Hughes Medical Institute, The Rockefeller University, New York, NY, USA
- Olga G Troyanskaya
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Flatiron Institute, Simons Foundation, New York, NY, USA
- Department of Computer Science, Princeton University, Princeton, NJ, USA
20
Ren Y, Feng X, Xia X, Zhang Y, Zhang W, Su J, Wang Z, Xu Y, Zhou F. Gender specificity improves the early-stage detection of clear cell renal cell carcinoma based on methylomic biomarkers. Biomark Med 2018; 12:607-618. [PMID: 29707986 DOI: 10.2217/bmm-2018-0084] [Indexed: 12/15/2022]
Abstract
AIM The two genders differ at levels ranging from the molecular to the phenotypic, but most studies do not use this important information. We hypothesized that integrating gender information may improve overall prediction accuracy. MATERIALS & METHODS A comprehensive comparative study was carried out to test this hypothesis, using the classification of stage I + II versus stage III + IV clear cell renal cell carcinoma samples as an example. RESULTS & CONCLUSION In most cases, the female-specific model significantly outperformed the both-gender model, and the same held for the male-specific model. Our data suggest that gender information is essential for building biomedical classification models and that even a simple strategy of building two gender-specific models may outperform the gender-mixed model.
Affiliation(s)
- Yanjiao Ren
- College of Computer Science & Technology, Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China; College of Information Technology, Jilin Agricultural University, Changchun, Jilin 130118, China
- Xin Feng
- College of Computer Science & Technology, Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Xin Xia
- College of Software, Jilin University, Changchun, Jilin 130012, China
- Yexian Zhang
- College of Computer Science & Technology, Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Wenniu Zhang
- College of Computer Science & Technology, Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Jing Su
- Department of Pathophysiology, College of Basic Medical Sciences, Jilin University, Changchun, Jilin 130021, China
- Zhongyu Wang
- College of Computer Science & Technology, Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Ying Xu
- College of Computer Science & Technology, Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China; Computational Systems Biology Lab, Department of Biochemistry & Molecular Biology, University of Georgia, Athens, Georgia, 30602, USA; College of Public Health, Jilin University, Changchun, Jilin 130012, China
- Fengfeng Zhou
- College of Computer Science & Technology, Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
21
Xu C, Liu J, Yang W, Shu Y, Wei Z, Zheng W, Feng X, Zhou F. An OMIC biomarker detection algorithm TriVote and its application in methylomic biomarker detection. Epigenomics 2018; 10:335-347. [DOI: 10.2217/epi-2017-0097] [Indexed: 12/29/2022]
Abstract
Aim: Transcriptomic and methylomic patterns represent two major OMIC data sources impacted by both heritable genetic information and environmental factors, and have been widely used as disease diagnosis and prognosis biomarkers. Materials & methods: Modern transcriptomic and methylomic profiling technologies detect the status of tens of thousands or even millions of probing residues in the human genome, and pose a major computational challenge for existing feature selection algorithms. This study proposes a three-step feature selection algorithm, TriVote, to detect a subset of transcriptomic or methylomic residues with highly accurate binary classification performance. Results & conclusion: TriVote outperforms both filter and wrapper feature selection algorithms, with both higher classification accuracy and fewer features, on 17 transcriptomes and two methylomes. The biological functions of the methylomic biomarkers detected by TriVote were discussed with respect to their disease associations. An easy-to-use Python package is also released to facilitate further applications.
Affiliation(s)
- Cheng Xu
- College of Software, Jilin University, Changchun, Jilin 130012, PR China
- Jiamei Liu
- College of Software, Jilin University, Changchun, Jilin 130012, PR China
- Weifeng Yang
- College of Software, Jilin University, Changchun, Jilin 130012, PR China
- Yayun Shu
- College of Software, Jilin University, Changchun, Jilin 130012, PR China
- Zhipeng Wei
- Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, College of Computer Science & Technology, Jilin University, Changchun, Jilin 130012, PR China
- Weiwei Zheng
- Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, College of Computer Science & Technology, Jilin University, Changchun, Jilin 130012, PR China
- Xin Feng
- Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, College of Computer Science & Technology, Jilin University, Changchun, Jilin 130012, PR China
- Fengfeng Zhou
- College of Software, Jilin University, Changchun, Jilin 130012, PR China
- Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, College of Computer Science & Technology, Jilin University, Changchun, Jilin 130012, PR China
22
Garcia-Rosa S, de Amorim MG, Valieris R, Marques VD, Lorenzi JCC, Toller VB, do Olival GS, da Silva Júnior WA, da Silva IT, Barreira AA, Nunes DN, Dias-Neto E. Exome sequencing of multiple-sclerosis patients and their unaffected first-degree relatives. BMC Res Notes 2017; 10:735. [PMID: 29233175 PMCID: PMC5727932 DOI: 10.1186/s13104-017-3072-0] [Received: 11/04/2017] [Accepted: 12/06/2017] [Indexed: 12/30/2022]
Abstract
OBJECTIVES Understanding complex multifactorial diseases requires a variety of data from a large number of affected individuals. In this data note, we provide whole-exome sequencing data from a set of non-familial multiple sclerosis (MS) patients and their unaffected first-degree relatives. These data may help identify genomic alterations that could affect this disease, including single nucleotide polymorphisms, de novo variants and structural genomic variants such as copy-number alterations. DATA DESCRIPTION This dataset comprises the full exomes of 28 Brazilian subjects grouped in eight distinct families, consisting of four complete trios (mother-patient-father) plus another four complete trios with one additional unaffected sibling. In total, we present full exome data for eight patients diagnosed with relapsing-remitting multiple sclerosis. Diagnoses were made by experienced neurologists, and all enrolled patients had at least 5 years of follow-up and specific MS treatment. Exomes were sequenced from leukocyte-derived DNA on the Ion Proton platform after exon capture with biotinylated probes. For each exome, we generated an average of 66.1 million good-quality mapped reads with an average length of ~160 nt. On average, 90% of the exome reached a vertical coverage above 20×.
Affiliation(s)
- Sheila Garcia-Rosa
- Lab. of Medical Genomics, International Research Center, A.C.Camargo Cancer Center, Rua Taguá 440, 1st Floor, São Paulo, SP 01508-010 Brazil
- Maria Galli de Amorim
- Lab. of Medical Genomics, International Research Center, A.C.Camargo Cancer Center, Rua Taguá 440, 1st Floor, São Paulo, SP 01508-010 Brazil
- Renan Valieris
- Laboratory of Computational Biology and Bioinformatics, International Research Center, A.C.Camargo Cancer Center, Rua Taguá 440, 1st Floor, São Paulo, SP 01508-010 Brazil
- Vanessa Daccach Marques
- Department of Neurosciences, Clinical Neuroimmunology Division, Medical School and Hospital das Clínicas of Ribeirão Preto, University of São Paulo (USP), Avenida Bandeirantes, 3900, Ribeirão Preto, SP 14049-900 Brazil
- Center for Medical Genomics, HCFMRP/USP, Avenida Bandeirantes, 3900, Ribeirão Preto, SP 14049-900 Brazil
- Julio Cesar Cetrulo Lorenzi
- Center for Medical Genomics, HCFMRP/USP, Avenida Bandeirantes, 3900, Ribeirão Preto, SP 14049-900 Brazil
- Department of Genetics, Ribeirão Preto Medical School, University of São Paulo (USP), Avenida Bandeirantes, 3900, Ribeirão Preto, SP 14049-900 Brazil
- Vania Balardin Toller
- Neurosciences Research Group, Faculdade de Ciências Médicas da Santa Casa de São Paulo, Rua Doutor Cesário Motta Júnior, 61 - Vila Buarque, São Paulo, SP 01221-020 Brazil
- Guilherme Sciascia do Olival
- Neurosciences Research Group, Faculdade de Ciências Médicas da Santa Casa de São Paulo, Rua Doutor Cesário Motta Júnior, 61 - Vila Buarque, São Paulo, SP 01221-020 Brazil
- Wilson Araújo da Silva Júnior
- Center for Medical Genomics, HCFMRP/USP, Avenida Bandeirantes, 3900, Ribeirão Preto, SP 14049-900 Brazil
- Department of Genetics, Ribeirão Preto Medical School, University of São Paulo (USP), Avenida Bandeirantes, 3900, Ribeirão Preto, SP 14049-900 Brazil
- Israel Tojal da Silva
- Laboratory of Computational Biology and Bioinformatics, International Research Center, A.C.Camargo Cancer Center, Rua Taguá 440, 1st Floor, São Paulo, SP 01508-010 Brazil
- Amilton Antunes Barreira
- Department of Neurosciences, Clinical Neuroimmunology Division, Medical School and Hospital das Clínicas of Ribeirão Preto, University of São Paulo (USP), Avenida Bandeirantes, 3900, Ribeirão Preto, SP 14049-900 Brazil
- Center for Medical Genomics, HCFMRP/USP, Avenida Bandeirantes, 3900, Ribeirão Preto, SP 14049-900 Brazil
- Diana Noronha Nunes
- Lab. of Medical Genomics, International Research Center, A.C.Camargo Cancer Center, Rua Taguá 440, 1st Floor, São Paulo, SP 01508-010 Brazil
- Emmanuel Dias-Neto
- Lab. of Medical Genomics, International Research Center, A.C.Camargo Cancer Center, Rua Taguá 440, 1st Floor, São Paulo, SP 01508-010 Brazil
- Lab. of Neurosciences (LIM-27), Institute of Psychiatry, Faculdade de Medicina, Universidade de São Paulo, São Paulo, SP Brazil
| |
23
Jin ZB, Li Z, Liu Z, Jiang Y, Cai XB, Wu J. Identification of de novo germline mutations and causal genes for sporadic diseases using trio-based whole-exome/genome sequencing. Biol Rev Camb Philos Soc 2017; 93:1014-1031. [PMID: 29154454 DOI: 10.1111/brv.12383] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 01/16/2017] [Revised: 09/28/2017] [Accepted: 10/10/2017] [Indexed: 12/14/2022]
Abstract
Whole-genome or whole-exome sequencing (WGS/WES) of the affected proband together with the unaffected parents (trio) is commonly adopted to identify de novo germline mutations (DNMs) underlying sporadic cases of various genetic disorders. However, our current knowledge of the occurrence and functional effects of DNMs remains limited, and accurately identifying the disease-causing DNM among a group of irrelevant DNMs is complicated. Herein, we provide a general-purpose discussion of important issues related to pathogenic gene identification based on trio-based WGS/WES data. Specifically, we review the relevance of DNMs to human sporadic diseases, current knowledge of DNM biogenesis mechanisms, and common strategies and software tools used for DNM detection, followed by a discussion of pathogenic gene prioritization. In addition, several key factors that may affect DNM identification accuracy and causal gene prioritization are reviewed. Based on recent major advances, this review sheds light on how trio-based WGS/WES technologies can play a significant role in the identification of DNMs and causal genes for sporadic diseases, and discusses existing challenges.
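As a minimal illustration of the genotype-level logic underlying trio-based DNM detection, the Python sketch below checks Mendelian consistency of biallelic genotype calls and flags the classic candidate pattern (heterozygous proband, homozygous-reference parents). It assumes genotypes have already been called and encoded as allele-index pairs; the function names are hypothetical, and real workflows add depth, allele-balance and quality filters on top of this check.

```python
from typing import Tuple

# Biallelic genotype as a pair of allele indices: (0, 0) hom-ref, (0, 1) het, (1, 1) hom-alt.
Genotype = Tuple[int, int]


def is_mendelian_consistent(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """True if one child allele can come from the father and the other from the mother."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def is_candidate_de_novo(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """Classic DNM pattern: heterozygous child, both parents homozygous reference."""
    return sorted(child) == [0, 1] and tuple(father) == (0, 0) and tuple(mother) == (0, 0)


print(is_candidate_de_novo((0, 1), (0, 0), (0, 0)))    # True
print(is_mendelian_consistent((1, 1), (0, 1), (0, 0)))  # False
```

The second call shows why a general Mendelian check is broader than the de novo pattern: a homozygous-alternate child with only one carrier parent is inconsistent, but it is not the classic heterozygous DNM signature.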
Affiliation(s)
- Zi-Bing Jin
- Division of Ophthalmic Genetics, The Eye Hospital, School of Ophthalmology & Optometry, Wenzhou Medical University, Wenzhou, 325027, China; State Key Laboratory of Ophthalmology Optometry and Vision Science, Wenzhou Medical University, Wenzhou, 325027, China
- Zhongshan Li
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, 325000, China
- Zhenwei Liu
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, 325000, China
- Yi Jiang
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, 325000, China
- Xue-Bi Cai
- Division of Ophthalmic Genetics, The Eye Hospital, School of Ophthalmology & Optometry, Wenzhou Medical University, Wenzhou, 325027, China; State Key Laboratory of Ophthalmology Optometry and Vision Science, Wenzhou Medical University, Wenzhou, 325027, China
- Jinyu Wu
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, 325000, China
24
Tan R, Wang J, Wu X, Juan L, Zheng L, Ma R, Zhan Q, Wang T, Jin S, Jiang Q, Wang Y. ERDS-exome: a Hybrid Approach for Copy Number Variant Detection from Whole-exome Sequencing Data. IEEE/ACM Trans Comput Biol Bioinform 2017; 17:796-803. [PMID: 28981421 DOI: 10.1109/tcbb.2017.2758779] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 06/07/2023]
Abstract
Copy number variants (CNVs) play important roles in human disease and evolution. With the rapid development of next-generation sequencing technologies, many tools have been developed for inferring CNVs from whole-exome sequencing (WES) data. However, because of the sparse distribution of exons in the genome, the limitations of the WES technique and the high level of signal noise in WES data, the efficacy of these tools remains less than desirable. An effective tool with greater power for CNV discovery from WES data is therefore needed. In the present study, we describe a novel method, Estimation by Read Depth (RD) with Single-nucleotide variants from exome sequencing data (ERDS-exome). ERDS-exome employs a hybrid normalization approach to normalize WES data and incorporates RD and single-nucleotide variant information together as a hybrid signal in a paired hidden Markov model to infer CNVs from WES data. In systematic evaluations on real data from the 1000 Genomes Project, in comparison with other state-of-the-art tools, ERDS-exome demonstrated higher sensitivity and comparable or even better specificity. ERDS-exome is publicly available at: https://erds-exome.github.io.
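ERDS-exome combines read depth and SNV information in a paired HMM. As a simpler, hedged illustration of the general idea of HMM-based CNV calling from read depth alone, the sketch below decodes copy-number states with the Viterbi algorithm under Poisson emissions whose mean scales with copy number. It is not the ERDS-exome model: the state set, the stay probability and the assumption of already-normalized, per-exon integer depths are all illustrative choices, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of observing depth k under a Poisson(lam) model."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_copy_number(depths: Sequence[int], diploid_mean: float,
                        states: Tuple[int, ...] = (1, 2, 3),
                        stay_prob: float = 0.99) -> List[int]:
    """Most likely copy-number path for per-exon read depths (illustrative HMM)."""
    if not depths:
        raise ValueError("no depths supplied")
    n = len(states)
    switch_prob = (1.0 - stay_prob) / (n - 1)
    log_trans = [[math.log(stay_prob if i == j else switch_prob) for j in range(n)]
                 for i in range(n)]
    # Expected depth is assumed to scale linearly with copy number.
    means = [diploid_mean * cn / 2.0 for cn in states]

    # Uniform start distribution, then dynamic programming over positions.
    scores = [math.log(1.0 / n) + poisson_logpmf(depths[0], means[s]) for s in range(n)]
    backpointers: List[List[int]] = []
    for k in depths[1:]:
        new_scores, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: scores[i] + log_trans[i][j])
            pointers.append(best_i)
            new_scores.append(scores[best_i] + log_trans[best_i][j]
                              + poisson_logpmf(k, means[j]))
        backpointers.append(pointers)
        scores = new_scores

    # Trace back from the best final state.
    state = max(range(n), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[s] for s in path]


toy = [30] * 5 + [15] * 5 + [30] * 5
print(viterbi_copy_number(toy, diploid_mean=30.0))
```

Copy number 0 is left out of the default states because a Poisson mean of zero has no finite log-likelihood for nonzero depths; a practical caller would give it a small positive background mean.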
25
Ge R, Zhou M, Luo Y, Meng Q, Mai G, Ma D, Wang G, Zhou F. McTwo: a two-step feature selection algorithm based on maximal information coefficient. BMC Bioinformatics 2016; 17:142. [PMID: 27006077 PMCID: PMC4804474 DOI: 10.1186/s12859-016-0990-0] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 12/02/2015] [Accepted: 03/14/2016] [Indexed: 01/16/2023]
Abstract
BACKGROUND High-throughput bio-OMIC technologies are producing high-dimensional data from bio-samples at an ever-increasing rate, whereas the number of training samples in a traditional experiment remains small owing to various difficulties. This "large p, small n" paradigm in biomedical "big data" may be at least partly addressed by feature selection algorithms, which select only features significantly associated with phenotypes. Feature selection is an NP-hard problem. Because the time required to find a globally optimal solution grows exponentially, all existing feature selection algorithms employ heuristic rules to find locally optimal solutions, and their solutions perform differently on different datasets. RESULTS This work describes a feature selection algorithm based on a recently published correlation measure, the Maximal Information Coefficient (MIC). The proposed algorithm, McTwo, aims to select features that are associated with phenotypes, independent of each other, and that achieve high classification performance with the nearest neighbor algorithm. In a comparative study on 17 datasets, McTwo performed about as well as or better than existing algorithms while selecting significantly fewer features. The features selected by McTwo also appear, based on the literature, to have particular biomedical relevance to the phenotypes. CONCLUSION McTwo selects a feature subset with very good classification performance and a small number of features. McTwo may therefore represent a complementary feature selection algorithm for high-dimensional biomedical datasets.
Affiliation(s)
- Ruiquan Ge
- Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, Guangdong, 518055, P.R. China
- Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, P.R. China
- Manli Zhou
- Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, Guangdong, 518055, P.R. China
- Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, P.R. China
- Youxi Luo
- Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, Guangdong, 518055, P.R. China
- School of Science, Hubei University of Technology, Wuhan, Hubei, 430068, P.R. China
- Qinghan Meng
- Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, Guangdong, 518055, P.R. China
- Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, P.R. China
- Guoqin Mai
- Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, Guangdong, 518055, P.R. China
- Dongli Ma
- Shenzhen Children's Hospital, Shenzhen, Guangdong, 518026, P.R. China
- Guoqing Wang
- Department of Pathogenobiology, Basic Medical College of Jilin University, Changchun, Jilin, China
- Fengfeng Zhou
- Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, Guangdong, 518055, P.R. China
26
Turner T, Hormozdiari F, Duyzend M, McClymont S, Hook P, Iossifov I, Raja A, Baker C, Hoekzema K, Stessman H, Zody M, Nelson B, Huddleston J, Sandstrom R, Smith J, Hanna D, Swanson J, Faustman E, Bamshad M, Stamatoyannopoulos J, Nickerson D, McCallion A, Darnell R, Eichler E. Genome Sequencing of Autism-Affected Families Reveals Disruption of Putative Noncoding Regulatory DNA. Am J Hum Genet 2016; 98:58-74. [PMID: 26749308 DOI: 10.1016/j.ajhg.2015.11.023] [Citation(s) in RCA: 209] [Impact Index Per Article: 23.2] [Received: 08/05/2015] [Accepted: 11/25/2015] [Indexed: 12/17/2022]
Abstract
We performed whole-genome sequencing (WGS) of 208 genomes from 53 families affected by simplex autism. For the majority of these families, no copy-number variant (CNV) or candidate de novo gene-disruptive single-nucleotide variant (SNV) had been detected by microarray or whole-exome sequencing (WES). We integrated multiple CNV and SNV analyses and extensive experimental validation to identify additional candidate mutations in eight families. We report that compared to control individuals, probands showed a significant (p = 0.03) enrichment of de novo and private disruptive mutations within fetal CNS DNase I hypersensitive sites (i.e., putative regulatory regions). This effect was only observed within 50 kb of genes that have been previously associated with autism risk, including genes where dosage sensitivity has already been established by recurrent disruptive de novo protein-coding mutations (ARID1B, SCN2A, NR3C2, PRKCA, and DSCAM). In addition, we provide evidence of gene-disruptive CNVs (in DISC1, WNT7A, RBFOX1, and MBD5), as well as smaller de novo CNVs and exon-specific SNVs missed by exome sequencing in neurodevelopmental genes (e.g., CANX, SAE1, and PIK3CA). Our results suggest that the detection of smaller, often multiple CNVs affecting putative regulatory elements might help explain additional risk of simplex autism.
27
Liu Y, Liu J, Lu J, Peng J, Juan L, Zhu X, Li B, Wang Y. Joint detection of copy number variations in parent-offspring trios. Bioinformatics 2015; 32:1130-7. [PMID: 26644415 DOI: 10.1093/bioinformatics/btv707] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 04/19/2015] [Accepted: 11/27/2015] [Indexed: 12/15/2022]
Abstract
MOTIVATION Whole-genome sequencing (WGS) of parent-offspring trios is a powerful approach for identifying disease-associated genes by detecting copy number variations (CNVs). Existing approaches, which detect CNVs for each individual in a trio independently, usually yield low detection accuracy. Joint modeling approaches that leverage Mendelian transmission within the parent-offspring trio can be an efficient strategy for improving CNV detection accuracy. RESULTS In this study, we developed TrioCNV, a novel approach for jointly detecting CNVs in parent-offspring trios from WGS data. Using negative binomial regression, we modeled the read depth signal while accounting for both GC-content bias and mappability bias. Moreover, we incorporated the family relationship and used a hidden Markov model to jointly infer CNVs for the three samples of a parent-offspring trio. Through application to both simulated data and a trio from the 1000 Genomes Project, we showed that TrioCNV outperformed existing approaches. AVAILABILITY AND IMPLEMENTATION TrioCNV is implemented in Java and R and is freely available at https://github.com/yongzhuang/TrioCNV CONTACT: ydwang@hit.edu.cn SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
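The Mendelian constraint that TrioCNV exploits can be illustrated by enumerating which joint trio copy-number states are compatible with transmission. The Python sketch below assumes a deletion-only model in which each haplotype carries 0 or 1 copy (so individual copy numbers range from 0 to 2); the function names and this model are illustrative assumptions, not TrioCNV's actual state space. Under this model, 15 of the 27 trio states are Mendelian-consistent; a joint HMM could give the remaining states a low prior corresponding to a de novo event.

```python
from itertools import product
from typing import List, Set, Tuple


def transmissible(cn: int, max_hap_copies: int = 1) -> Set[int]:
    """Haplotype copy numbers a parent with total copy number cn could transmit."""
    # A valid phasing splits cn into two haplotypes, each with 0..max_hap_copies copies.
    return {a for a in range(max_hap_copies + 1) if 0 <= cn - a <= max_hap_copies}


def is_mendelian(father: int, mother: int, child: int, max_hap_copies: int = 1) -> bool:
    """True if the child's copy number is one paternal plus one maternal haplotype."""
    return any(f + m == child
               for f in transmissible(father, max_hap_copies)
               for m in transmissible(mother, max_hap_copies))


def trio_states(max_hap_copies: int = 1) -> List[Tuple[int, int, int, bool]]:
    """All (father, mother, child) copy-number states with a Mendelian-consistency flag."""
    max_cn = 2 * max_hap_copies
    return [(f, m, c, is_mendelian(f, m, c, max_hap_copies))
            for f, m, c in product(range(max_cn + 1), repeat=3)]


states = trio_states()
consistent = [s[:3] for s in states if s[3]]
print(len(states), len(consistent))  # 27 15
```

For example, a child with copy number 1 whose parents are both diploid (2, 2) is flagged as inconsistent, which is the signature of a candidate de novo deletion.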
Affiliation(s)
- Yongzhuang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Jian Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Jianguo Lu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Jiajie Peng
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Liran Juan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Xiaolin Zhu
- Institute for Genomic Medicine, Columbia University, New York, NY 10032; University Program in Genetics and Genomics, Duke University Medical School, Durham, NC 27708
- Bingshan Li
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235; Center for Quantitative Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
28
Li J, Jiang Y, Wang T, Chen H, Xie Q, Shao Q, Ran X, Xia K, Sun ZS, Wu J. mirTrios: an integrated pipeline for detection of de novo and rare inherited mutations from trios-based next-generation sequencing. J Med Genet 2015; 52:275-81. [PMID: 25596308 DOI: 10.1136/jmedgenet-2014-102656] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 12/11/2022]
Abstract
OBJECTIVES Several recent studies have documented that de novo mutations (DNMs) play important roles in the aetiology of sporadic diseases. Next-generation sequencing (NGS) enables variant calling at single-base resolution on a genome-wide scale. However, accurate identification of DNMs from NGS data remains a major challenge. We developed mirTrios, a web server, to accurately detect DNMs and rare inherited mutations from NGS data in sporadic diseases. METHODS An expectation-maximisation (EM) model was adopted to identify DNMs accurately from the variant call files of a trio generated by GATK (Genome Analysis Toolkit). The GATK results, which contain certain basic properties (such as PL, PRT and PART), are iteratively integrated into the EM model to determine a threshold for DNM detection. Training sets of true- and false-positive DNMs for the EM model were built from whole-genome sequencing data of 64 trios. RESULTS On our in-house whole-exome sequencing datasets from 20 trios, mirTrios identified a total of 27 DNMs in coding regions, 25 of which (92.6%) were validated as true positives. In addition, to facilitate the interpretation of diverse mutations, mirTrios can also be used to identify rare inherited mutations. With extensive built-in annotation of DNMs and rare inherited mutations, mirTrios also supports the identification of known diagnostic variants and causative genes, as well as the prioritisation of novel, promising candidate genes. CONCLUSIONS mirTrios provides an intuitive interface for the general geneticist and clinician and can be widely used for the detection and annotation of DNMs and rare inherited mutations in sporadic diseases. mirTrios is freely available at http://centre.bioinformatics.zj.cn/mirTrios/.
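The abstract does not specify the mirTrios EM model in detail. As a generic, hedged illustration of how EM can place a threshold between true and false calls, the sketch below fits a two-component one-dimensional Gaussian mixture to a single per-call quality score and returns the score at which the two weighted component densities are equal (posterior 0.5). The single score, the Gaussian components, the toy data and all names are assumptions for illustration; this is not the mirTrios implementation.

```python
import math
from typing import List, Sequence, Tuple

Component = Tuple[float, float, float]  # (weight, mean, standard deviation)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_gaussians(xs: Sequence[float], iterations: int = 200) -> List[Component]:
    """Fit a two-component 1-D Gaussian mixture by EM; components sorted by mean."""
    xs = list(xs)
    lo, hi = min(xs), max(xs)
    spread = (hi - lo) / 4.0 or 1.0
    comps = [(0.5, lo, spread), (0.5, hi, spread)]
    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, mu, sd) for w, mu, sd in comps]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and standard deviation per component.
        new = []
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            mu = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nk
            new.append((nk / len(xs), mu, math.sqrt(max(var, 1e-6))))
        comps = new
    return sorted(comps, key=lambda c: c[1])


def posterior_threshold(comps: List[Component], tol: float = 1e-9) -> float:
    """Score between the two means where the weighted densities are equal (bisection)."""
    (w0, m0, s0), (w1, m1, s1) = comps

    def diff(x: float) -> float:
        return w1 * normal_pdf(x, m1, s1) - w0 * normal_pdf(x, m0, s0)

    a, b = m0, m1
    if diff(a) > 0 or diff(b) < 0:
        raise ValueError("no single crossing between the component means")
    while b - a > tol:
        mid = (a + b) / 2.0
        if diff(mid) > 0:
            b = mid
        else:
            a = mid
    return (a + b) / 2.0


false_calls = [1.5, 2.0, 2.5, 1.8, 2.2]  # hypothetical low scores
true_calls = [7.5, 8.0, 8.5, 7.8, 8.2]   # hypothetical high scores
comps = em_two_gaussians(false_calls + true_calls)
threshold = posterior_threshold(comps)
print(round(threshold, 3))
```

Calls scoring above the returned threshold would be assigned to the higher-mean component; with labelled training sets, as the abstract describes, the components could instead be initialised from known true and false positives.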
Affiliation(s)
- Jinchen Li
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China; Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China; State Key Laboratory of Medical Genetics, Central South University, Changsha, China
- Yi Jiang
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Tao Wang
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Huiqian Chen
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Qing Xie
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Qianzhi Shao
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Xia Ran
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Kun Xia
- State Key Laboratory of Medical Genetics, Central South University, Changsha, China
- Zhong Sheng Sun
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China; Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Jinyu Wu
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China; Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
29
Wei Q, Zhan X, Zhong X, Liu Y, Han Y, Chen W, Li B. A Bayesian framework for de novo mutation calling in parents-offspring trios. Bioinformatics 2014; 31:1375-81. [PMID: 25535243 DOI: 10.1093/bioinformatics/btu839] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.9] [Received: 08/19/2014] [Accepted: 12/15/2014] [Indexed: 12/30/2022]
Abstract
MOTIVATION Spontaneous (de novo) mutations play an important role in the etiology of a range of complex diseases. Identifying de novo mutations (DNMs) in sporadic cases provides an effective strategy for finding genes or genomic regions implicated in the genetics of disease. High-throughput next-generation sequencing enables genome- or exome-wide detection of DNMs by sequencing parents-proband trios. However, it is challenging to sift true mutations from the massive amount of noise caused by sequencing errors and alignment artifacts. A critical limitation of existing methods is that they assume the same pre-specified mutation rate for all genomic regions, which has a significant impact on DNM calling accuracy. RESULTS In this study, we developed and implemented a novel Bayesian framework for DNM calling in trios (TrioDeNovo), which overcomes this limitation by disentangling prior mutation rates from evaluation of the likelihood of the data, so that flexible priors can be adjusted post hoc at different genomic sites. Through extensive simulations and application to real data, we showed that this new method has improved sensitivity and specificity over existing methods and provides a flexible framework to further improve efficiency by incorporating proper priors. Accuracy is further improved by effective filtering based on sequence alignment characteristics. AVAILABILITY AND IMPLEMENTATION The C++ source code implementing TrioDeNovo is freely available at https://medschool.vanderbilt.edu/cgg. CONTACT bingshan.li@vanderbilt.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
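The key idea of separating the prior from the likelihood can be shown with Bayes' rule in odds form: posterior odds equal prior odds times the Bayes factor. The Python sketch below assumes a site-level log10 Bayes factor (DNM versus no DNM) has already been computed from the read data; it then recomputes the posterior for different site-specific priors without revisiting the reads. The function name and the example prior values are illustrative assumptions, not the TrioDeNovo interface.

```python
import math


def posterior_de_novo(log10_bayes_factor: float, prior: float) -> float:
    """Posterior P(DNM | data) from a log10 Bayes factor and a site-specific prior."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    # Bayes' rule in odds form: posterior odds = prior odds * Bayes factor.
    log10_post_odds = math.log10(prior) - math.log10(1.0 - prior) + log10_bayes_factor
    # Convert log10 odds to a probability without overflowing for extreme values.
    if log10_post_odds >= 0:
        return 1.0 / (1.0 + 10.0 ** (-log10_post_odds))
    odds = 10.0 ** log10_post_odds
    return odds / (1.0 + odds)


# The same hypothetical likelihood evidence re-scored under two illustrative priors.
for prior in (1e-8, 1e-7):
    print(prior, round(posterior_de_novo(8.0, prior), 3))
```

With a log10 Bayes factor of 8, raising the prior from 1e-8 to 1e-7 moves the posterior from about 0.5 to about 0.91, which is the kind of post hoc, site-specific adjustment the abstract describes.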
Affiliation(s)
- Qiang Wei
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA; Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA; Center for Quantitative Sciences, Vanderbilt University, Nashville, TN, USA; Center for Human Genetic Variation, Duke University, Durham, NC, USA; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
- Xiaowei Zhan
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA; Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA; Center for Quantitative Sciences, Vanderbilt University, Nashville, TN, USA; Center for Human Genetic Variation, Duke University, Durham, NC, USA; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
- Xue Zhong
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA; Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA; Center for Quantitative Sciences, Vanderbilt University, Nashville, TN, USA; Center for Human Genetic Variation, Duke University, Durham, NC, USA; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
- Yongzhuang Liu
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA; Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA; Center for Quantitative Sciences, Vanderbilt University, Nashville, TN, USA; Center for Human Genetic Variation, Duke University, Durham, NC, USA; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
- Yujun Han
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA; Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA; Center for Quantitative Sciences, Vanderbilt University, Nashville, TN, USA; Center for Human Genetic Variation, Duke University, Durham, NC, USA; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
- Wei Chen
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA; Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA; Center for Quantitative Sciences, Vanderbilt University, Nashville, TN, USA; Center for Human Genetic Variation, Duke University, Durham, NC, USA; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
- Bingshan Li
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA; Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA; Center for Quantitative Sciences, Vanderbilt University, Nashville, TN, USA; Center for Human Genetic Variation, Duke University, Durham, NC, USA; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA