1
Rao J, Luo H, An D, Liang X, Peng L, Chen F. Performance evaluation of structural variation detection using DNBSEQ whole-genome sequencing. BMC Genomics 2025; 26:299. [PMID: 40133825 PMCID: PMC11938577 DOI: 10.1186/s12864-025-11494-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2024] [Accepted: 03/17/2025] [Indexed: 03/27/2025] Open
Abstract
BACKGROUND DNBSEQ platforms have been widely used for variation detection, including single-nucleotide variants (SNVs) and short insertions and deletions (INDELs), with performance comparable to Illumina. However, the performance and characteristics of structural variation (SV) detection using DNBSEQ platforms remain unclear. RESULTS In this study, we assessed the detection of SVs using 40 tools on eight DNBSEQ whole-genome sequencing (WGS) datasets and two Illumina WGS datasets of NA12878. Our findings confirmed that the performance of SV detection using the same tool on DNBSEQ and Illumina datasets was highly consistent, with correlations greater than 0.80 for each of the number, size, precision, and sensitivity metrics. Furthermore, we constructed a "DNBSEQ" SV set (4,785 SVs) from the DNBSEQ datasets and an "Illumina" SV set (6,797 SVs) from the Illumina datasets. We found that these two SV sets were highly consistent in SV sites and genomic characteristics, including repetitive regions, GC distribution, difficult-to-sequence regions, and gene features, indicating the robustness of our comparative analysis and highlighting the value of both platforms in understanding the genomic context of SVs. CONCLUSIONS Our study systematically analyzed and characterized germline SVs detected on WGS datasets sequenced from DNBSEQ platforms, providing a benchmark resource for further studies of SVs using DNBSEQ platforms.
Affiliation(s)
- Junhua Rao
  - MGI Tech, Shenzhen, 518083, China
  - BGI, Shenzhen, 518083, China
- Dan An
  - MGI Tech, Shenzhen, 518083, China
  - BGI, Shenzhen, 518083, China
- Xinming Liang
  - MGI Tech, Shenzhen, 518083, China
  - BGI, Shenzhen, 518083, China
- Fang Chen
  - MGI Tech, Shenzhen, 518083, China
  - BGI, Shenzhen, 518083, China
2
Chen X, Wei S, Sun C, Yi Z, Wang Z, Wu Y, Xu J, Tao J, Chen H, Zhang M, Jiang Y, Lv H, Huang C. Computational Tools for Studying Genome Structural Variation. OMICS: A Journal of Integrative Biology 2025; 29:36-48. [PMID: 39905890 DOI: 10.1089/omi.2024.0200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2025]
Abstract
Structural variation (SV) typically refers to alterations in DNA fragments at least 50 base pairs long in the human genome. It can alter thousands of DNA nucleotides and thus significantly influence human health, disease, and clinical phenotypes. There is a shared and growing recognition that the emergence of effective computational tools and high-throughput technologies such as short-read sequencing and long-read sequencing offers novel insight into SV and, by extension, diseases affecting planetary health. However, numerous available SV tools exist with varying strengths and weaknesses. This is currently hampering the abilities of scholars to select the optimal tools to study SVs. Here, we reviewed 175 tools developed in the past two decades for SV detection, annotation, visualization, and downstream analysis of human genomics. In this expert review, we provide a comprehensive catalog of SV-related tools across different technology platforms and summarize their features, strengths, and limitations with an eye to accelerate systems science and planetary health innovations.
Affiliation(s)
- Xingyu Chen
  - Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Siyu Wei
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Chen Sun
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Zelin Yi
  - Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
- Zihan Wang
  - Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
- Yingyi Wu
  - Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
- Jing Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Junxian Tao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Haiyan Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Mingming Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yongshuai Jiang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Hongchao Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Chen Huang
  - Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
3
Junjun R, Zhengqian Z, Ying W, Jialiang W, Yongzhuang L. A comprehensive review of deep learning-based variant calling methods. Brief Funct Genomics 2024; 23:303-313. [PMID: 38366908 DOI: 10.1093/bfgp/elae003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/14/2024] [Accepted: 01/18/2023] [Indexed: 02/18/2024] Open
Abstract
Genome sequencing data have become increasingly important in the field of personalized medicine and diagnosis. However, accurately detecting genomic variations remains a challenging task. Traditional variation detection methods rely on manual inspection or predefined rules, which can be time-consuming and prone to errors. Consequently, deep learning-based approaches for variation detection have gained attention due to their ability to automatically learn genomic features that distinguish between variants. In our review, we discuss the recent advancements in deep learning-based algorithms for detecting small variations and structural variations in genomic data, as well as their advantages and limitations.
Affiliation(s)
- Ren Junjun
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Zhang Zhengqian
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Wu Ying
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Wang Jialiang
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
- Liu Yongzhuang
  - Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
4
Su R, Zhou H, Yang W, Moqir S, Ritu X, Liu L, Shi Y, Dong A, Bayier M, Letu Y, Manxi X, Chulu H, Nasenochir N, Meng H, Herrid M. Near telomere-to-telomere genome assembly of Mongolian cattle: implications for population genetic variation and beef quality. Gigascience 2024; 13:giae099. [PMID: 39693631 DOI: 10.1093/gigascience/giae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 09/29/2024] [Accepted: 11/10/2024] [Indexed: 12/20/2024] Open
Abstract
BACKGROUND Mongolian cattle, a unique breed indigenous to China, represent valuable genetic resources and serve as important sources of meat and milk. However, there is a lack of high-quality genomes in cattle, which limits biological research and breeding improvement. FINDINGS In this study, we conducted whole-genome sequencing on a Mongolian bull. This effort yielded a 3.1 Gb Mongolian cattle genome sequence, with a BUSCO integrity assessment of 95.9%. The assembly achieved both contig N50 and scaffold N50 values of 110.9 Mb, with only 3 gaps identified across the entire genome. Additionally, we successfully assembled the Y chromosome among the 31 chromosomes. Notably, 3 chromosomes were identified as having telomeres at both ends. The annotation data include 54.31% repetitive sequences and 29,794 coding genes. Furthermore, a population genetic variation analysis was conducted on 332 individuals from 56 breeds, through which we identified variant loci and potentially discovered genes associated with the formation of marbling patterns in beef, predominantly located on chromosome 12. CONCLUSIONS This study produced a genome with high continuity, completeness, and accuracy, marking the first assembly and annotation of a near telomere-to-telomere genome in cattle. Based on this, we generated a variant database comprising 332 individuals. The assembly of the genome and the analysis of population variants provide significant insights into cattle evolution and enhance our understanding of breeding selection.
Affiliation(s)
- Rina Su
  - Grassland & Cattle Investment Co., Ltd., R&D Center, Hohhot 010000, Inner Mongolia
- Hao Zhou
  - School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Wenhao Yang
  - School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Sorgog Moqir
  - Grassland & Cattle Investment Co., Ltd., R&D Center, Hohhot 010000, Inner Mongolia
- Xiji Ritu
  - Grassland & Cattle Investment Co., Ltd., R&D Center, Hohhot 010000, Inner Mongolia
- Lei Liu
  - Grassland & Cattle Investment Co., Ltd., R&D Center, Hohhot 010000, Inner Mongolia
- Ying Shi
  - Grassland & Cattle Investment Co., Ltd., R&D Center, Hohhot 010000, Inner Mongolia
- Ai Dong
  - Bureau of Agriculture and Animal Husbandry, Alxa League, Bayanhot 750306, Inner Mongolia, China
- Menghe Bayier
  - Centre for Animal Husbandry and Veterinary Technology, Alxa League, Bayanhot 750306, Inner Mongolia
- Yibu Letu
  - Station for Animal Husbandry, Xilingol League, Xilinhot 026000, Inner Mongolia
- Xin Manxi
  - Station for Animal Husbandry, Xilingol League, Xilinhot 026000, Inner Mongolia
- Hasi Chulu
  - Station for Animal Husbandry, Sunit Left Banner, Xilingol League, Xilinhot 026000, Inner Mongolia
- Narenhua Nasenochir
  - College of Animal Science, Inner Mongolia Agriculture University, Hohhot 010000, Inner Mongolia, China
- He Meng
  - School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Muren Herrid
  - Grassland & Cattle Investment Co., Ltd., R&D Center, Hohhot 010000, Inner Mongolia
  - International Livestock Research Centre, Gold Coast 4211, Queensland, Australia
5
Meng X, Wang M, Luo M, Sun L, Yan Q, Liu Y. Systematic evaluation of multiple NGS platforms for structural variants detection. J Biol Chem 2023; 299:105436. [PMID: 37944616 PMCID: PMC10724692 DOI: 10.1016/j.jbc.2023.105436] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/28/2023] [Revised: 10/29/2023] [Accepted: 10/31/2023] [Indexed: 11/12/2023] Open
Abstract
Structural variations (SVs) are critical genome changes affecting human diseases. Although many hybridization-based methods exist, evaluating SVs through next-generation sequencing (NGS) data is still necessary for broader research exploration. Here, we comprehensively compared the performance of 16 SV callers and multiple NGS platforms using NA12878 whole genome sequencing (WGS) datasets. The results indicated that several SV callers performed relatively well, such as Manta, GRIDSS, LUMPY, TARDIS, FermiKit, and Wham. Meanwhile, all NGS platforms performed similarly when the same software was used. Additionally, we found that the undetected SVs mostly originated from long-read datasets; therefore, a more appropriate strategy for accurate SV detection in the future will be to integrate long and short reads. At present, with NGS as a mainstream method in bioinformatics, our study provides helpful and comprehensive guidelines for specific categories of SV research.
Affiliation(s)
- Xuan Meng
  - School of Medicine, Southern University of Science and Technology, Shenzhen, China
- Miao Wang
  - Research Cooperation Department, GeneMind Biosciences Company Limited, Shenzhen, China
- Mingjie Luo
  - Research Cooperation Department, GeneMind Biosciences Company Limited, Shenzhen, China
- Lei Sun
  - Research Cooperation Department, GeneMind Biosciences Company Limited, Shenzhen, China
- Qin Yan
  - Research Cooperation Department, GeneMind Biosciences Company Limited, Shenzhen, China
- Yongfeng Liu
  - Research Cooperation Department, GeneMind Biosciences Company Limited, Shenzhen, China
6
Wang S, Wang M, Chen L, Pan G, Wang Y, Li SC. SpecHLA enables full-resolution HLA typing from sequencing data. Cell Reports Methods 2023; 3:100589. [PMID: 37714157 PMCID: PMC10545945 DOI: 10.1016/j.crmeth.2023.100589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 06/20/2023] [Accepted: 08/21/2023] [Indexed: 09/17/2023]
Abstract
Reconstructing diploid sequences of human leukocyte antigen (HLA) genes, i.e., full-resolution HLA typing, from sequencing data is challenging. The high homogeneity across HLA genes and the high heterogeneity within HLA alleles complicate the identification of genomic source loci for sequencing reads. Here, we present SpecHLA, which utilizes fine-tuned reads binning and local assembly to achieve accurate full-resolution HLA typing. SpecHLA accepts sequencing data from paired-end, 10×-linked-reads, high-throughput chromosome conformation capture (Hi-C), Pacific Biosciences (PacBio), and Oxford Nanopore Technology (ONT). It can also incorporate pedigree data and genotype frequency to refine typing. In 32 Human Genome Structural Variation Consortium, Phase 2 (HGSVC2) samples, SpecHLA achieved 98.6% accuracy for G-group-resolution HLA typing, inferring entire HLA alleles with an average of three mismatches fewer, ten gaps fewer, and 590 bp less edit distance than HISAT-genotype per allele. Additionally, SpecHLA exhibited a 2-field typing accuracy of 98.6% in 875 real samples. Finally, SpecHLA detected HLA loss of heterozygosity with 99.7% specificity and 96.8% sensitivity in simulated samples of cancer cell lines.
Affiliation(s)
- Shuai Wang
  - City University of Hong Kong, Department of Computer Science, Kowloon, Hong Kong
- Mengyao Wang
  - City University of Hong Kong, Department of Computer Science, Kowloon, Hong Kong
- Lingxi Chen
  - City University of Hong Kong, Department of Computer Science, Kowloon, Hong Kong
- Guangze Pan
  - City University of Hong Kong, Department of Computer Science, Kowloon, Hong Kong
- Yanfei Wang
  - City University of Hong Kong, Department of Computer Science, Kowloon, Hong Kong
- Shuai Cheng Li
  - City University of Hong Kong, Department of Computer Science, Kowloon, Hong Kong
7
Xu Y, Shi X, Wang W, Zhang L, Cheung S, Rudolph M, Brega N, Dong X, Qian L, Wang L, Yuan S, Tan DSW, Wang K. Prevalence and clinico-genomic characteristics of patients with TRK fusion cancer in China. NPJ Precis Oncol 2023; 7:75. [PMID: 37567953 PMCID: PMC10421940 DOI: 10.1038/s41698-023-00427-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/16/2023] [Accepted: 07/28/2023] [Indexed: 08/13/2023] Open
Abstract
Neurotrophic tyrosine kinase (NTRK) fusions involving NTRK1, NTRK2, and NTRK3 were found in a broad range of solid tumors as driver gene variants. However, the prevalence of NTRK fusions in Chinese solid tumor patients is rarely reported. Based on the next-generation sequencing data from 10,194 Chinese solid tumor patients, we identified approximately 0.4% (40/10,194) of Chinese solid tumor patients with NTRK fusion. NTRK fusions were most frequently detected in soft tissue sarcoma (3.0%), especially in the fibrosarcoma subtype (12.7%). A total of 29 NTRK fusion patterns were identified, of which 11 were rarely reported. NTRK fusion mostly co-occurred with TP53 (38%), CDKN2A (23%), and ACVR2A (18%) and rarely with NTRK amplification (5.0%) and single nucleotide variants (2.5%). DNA-based NTRK fusion sequencing exhibited a higher detection rate than pan-TRK immunohistochemistry (100% vs. 87.5%). Two patients with NTRK fusions showed clinical responses to larotrectinib, supporting the effective response of NTRK fusion patients to TRK inhibitors.
Affiliation(s)
- Yujun Xu
  - Department of Imaging Interventional Therapy, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University; Department of Imaging Interventional Therapy, Shandong Provincial Hospital Affiliated to Shandong First Medical University, 250021, Jinan, China
- Lin Zhang
  - OrigiMed Co. Ltd, 201114, Shanghai, China
- Shinghu Cheung
  - Precision Molecular Oncology, Research and Early Development - Oncology, Pharmaceuticals, Bayer U.S. LLC, Cambridge, USA
- Marion Rudolph
  - Translational Sciences Oncology, Research and Early Development - Oncology, Pharmaceuticals, Bayer AG, Berlin, Germany
- Lili Qian
  - OrigiMed Co. Ltd, 201114, Shanghai, China
- Liwei Wang
  - OrigiMed Co. Ltd, 201114, Shanghai, China
- Daniel Shao Weng Tan
  - National Cancer Centre Singapore, Duke-NUS Medical School, 169610, Singapore, Singapore
- Kai Wang
  - OrigiMed Co. Ltd, 201114, Shanghai, China
8
Kosugi S, Kamatani Y, Harada K, Tomizuka K, Momozawa Y, Morisaki T, The BioBank Japan Project, Terao C. Detection of trait-associated structural variations using short-read sequencing. Cell Genomics 2023; 3:100328. [PMID: 37388916 PMCID: PMC10300613 DOI: 10.1016/j.xgen.2023.100328] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/19/2022] [Revised: 02/17/2023] [Accepted: 04/25/2023] [Indexed: 07/01/2023]
Abstract
Genomic structural variation (SV) affects genetic and phenotypic characteristics in diverse organisms, but the lack of reliable methods to detect SV has hindered genetic analysis. We developed a computational algorithm (MOPline) that includes missing call recovery combined with high-confidence SV call selection and genotyping using short-read whole-genome sequencing (WGS) data. Using 3,672 high-coverage WGS datasets, MOPline stably detected ∼16,000 SVs per individual, approximately 1.7- to 3.3-fold more than previous large-scale projects, while exhibiting a comparable level of statistical quality metrics. We imputed SVs from 181,622 Japanese individuals for 42 diseases and 60 quantitative traits. A genome-wide association study with the imputed SVs revealed 41 top-ranked or nearly top-ranked genome-wide significant SVs, including 8 exonic SVs with 5 novel associations and enriched mobile element insertions. This study demonstrates that short-read WGS data can be used to identify rare and common SVs associated with a variety of traits.
Affiliation(s)
- Shunichi Kosugi
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Yoichiro Kamatani
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba 277-8562, Japan
- Katsutoshi Harada
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Kohei Tomizuka
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Yukihide Momozawa
  - Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa 230-0045, Japan
- Takayuki Morisaki
  - Division of Molecular Pathology, Institute of Medical Science, The University of Tokyo, 4-6-1, Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan
- Chikashi Terao
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
  - The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
9
Xue JR, Mackay-Smith A, Mouri K, Garcia MF, Dong MX, Akers JF, Noble M, Li X, Zoonomia Consortium, Lindblad-Toh K, Karlsson EK, Noonan JP, Capellini TD, Brennand KJ, Tewhey R, Sabeti PC, Reilly SK. The functional and evolutionary impacts of human-specific deletions in conserved elements. Science 2023; 380:eabn2253. [PMID: 37104592 PMCID: PMC10202372 DOI: 10.1126/science.abn2253] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 11/15/2021] [Accepted: 02/24/2023] [Indexed: 04/29/2023]
Abstract
Conserved genomic sequences disrupted in humans may underlie uniquely human phenotypic traits. We identified and characterized 10,032 human-specific conserved deletions (hCONDELs). These short (average 2.56 base pairs) deletions are enriched for human brain functions across genetic, epigenomic, and transcriptomic datasets. Using massively parallel reporter assays in six cell types, we discovered 800 hCONDELs conferring significant differences in regulatory activity, half of which enhance rather than disrupt regulatory function. We highlight several hCONDELs with putative human-specific effects on brain development, including HDAC5, CPEB4, and PPP2CA. Reverting an hCONDEL to the ancestral sequence alters the expression of LOXL2 and developmental genes involved in myelination and synaptic function. Our data provide a rich resource to investigate the evolutionary mechanisms driving new traits in humans and other species.
Affiliation(s)
- James R. Xue
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for System Biology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Ava Mackay-Smith
  - Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Michael X. Dong
  - Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Jared F. Akers
  - Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Mark Noble
  - Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Xue Li
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA, USA
  - Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA, USA
- Kerstin Lindblad-Toh
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Elinor K. Karlsson
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA, USA
  - Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA, USA
- James P. Noonan
  - Department of Genetics, Yale School of Medicine, New Haven, CT, USA
  - Department of Neuroscience, Yale School of Medicine, New Haven, CT, USA
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
- Terence D. Capellini
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Kristen J. Brennand
  - Department of Genetics, Yale School of Medicine, New Haven, CT, USA
  - Department of Psychiatry, Yale University, New Haven, CT, USA
- Ryan Tewhey
  - The Jackson Laboratory, Bar Harbor, ME, USA
  - Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
  - Graduate School of Biomedical Sciences Tufts University School of Medicine, Boston, MA, USA
- Pardis C. Sabeti
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for System Biology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
  - Department of Immunology and Infectious Disease, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Steven K. Reilly
  - Department of Genetics, Yale School of Medicine, New Haven, CT, USA
10
Rather MA, Agarwal D, Bhat TA, Khan IA, Zafar I, Kumar S, Amin A, Sundaray JK, Qadri T. Bioinformatics approaches and big data analytics opportunities in improving fisheries and aquaculture. Int J Biol Macromol 2023; 233:123549. [PMID: 36740117 DOI: 10.1016/j.ijbiomac.2023.123549] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/15/2022] [Revised: 01/30/2023] [Accepted: 01/31/2023] [Indexed: 02/05/2023]
Abstract
Aquaculture has witnessed an excellent growth rate during the last two decades and offers huge potential to provide nutritional as well as livelihood security. Genomic research has contributed significantly toward the development of beneficial technologies for aquaculture. The existing high throughput technologies like next-generation technologies generate oceanic data which requires extensive analysis using appropriate tools. Bioinformatics is a rapidly evolving science that involves integrating gene-based information and computational technology to produce new knowledge for the benefit of aquaculture. Bioinformatics provides new opportunities as well as challenges for information and data processing in new generation aquaculture. Rapid technical advancements have opened up a world of possibilities for using current genomics to improve aquaculture performance. Understanding the genes that govern economically relevant characteristics necessitates a significant amount of additional research. The various dimensions of data sources include next-generation DNA sequencing, protein sequencing, RNA sequencing gene expression profiles, metabolic pathways, molecular markers, and so on. Appropriate bioinformatics tools are developed to mine the biologically relevant and commercially useful results. The purpose of this scoping review is to present various arms of diverse bioinformatics tools with special emphasis on practical translation to the aquaculture industry.
Affiliation(s)
- Mohd Ashraf Rather
  - Division of Fish Genetics and Biotechnology, Faculty of Fisheries Ganderbal, Sher-e-Kashmir University of Agricultural Science and Technology, Kashmir, India
- Deepak Agarwal
  - Institute of Fisheries Post Graduation Studies OMR Campus, Vaniyanchavadi, Chennai, India
- Irfan Ahamd Khan
  - Division of Fish Genetics and Biotechnology, Faculty of Fisheries Ganderbal, Sher-e-Kashmir University of Agricultural Science and Technology, Kashmir, India
- Imran Zafar
  - Department of Bioinformatics and Computational Biology, Virtual University Punjab, Pakistan
- Sujit Kumar
  - Department of Bioinformatics and Computational Biology, Virtual University Punjab, Pakistan
- Adnan Amin
  - Postgraduate Institute of Fisheries Education and Research Kamdhenu University, Gandhinagar-India University of Kurasthra, India; Department of Aquatic Environmental Management, Faculty of Fisheries Rangil- Ganderbel -SKUAST-K, India
- Jitendra Kumar Sundaray
  - ICAR-Central Institute of Freshwater Aquaculture, Kausalyaganga, Bhubaneswar, Odisha 751002, India
- Tahiya Qadri
  - Division of Food Science and Technology, SKUAST-K, Shalimar, India
11
Honma H, Takahashi N, Arisue N, Sugishita T. Analysis of genome instability and implications for the consequent phenotype in Plasmodium falciparum containing mutated MSH2-1 (P513T). Microb Genom 2023; 9. [PMID: 37083479 DOI: 10.1099/mgen.0.001003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/22/2023] Open
Abstract
Malarial parasites exhibit extensive genomic plasticity, which induces antigen diversification and the development of antimalarial drug resistance. Only a few studies have examined the genome maintenance mechanisms of parasites. This study aimed to elucidate the impact of a mutation in a DNA mismatch repair gene on genome stability by maintaining the mutant and wild-type parasites through serial in vitro cultures for approximately 400 days and analysing the subsequent spontaneous mutations. A P513T mutant of the DNA mismatch repair protein PfMSH2-1 from Plasmodium falciparum 3D7 was created. The mutation did not influence the base substitution rate but significantly increased the insertion/deletion (indel) mutation rate in short tandem repeats (STRs) and minisatellite loci. STR mutability was affected by allele size, genomic category and certain repeat motifs. In the mutants, significant telomere healing and homologous recombination at chromosomal ends caused extensive gene loss and generation of chimeric genes, resulting in large-scale chromosomal alteration. Additionally, the mutant showed increased tolerance to N-methyl-N'-nitro-N-nitrosoguanidine, suggesting that PfMSH2-1 was involved in recognizing DNA methylation damage. This work provides valuable insights into the role of PfMSH2-1 in genome stability and demonstrates that the genomic destabilization caused by its dysfunction may lead to antigen diversification.
Affiliation(s)
- Hajime Honma
  - Section of Global Health, Division of Public Health, Department of Hygiene and Public Health, Tokyo Women's Medical University, 8-1 Kawada-cho, Shinjuku, Tokyo 162-8666, Japan
  - Department of International Affairs and Tropical Medicine, Tokyo Women's Medical University, 8-1 Kawada-cho, Shinjuku, Tokyo 162-8666, Japan
- Nobuyuki Takahashi
  - Section of Global Health, Division of Public Health, Department of Hygiene and Public Health, Tokyo Women's Medical University, 8-1 Kawada-cho, Shinjuku, Tokyo 162-8666, Japan
  - Department of International Affairs and Tropical Medicine, Tokyo Women's Medical University, 8-1 Kawada-cho, Shinjuku, Tokyo 162-8666, Japan
- Nobuko Arisue
  - Section of Global Health, Division of Public Health, Department of Hygiene and Public Health, Tokyo Women's Medical University, 8-1 Kawada-cho, Shinjuku, Tokyo 162-8666, Japan
- Tomohiko Sugishita
  - Section of Global Health, Division of Public Health, Department of Hygiene and Public Health, Tokyo Women's Medical University, 8-1 Kawada-cho, Shinjuku, Tokyo 162-8666, Japan
  - Department of International Affairs and Tropical Medicine, Tokyo Women's Medical University, 8-1 Kawada-cho, Shinjuku, Tokyo 162-8666, Japan
| |
Collapse
|
12
|
Li S, Yan B, Li TKT, Lu J, Gu Y, Tan Y, Gong F, Lam TW, Xie P, Wang Y, Lin G, Luo R. Ultra-low-coverage genome-wide association study-insights into gestational age using 17,844 embryo samples with preimplantation genetic testing. Genome Med 2023; 15:10. [PMID: 36788602 PMCID: PMC9926832 DOI: 10.1186/s13073-023-01158-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/15/2022] [Accepted: 01/26/2023] [Indexed: 02/16/2023]
Abstract
BACKGROUND Very low-coverage (0.1 to 1×) whole genome sequencing (WGS) has become a promising and affordable approach to discover genomic variants of human populations for genome-wide association study (GWAS). To support genetic screening using preimplantation genetic testing (PGT) in a large population, the sequencing coverage goes below 0.1× to an ultra-low level. However, the feasibility and effectiveness of ultra-low-coverage WGS (ulcWGS) for GWAS remains undetermined. METHODS We built a pipeline to carry out analysis of ulcWGS data for GWAS. To examine its effectiveness, we benchmarked the accuracy of genotype imputation at the combination of different coverages below 0.1× and sample sizes from 2000 to 16,000, using 17,844 embryo PGT samples with approximately 0.04× average coverage and the standard Chinese sample HG005 with known genotypes. We then applied the imputed genotypes of 1744 transferred embryos who have gestational ages and complete follow-up records to GWAS. RESULTS The accuracy of genotype imputation under ultra-low coverage can be improved by increasing the sample size and applying a set of filters. From 1744 born embryos, we identified 11 genomic risk loci associated with gestational ages and 166 genes mapped to these loci according to positional, expression quantitative trait locus, and chromatin interaction strategies. Among these mapped genes, CRHBP, ICAM1, and OXTR were more frequently reported as preterm birth related. By joint analysis of gene expression data from previous studies, we constructed interrelationships of mainly CRHBP, ICAM1, PLAGL1, DNMT1, CNTLN, DKK1, and EGR2 with preterm birth, infant disease, and breast cancer. CONCLUSIONS This study not only demonstrates that ulcWGS could achieve relatively high accuracy of adequate genotype imputation and is capable of GWAS, but also provides insights into the associations between gestational age and genetic variations of the fetal embryos from Chinese population.
Affiliation(s)
- Shumin Li
  - Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Bin Yan
  - Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Thomas K. T. Li
  - Department of Obstetrics & Gynecology, Queen Mary Hospital, The University of Hong Kong, Hong Kong, China
- Jianliang Lu
  - Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Yifan Gu
  - NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, School of Basic Medical Science, Institute of Reproductive and Stem Cell Engineering, Central South University, Changsha, 410008, Hunan, China
  - Clinical Research Center for Reproduction and Genetics in Hunan Province, Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha, 410013, Hunan, China
- Yueqiu Tan
  - NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, School of Basic Medical Science, Institute of Reproductive and Stem Cell Engineering, Central South University, Changsha, 410008, Hunan, China
  - Clinical Research Center for Reproduction and Genetics in Hunan Province, Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha, 410013, Hunan, China
- Fei Gong
  - NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, School of Basic Medical Science, Institute of Reproductive and Stem Cell Engineering, Central South University, Changsha, 410008, Hunan, China
  - Clinical Research Center for Reproduction and Genetics in Hunan Province, Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha, 410013, Hunan, China
- Tak-Wah Lam
  - Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Pingyuan Xie
  - Hunan Normal University School of Medicine, Changsha, 410013, Hunan, China
  - National Engineering and Research Center of Human Stem Cell, Changsha, Hunan, China
- Yuexuan Wang
  - Department of Computer Science, The University of Hong Kong, Hong Kong, China
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Ge Lin
  - NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, School of Basic Medical Science, Institute of Reproductive and Stem Cell Engineering, Central South University, Changsha, 410008, Hunan, China
  - Clinical Research Center for Reproduction and Genetics in Hunan Province, Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha, 410013, Hunan, China
  - National Engineering and Research Center of Human Stem Cell, Changsha, Hunan, China
- Ruibang Luo
  - Department of Computer Science, The University of Hong Kong, Hong Kong, China
13
Zheng T. TLsub: A transfer learning based enhancement to accurately detect mutations with wide-spectrum sub-clonal proportion. Front Genet 2022; 13:981269. [DOI: 10.3389/fgene.2022.981269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 10/17/2022] [Indexed: 11/23/2022]
Abstract
Mutation detection is a routine task in sequencing data analysis, and existing tools typically combine signals from a set of overlapping sequencing reads. However, subclonal mutations, which are reported to contribute to tumor recurrence and metastasis, are sometimes filtered out by these signals. When the clonal proportion decreases, signals often become ambiguous, while complicated interactions among signals break the IID assumption of most machine learning models. Although mutation callers could lower their thresholds, doing so introduces many false positives. The main aim here was to detect subclonal mutations with high specificity when sample purity or clonal proportion is ambiguous. We proposed a novel machine learning approach for filtering false positive calls to accurately detect mutations across a wide spectrum of subclonal proportions. We carried out a series of experiments on both simulated and real datasets and compared the method with several state-of-the-art approaches, including freebayes, MuTect2, Sentieon and SiNVICT. The results demonstrated that the proposed method adapts well to different diluted sequencing signals and can significantly reduce false positives when detecting subclonal mutations. The code has been uploaded to https://github.com/TrinaZ/TL-fpFilter for academic use only.
14
Xiao C, Chen Z, Chen W, Padilla C, Colgan M, Wu W, Fang LT, Liu T, Yang Y, Schneider V, Wang C, Xiao W. Personalized genome assembly for accurate cancer somatic mutation discovery using tumor-normal paired reference samples. Genome Biol 2022; 23:237. [PMID: 36352452 PMCID: PMC9648002 DOI: 10.1186/s13059-022-02803-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/05/2021] [Accepted: 10/25/2022] [Indexed: 11/11/2022]
Abstract
BACKGROUND The use of a personalized haplotype-specific genome assembly, rather than an unrelated, mosaic genome like GRCh38, as a reference for detecting the full spectrum of somatic events from cancers has long been advocated but has never been explored in tumor-normal paired samples. Here, we provide the first demonstrated use of de novo assembled personalized genome as a reference for cancer mutation detection and quantifying the effects of the reference genomes on the accuracy of somatic mutation detection. RESULTS We generate de novo assemblies of the first tumor-normal paired genomes, both nuclear and mitochondrial, derived from the same individual with triple negative breast cancer. The personalized genome was chromosomal scale, haplotype phased, and annotated. We demonstrate that it provides individual specific haplotypes for complex regions and medically relevant genes. We illustrate that the personalized genome reference not only improves read alignments for both short-read and long-read sequencing data but also ameliorates the detection accuracy of somatic SNVs and SVs. We identify the equivalent somatic mutation calls between two genome references and uncover novel somatic mutations only when personalized genome assembly is used as a reference. CONCLUSIONS Our findings demonstrate that use of a personalized genome with individual-specific haplotypes is essential for accurate detection of the full spectrum of somatic mutations in the paired tumor-normal samples. The unique resource and methodology established in this study will be beneficial to the development of precision oncology medicine not only for breast cancer, but also for other cancers.
Affiliation(s)
- Chunlin Xiao
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20894 USA
- Zhong Chen
  - Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350 USA
- Wanqiu Chen
  - Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350 USA
- Cory Padilla
  - Dovetail Genomics, 100 Enterprise Way, Scotts Valley, CA 95066 USA
- Michael Colgan
  - The Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD USA
- Wenjun Wu
  - Blood Cell Development and Function Program, Fox Chase Cancer Center, Philadelphia, PA 19111 USA
- Li-Tai Fang
  - Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., 1301 Shoreway Road, Belmont, CA 94002 USA
- Tiantian Liu
  - Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350 USA
- Yibin Yang
  - Blood Cell Development and Function Program, Fox Chase Cancer Center, Philadelphia, PA 19111 USA
- Valerie Schneider
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20894 USA
- Charles Wang
  - Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350 USA
- Wenming Xiao
  - The Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD USA
15
Boddé M, Makunin A, Ayala D, Bouafou L, Diabaté A, Ekpo UF, Kientega M, Le Goff G, Makanga BK, Ngangue MF, Omitola OO, Rahola N, Tripet F, Durbin R, Lawniczak MKN. High-resolution species assignment of Anopheles mosquitoes using k-mer distances on targeted sequences. eLife 2022; 11:e78775. [PMID: 36222650 PMCID: PMC9648975 DOI: 10.7554/elife.78775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Accepted: 10/11/2022] [Indexed: 11/13/2022]
Abstract
The ANOSPP amplicon panel is a genus-wide targeted sequencing panel to facilitate large-scale monitoring of Anopheles species diversity. Combining information from the 62 nuclear amplicons present in the ANOSPP panel allows for a more sensitive and specific species assignment than single gene (e.g. COI) barcoding, which is desirable in the light of permeable species boundaries. Here, we present NNoVAE, a method using Nearest Neighbours (NN) and Variational Autoencoders (VAE), which we apply to k-mers resulting from the ANOSPP amplicon sequences in order to hierarchically assign species identity. The NN step assigns a sample to a species-group by comparing the k-mers arising from each haplotype's amplicon sequence to a reference database. The VAE step is required to distinguish between closely related species, and also has sufficient resolution to reveal population structure within species. In tests on independent samples with over 80% amplicon coverage, NNoVAE correctly classifies to species level 98% of samples within the An. gambiae complex and 89% of samples outside the complex. We apply NNoVAE to over two thousand new samples from Burkina Faso and Gabon, identifying unexpected species in Gabon. NNoVAE presents an approach that may be of value to other targeted sequencing panels, and is a method that will be used to survey Anopheles species diversity and Plasmodium transmission patterns through space and time on a large scale, with plans to analyse half a million mosquitoes in the next five years.
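As a rough illustration of the nearest-neighbour idea described in this abstract, the sketch below assigns a query sequence to the reference group whose k-mer set is closest. The abstract does not state which distance measure or k-mer length NNoVAE uses, so the Jaccard distance, the value of k, the function names, and the toy sequences and group labels here are all assumptions made for illustration only.

```python
def kmers(seq, k=4):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """Jaccard distance between two k-mer sets (assumed measure, not NNoVAE's)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def nearest_group(query, reference, k=4):
    """Return the label of the reference sequence closest to the query.

    reference is a hypothetical dict mapping a group label to one sequence.
    """
    q = kmers(query, k)
    return min(
        reference,
        key=lambda label: jaccard_distance(q, kmers(reference[label], k)),
    )


# Toy reference database with made-up sequences and labels.
reference = {
    "group_A": "ACGTACGTGGCCTTAA",
    "group_B": "TTTTGGGGCCCCAAAA",
}
assigned = nearest_group("ACGTACGTGGCCTTAT", reference)
```

A real panel would combine distances across many amplicons and haplotypes; this sketch compares a single sequence per group to keep the idea visible.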
Affiliation(s)
- Marilou Boddé
  - Department of Genetics, University of Cambridge, Cambridge, United Kingdom
  - Wellcome Sanger Institute, Hinxton, United Kingdom
- Diego Ayala
  - Institut de Recherche pour le Développement, MIVEGEC, Univ. Montpellier, CNRS, IRD, Montpellier, France
- Lemonde Bouafou
  - Institut de Recherche pour le Développement, MIVEGEC, Univ. Montpellier, CNRS, IRD, Montpellier, France
- Abdoulaye Diabaté
  - Institut de Recherche en Sciences de la Santé, Direction Régionale de l'Ouest, Bobo-Dioulasso, Burkina Faso
- Mahamadi Kientega
  - Institut de Recherche en Sciences de la Santé, Direction Régionale de l'Ouest, Bobo-Dioulasso, Burkina Faso
- Gilbert Le Goff
  - Institut de Recherche pour le Développement, MIVEGEC, Univ. Montpellier, CNRS, IRD, Montpellier, France
- Marc F Ngangue
  - Centre International de Recherches Medicales de Franceville, Franceville, Gabon
- Nil Rahola
  - Institut de Recherche pour le Développement, MIVEGEC, Univ. Montpellier, CNRS, IRD, Montpellier, France
- Frederic Tripet
  - Centre for Applied Entomology and Parasitology, Keele University, Newcastle, United Kingdom
- Richard Durbin
  - Department of Genetics, University of Cambridge, Cambridge, United Kingdom
  - Wellcome Sanger Institute, Hinxton, United Kingdom
16
PhenGenVar: A User-Friendly Genetic Variant Detection and Visualization Tool for Precision Medicine. J Pers Med 2022; 12:jpm12060959. [PMID: 35743744 PMCID: PMC9224645 DOI: 10.3390/jpm12060959] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/22/2022] [Revised: 06/06/2022] [Accepted: 06/09/2022] [Indexed: 12/16/2022]
Abstract
Precision medicine has been revolutionized by the advent of high-throughput next-generation sequencing (NGS) technology and development of various bioinformatic analysis tools for large-scale NGS big data. At the population level, biomedical studies have identified human diseases and phenotype-associated genetic variations using NGS technology, such as whole-genome sequencing, exome sequencing, and gene panel sequencing. Furthermore, patients’ genetic variations related to a specific phenotype can also be identified by analyzing their genomic information. These breakthroughs paved the way for the clinical diagnosis and precise treatment of patients’ diseases. Although many bioinformatics tools have been developed to analyze the genetic variations from the individual patient’s NGS data, it is still challenging to develop user-friendly programs for clinical physicians who do not have bioinformatics programming skills to diagnose a patient’s disease using the genomic data. In response to this demand, we developed a Phenotype to Genotype Variation program (PhenGenVar), which is a user-friendly interface for monitoring the variations in a gene of interest for molecular diagnosis. This allows for flexible filtering and browsing of variants of the disease and phenotype-associated genes. To test this program, we analyzed the whole-genome sequencing data of an anonymous person from the 1000 human genome project data. As a result, we were able to identify several genomic variations, including single-nucleotide polymorphism, insertions, and deletions in specific gene regions. Therefore, PhenGenVar can be used to diagnose a patient’s disease. PhenGenVar is freely accessible and is available at our website.
17
Atlas G, Sreenivasan R, Sinclair A. Targeting the Non-Coding Genome for the Diagnosis of Disorders of Sex Development. Sex Dev 2021; 15:392-410. [PMID: 34634785 DOI: 10.1159/000519238] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/18/2021] [Accepted: 08/12/2021] [Indexed: 11/19/2022]
Abstract
Disorders of sex development (DSD) are a complex group of conditions with highly variable clinical phenotypes, most often caused by failure of gonadal development. DSD are estimated to occur in around 1.7% of all live births. Whilst the understanding of genes involved in gonad development has increased exponentially, approximately 50% of patients with a DSD remain without a genetic diagnosis, possibly implicating non-coding genomic regions instead. Here, we review how variants in the non-coding genome of DSD patients can be identified using techniques such as array comparative genomic hybridization (CGH) to detect copy number variants (CNVs), and more recently, whole genome sequencing (WGS). Once a CNV in a patient's non-coding genome is identified, putative regulatory elements such as enhancers need to be determined within these vast genomic regions. We will review the available online tools and databases that can be used to refine regions with potential enhancer activity based on chromosomal accessibility, histone modifications, transcription factor binding site analysis, chromatin conformation, and disease association. We will also review the current in vitro and in vivo techniques available to demonstrate the functionality of the identified enhancers. The review concludes with a clinical update on the enhancers linked to DSD.
Affiliation(s)
- Gabby Atlas
  - Reproductive Development, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Department of Endocrinology and Diabetes, Royal Children's Hospital, Melbourne, Victoria, Australia
  - Department of Paediatrics, University of Melbourne, Melbourne, Victoria, Australia
- Rajini Sreenivasan
  - Reproductive Development, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Department of Paediatrics, University of Melbourne, Melbourne, Victoria, Australia
- Andrew Sinclair
  - Reproductive Development, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Department of Paediatrics, University of Melbourne, Melbourne, Victoria, Australia
18
Yuan X, Xu X, Zhao H, Duan J. ERINS: Novel Sequence Insertion Detection by Constructing an Extended Reference. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1893-1901. [PMID: 31751246 DOI: 10.1109/tcbb.2019.2954315] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Next generation sequencing technology has led to the development of methods for the detection of novel sequence insertions (nsINS). Multiple signatures from short reads are usually extracted to improve nsINS detection performance. However, characterization of nsINSs larger than the mean insert size is still challenging. This article presents a new method, ERINS, to detect nsINS contents and genotypes across the full size range. It integrates the features of structural variations and mapping states of split reads to find nsINS breakpoints, and then adopts a left-most mapping strategy to infer nsINS content by iteratively extending the standard reference at each breakpoint. Finally, it realigns all reads to the extended reference and infers nsINS genotypes through statistical testing on read counts. We test and validate the performance of ERINS on simulation and real sequencing datasets. The simulation experimental results demonstrate that it outperforms several peer methods with respect to sensitivity and precision. The real data application indicates that ERINS obtains results highly consistent with those previously reported and detects nsINSs over 200 base pairs that many other methods miss. In conclusion, ERINS can be used as a supplement to existing tools and will become a routine approach for characterizing nsINSs.
19
Genomic diversity of 39 samples of Pyropia species grown in Japan. PLoS One 2021; 16:e0252207. [PMID: 34106965 PMCID: PMC8189503 DOI: 10.1371/journal.pone.0252207] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/02/2020] [Accepted: 05/11/2021] [Indexed: 11/19/2022]
Abstract
Some Pyropia species, such as nori (P. yezoensis), are important marine crops. We conducted a phylogenetic analysis of 39 samples of Pyropia species grown in Japan using organellar genome sequences. A comparison of the chloroplast DNA sequences with those from China showed a clear genetic separation between Japanese and Chinese P. yezoensis. Conversely, comparing the mitochondrial DNA sequences did not separate Japanese and Chinese P. yezoensis. Analysis of organellar genomes showed that the genetic diversity of Japanese P. yezoensis used in this study is lower than that of Chinese wild P. yezoensis. To analyze the genetic relationships between samples of Japanese Pyropia, we used whole-genome resequencing to analyze their nuclear genomes. In the offspring resulting from cross-breeding between P. yezoensis and P. tenera, nearly 90% of the genotypes analyzed by mapping were explained by the presence of different chromosomes originating from two different parental species. Although the genetic diversity of Japanese P. yezoensis is low, analysis of nuclear genomes genetically separated each sample. Samples isolated from the sea were often genetically similar to those being farmed. Study of genetic heterogeneity of samples within a single aquaculture strain of P. yezoensis showed that samples were divided into two groups and the samples with frequent abnormal budding formed a single, genetically similar group. The results of this study will be useful for breeding and the conservation of Pyropia species.
20
Garg S, Aach J, Li H, Sebenius I, Durbin R, Church G. A haplotype-aware de novo assembly of related individuals using pedigree sequence graph. Bioinformatics 2020; 36:2385-2392. [PMID: 31860070 DOI: 10.1093/bioinformatics/btz942] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 06/19/2019] [Revised: 11/23/2019] [Accepted: 12/18/2019] [Indexed: 01/11/2023]
Abstract
MOTIVATION Reconstructing high-quality haplotype-resolved assemblies for related individuals has important applications in Mendelian diseases and population genomics. Through major genomics sequencing efforts such as the Personal Genome Project, the Vertebrate Genome Project (VGP) and the Genome in a Bottle project (GIAB), a variety of sequencing datasets from trios of diploid genomes are becoming available. Current trio assembly approaches are not designed to incorporate long- and short-read data from mother-father-child trios, and therefore require relatively high coverages of costly long-read data to produce high-quality assemblies. Thus, building a trio-aware assembler capable of producing accurate and chromosomal-scale diploid genomes of all individuals in a pedigree, while being cost-effective in terms of sequencing costs, is a pressing need of the genomics community. RESULTS We present a novel pedigree sequence graph based approach to diploid assembly using accurate Illumina data and long-read Pacific Biosciences (PacBio) data from all related individuals, thereby generalizing our previous work on single individuals. We demonstrate the effectiveness of our pedigree approach on a simulated trio of pseudo-diploid yeast genomes with different heterozygosity rates, and real data from human chromosome. We show that we require as little as 30× coverage Illumina data and 15× PacBio data from each individual in a trio to generate chromosomal-scale phased assemblies. Additionally, we show that we can detect and phase variants from generated phased assemblies. AVAILABILITY AND IMPLEMENTATION https://github.com/shilpagarg/WHdenovo.
Affiliation(s)
- Shilpa Garg
  - Department of Genetics, Harvard Medical School
  - Wyss Institute for Biologically Inspired Engineering, Harvard University
- John Aach
  - Department of Genetics, Harvard Medical School
- Heng Li
  - Department of Biomedical Informatics, Harvard Medical School, Boston
- Isaac Sebenius
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Richard Durbin
  - Department of Genetics, University of Cambridge, Cambridge, UK
- George Church
  - Department of Genetics, Harvard Medical School
  - Wyss Institute for Biologically Inspired Engineering, Harvard University
21
Dong J, Qi M, Wang S, Yuan X. DINTD: Detection and Inference of Tandem Duplications From Short Sequencing Reads. Front Genet 2020; 11:924. [PMID: 32849857 PMCID: PMC7433346 DOI: 10.3389/fgene.2020.00924] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/29/2020] [Accepted: 07/24/2020] [Indexed: 11/21/2022]
Abstract
Tandem duplication (TD) is an important type of structural variation (SV) in the human genome and has biological significance for human cancer evolution and tumorigenesis. Accurate and reliable detection of TDs plays an important role in advancing early detection, diagnosis, and treatment of disease. The advent of next-generation sequencing technologies has made the study of TDs possible. However, detection is still challenging due to the uneven distribution of reads and the uncertain amplitude of TD regions. In this paper, we present a new method, DINTD (Detection and INference of Tandem Duplications), to detect and infer TDs using short sequencing reads. The major principle of the proposed method is that it first extracts read depth and mapping quality signals, then uses the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm to find the possible TD regions. The total variation penalized least squares model is fitted with read depth and mapping quality signals to denoise signals. A 2D binary search tree is used to search the neighbor points effectively. To further identify the exact breakpoints of the TD regions, split-read signals are integrated into DINTD. The experimental results of DINTD on simulated data sets showed that DINTD can outperform other methods in sensitivity, precision, F1-score, and boundary bias. DINTD is further validated on real samples, and the experimental results indicate that it is consistent with other methods. This study indicates that DINTD can be used as an effective tool for detecting TDs.
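The sensitivity, precision, and F1-score named in this abstract follow standard definitions, sketched below from counts of true positives, false positives, and false negatives. The function name and the example counts are hypothetical; they are not taken from the DINTD paper, and the abstract does not say how calls were matched to the truth set.

```python
def precision_sensitivity_f1(tp, fp, fn):
    """Compute precision, sensitivity (recall), and F1 from call counts.

    tp: calls that match a true event
    fp: calls that match no true event
    fn: true events with no matching call
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + sensitivity
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / total if total else 0.0
    return precision, sensitivity, f1


# Hypothetical counts: 8 correct calls, 2 spurious calls, 2 missed events.
precision, sensitivity, f1 = precision_sensitivity_f1(tp=8, fp=2, fn=2)
```

Returning 0.0 when a denominator is zero is a convention chosen here to avoid a division error; other conventions leave the metric undefined in that case.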
Affiliation(s)
- Jinxin Dong
  - School of Computer Science and Technology, Xidian University, Xi'an, China
  - School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Minyong Qi
  - School of Computer Science and Technology, Xidian University, Xi'an, China
  - School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Shaoqiang Wang
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Xiguo Yuan
  - School of Computer Science and Technology, Xidian University, Xi'an, China
22
Almlöf JC, Nystedt S, Mechtidou A, Leonard D, Eloranta ML, Grosso G, Sjöwall C, Bengtsson AA, Jönsen A, Gunnarsson I, Svenungsson E, Rönnblom L, Sandling JK, Syvänen AC. Contributions of de novo variants to systemic lupus erythematosus. Eur J Hum Genet 2020; 29:184-193. [PMID: 32724065 PMCID: PMC7852530 DOI: 10.1038/s41431-020-0698-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/16/2020] [Revised: 06/04/2020] [Accepted: 07/14/2020] [Indexed: 12/21/2022]
Abstract
By performing whole-genome sequencing in a Swedish cohort of 71 parent-offspring trios, in which the child in each family is affected by systemic lupus erythematosus (SLE, OMIM 152700), we investigated the contribution of de novo variants to risk of SLE. We found de novo single nucleotide variants (SNVs) to be significantly enriched in gene promoters in SLE patients compared with healthy controls at a level corresponding to 26 de novo promoter SNVs more in each patient than expected. We identified 12 de novo SNVs in promoter regions of genes that have been previously implicated in SLE, or that have functions that could be of relevance to SLE. Furthermore, we detected three missense de novo SNVs, five de novo insertion-deletions, and three de novo structural variants with potential to affect the expression of genes that are relevant for SLE. Based on enrichment analysis, disease-affecting de novo SNVs are expected to occur in one-third of SLE patients. This study shows that de novo variants in promoters commonly contribute to the genetic risk of SLE. The fact that de novo SNVs in SLE were enriched to promoter regions highlights the importance of using whole-genome sequencing for identification of de novo variants.
Affiliation(s)
- Jonas Carlsson Almlöf
  - Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, 751 23, Uppsala, Sweden
- Sara Nystedt
  - Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, 751 23, Uppsala, Sweden
- Aikaterini Mechtidou
  - Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, 751 23, Uppsala, Sweden
- Dag Leonard
  - Department of Medical Sciences, Rheumatology and Science for Life Laboratory, Uppsala University, 751 85, Uppsala, Sweden
- Maija-Leena Eloranta
  - Department of Medical Sciences, Rheumatology and Science for Life Laboratory, Uppsala University, 751 85, Uppsala, Sweden
- Giorgia Grosso
  - Department of Medicine, Karolinska Institutet, Rheumatology, Karolinska University Hospital, 171 77, Stockholm, Sweden
- Christopher Sjöwall
  - Department of Clinical and Experimental Medicine, Rheumatology/Division of Neuro and Inflammation Sciences, Linköping University, 581 83, Linköping, Sweden
- Anders A Bengtsson
  - Department of Clinical Sciences, Rheumatology, Lund University, Skåne University Hospital, 222 42, Lund, Sweden
- Andreas Jönsen
  - Department of Clinical Sciences, Rheumatology, Lund University, Skåne University Hospital, 222 42, Lund, Sweden
- Iva Gunnarsson
  - Department of Medicine, Karolinska Institutet, Rheumatology, Karolinska University Hospital, 171 77, Stockholm, Sweden
- Elisabet Svenungsson
  - Department of Medicine, Karolinska Institutet, Rheumatology, Karolinska University Hospital, 171 77, Stockholm, Sweden
- Lars Rönnblom
  - Department of Medical Sciences, Rheumatology and Science for Life Laboratory, Uppsala University, 751 85, Uppsala, Sweden
- Johanna K Sandling
  - Department of Medical Sciences, Rheumatology and Science for Life Laboratory, Uppsala University, 751 85, Uppsala, Sweden
- Ann-Christine Syvänen
  - Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, 751 23, Uppsala, Sweden
23
Balachandran P, Beck CR. Structural variant identification and characterization. Chromosome Res 2020; 28:31-47. [PMID: 31907725 PMCID: PMC7131885 DOI: 10.1007/s10577-019-09623-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/15/2019] [Revised: 10/15/2019] [Accepted: 11/24/2019] [Indexed: 01/06/2023]
Abstract
Structural variant (SV) differences between human genomes can cause germline and mosaic disease as well as inter-individual variation. De-regulation of accurate DNA repair and genomic surveillance mechanisms results in a large number of SVs in cancer. Analysis of the DNA sequences at SV breakpoints can help identify pathways of mutagenesis and regions of the genome that are more susceptible to rearrangement. Large-scale SV analyses have been enabled by high-throughput genome-level sequencing on humans in the past decade. These studies have shed light on the mechanisms and prevalence of complex genomic rearrangements. Recent advancements in both sequencing and other mapping technologies as well as calling algorithms for detection of genomic rearrangements have helped propel SV detection into population-scale studies, and have begun to elucidate previously inaccessible regions of the genome. Here, we discuss the genomic organization of simple and complex SVs, the molecular mechanisms of their formation, and various ways to detect them. We also introduce methods for characterizing SVs and their consequences on human genomes.
Affiliation(s)
- Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA.
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, 06030, USA.
|
24
|
Eisfeldt J, Mårtensson G, Ameur A, Nilsson D, Lindstrand A. Discovery of Novel Sequences in 1,000 Swedish Genomes. Mol Biol Evol 2020; 37:18-30. [PMID: 31560401 PMCID: PMC6984370 DOI: 10.1093/molbev/msz176] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 12/18/2022] Open
Abstract
Novel sequences (NSs), not present in the human reference genome, are abundant and remain largely unexplored. Here, we utilize de novo assembly to study NS in 1,000 Swedish individuals first sequenced as part of the SweGen project revealing a total of 46 Mb in 61,044 distinct contigs of sequences not present in GRCh38. The contigs were aligned to recently published catalogs of Icelandic and Pan-African NSs, as well as the chimpanzee genome, revealing a great diversity of shared sequences. Analyzing the positioning of NS across the chimpanzee genome, we find that 2,807 NS align confidently within 143 chimpanzee orthologs of human genes. Aligning the whole genome sequencing data to the chimpanzee genome, we discover ancestral NS common throughout the Swedish population. The NSs were searched for repeats and repeat elements: revealing a majority of repetitive sequence (56%), and enrichment of simple repeats (28%) and satellites (15%). Lastly, we align the unmappable reads of a subset of the thousand genomes data to our collection of NS, as well as the previously published Pan-African NS: revealing that both the Swedish and Pan-African NS are widespread, and that the Swedish NSs are largely a subset of the Pan-African NS. Overall, these results highlight the importance of creating a more diverse reference genome and illustrate that significant amounts of the NS may be of ancestral origin.
Affiliation(s)
- Jesper Eisfeldt
- Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institute, Stockholm, Sweden; Science for Life Laboratory, Karolinska Institutet Science Park, Solna, Sweden; Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Gustaf Mårtensson
- Division of Nanobiotechnology, Department of Protein Science, Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm, Sweden
- Adam Ameur
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Daniel Nilsson
- Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institute, Stockholm, Sweden; Science for Life Laboratory, Karolinska Institutet Science Park, Solna, Sweden; Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
- Anna Lindstrand
- Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institute, Stockholm, Sweden; Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
|
25
|
Loiseau V, Herniou EA, Moreau Y, Lévêque N, Meignin C, Daeffler L, Federici B, Cordaux R, Gilbert C. Wide spectrum and high frequency of genomic structural variation, including transposable elements, in large double-stranded DNA viruses. Virus Evol 2020; 6:vez060. [PMID: 32002191 PMCID: PMC6983493 DOI: 10.1093/ve/vez060] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 02/06/2023] Open
Abstract
Our knowledge of the diversity and frequency of genomic structural variation segregating in populations of large double-stranded (ds) DNA viruses is limited. Here, we sequenced the genome of a baculovirus (Autographa californica multiple nucleopolyhedrovirus [AcMNPV]) purified from beet armyworm (Spodoptera exigua) larvae at depths >195,000× using both short- (Illumina) and long-read (PacBio) technologies. Using a pipeline relying on hierarchical clustering of structural variants (SVs) detected in individual short- and long-reads by six variant callers, we identified a total of 1,141 SVs in AcMNPV, including 464 deletions, 443 inversions, 160 duplications, and 74 insertions. These variants are considered robust and unlikely to result from technical artifacts because they were independently detected in at least three long reads as well as at least three short reads. SVs are distributed along the entire AcMNPV genome and may involve large genomic regions (30,496 bp on average). We show that no less than 39.9 per cent of genomes carry at least one SV in AcMNPV populations, that the vast majority of SVs (75%) segregate at very low frequency (<0.01%) and that very few SVs persist after ten replication cycles, consistent with a negative impact of most SVs on AcMNPV fitness. Using short-read sequencing datasets, we then show that populations of two iridoviruses and one herpesvirus are also full of SVs, as they contain between 426 and 1,102 SVs carried by 52.4–80.1 per cent of genomes. Finally, AcMNPV long reads allowed us to identify 1,757 transposable elements (TEs) insertions, 895 of which are truncated and occur at one extremity of the reads. This further supports the role of baculoviruses as possible vectors of horizontal transfer of TEs. Altogether, we found that SVs, which evolve mostly under rapid dynamics of gain and loss in viral populations, represent an important feature in the biology of large dsDNA viruses.
Affiliation(s)
- Vincent Loiseau
- Laboratoire Evolution, Génomes, Comportement, Écologie, Unité Mixte de Recherche 9191 Centre National de la Recherche Scientifique et Unité Mixte de Recherche 247 Institut de Recherche pour le Développement, Université Paris-Saclay, Gif-sur-Yvette 91198, France
- Elisabeth A Herniou
- Institut de Recherche sur la Biologie de l'Insecte, UMR 7261 CNRS - Université de Tours, 37200 Tours, France
- Yannis Moreau
- Institut de Recherche sur la Biologie de l'Insecte, UMR 7261 CNRS - Université de Tours, 37200 Tours, France
- Nicolas Lévêque
- Laboratoire de Virologie et Mycobactériologie, CHU de Poitiers, 86000 Poitiers, France; Laboratoire Inflammation, Tissus Epithéliaux et Cytokines, EA 4331, Université de Poitiers, 86000 Poitiers, France
- Carine Meignin
- Modèles Insectes d'Immunité Innée (M3i), Université de Strasbourg, IBMC CNRS-UPR9022, Strasbourg F-67000, France
- Laurent Daeffler
- Modèles Insectes d'Immunité Innée (M3i), Université de Strasbourg, IBMC CNRS-UPR9022, Strasbourg F-67000, France
- Brian Federici
- Department of Entomology and Institute for Integrative Genome Biology, University of California, Riverside, CA 92521, USA
- Richard Cordaux
- Laboratoire Ecologie et Biologie des Interactions, Equipe Ecologie Evolution Symbiose, Unité Mixte de Recherche 7267 Centre National de la Recherche Scientifique, Université de Poitiers, 86000 Poitiers, France
- Clément Gilbert
- Laboratoire Evolution, Génomes, Comportement, Écologie, Unité Mixte de Recherche 9191 Centre National de la Recherche Scientifique et Unité Mixte de Recherche 247 Institut de Recherche pour le Développement, Université Paris-Saclay, Gif-sur-Yvette 91198, France
|
26
|
Franco I, Helgadottir HT, Moggio A, Larsson M, Vrtačnik P, Johansson A, Norgren N, Lundin P, Mas-Ponte D, Nordström J, Lundgren T, Stenvinkel P, Wennberg L, Supek F, Eriksson M. Whole genome DNA sequencing provides an atlas of somatic mutagenesis in healthy human cells and identifies a tumor-prone cell type. Genome Biol 2019; 20:285. [PMID: 31849330 PMCID: PMC6918713 DOI: 10.1186/s13059-019-1892-z] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 06/14/2019] [Accepted: 11/18/2019] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND The lifelong accumulation of somatic mutations underlies age-related phenotypes and cancer. Mutagenic forces are thought to shape the genome of aging cells in a tissue-specific way. Whole genome analyses of somatic mutation patterns, based on both types and genomic distribution of variants, can shed light on specific processes active in different human tissues and their effect on the transition to cancer. RESULTS To analyze somatic mutation patterns, we compile a comprehensive genetic atlas of somatic mutations in healthy human cells. High-confidence variants are obtained from newly generated and publicly available whole genome DNA sequencing data from single non-cancer cells, clonally expanded in vitro. To enable a well-controlled comparison of different cell types, we obtain single genome data (92% mean coverage) from multi-organ biopsies from the same donors. These data show multiple cell types that are protected from mutagens and display a stereotyped mutation profile, despite their origin from different tissues. Conversely, the same tissue harbors cells with distinct mutation profiles associated to different differentiation states. Analyses of mutation rate in the coding and non-coding portions of the genome identify a cell type bearing a unique mutation pattern characterized by mutation enrichment in active chromatin, regulatory, and transcribed regions. CONCLUSIONS Our analysis of normal cells from healthy donors identifies a somatic mutation landscape that enhances the risk of tumor transformation in a specific cell population from the kidney proximal tubule. This unique pattern is characterized by high rate of mutation accumulation during adult life and specific targeting of expressed genes and regulatory regions.
Affiliation(s)
- Irene Franco
- Department of Biosciences and Nutrition, Center for Innovative Medicine, Karolinska Institutet, Huddinge, Sweden.
- Hafdis T Helgadottir
- Department of Biosciences and Nutrition, Center for Innovative Medicine, Karolinska Institutet, Huddinge, Sweden
- Aldo Moggio
- Department of Medicine Huddinge, Integrated Cardio Metabolic Center, Karolinska Institutet, Huddinge, Sweden
- Malin Larsson
- Science for Life Laboratory, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden
- Peter Vrtačnik
- Department of Biosciences and Nutrition, Center for Innovative Medicine, Karolinska Institutet, Huddinge, Sweden
- Anna Johansson
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Nina Norgren
- Science for Life Laboratory, Department of Molecular Biology, Umeå University, Umeå, Sweden
- Pär Lundin
- Department of Biosciences and Nutrition, Center for Innovative Medicine, Karolinska Institutet, Huddinge, Sweden
- Science for Life Laboratory, Department of Biochemistry and Biophysics (DBB), Stockholm University, Stockholm, Sweden
- David Mas-Ponte
- Genome Data Science, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028, Barcelona, Spain
- Johan Nordström
- Department of Clinical Sciences, Intervention and Technology, Karolinska Institutet, Division of Transplantation Surgery, Karolinska University Hospital, Huddinge, Sweden
- Torbjörn Lundgren
- Department of Clinical Sciences, Intervention and Technology, Karolinska Institutet, Division of Transplantation Surgery, Karolinska University Hospital, Huddinge, Sweden
- Peter Stenvinkel
- Department of Clinical Sciences, Intervention and Technology, Karolinska Institutet, Division of Renal Medicine, Karolinska University Hospital, Huddinge, Sweden
- Lars Wennberg
- Department of Clinical Sciences, Intervention and Technology, Karolinska Institutet, Division of Transplantation Surgery, Karolinska University Hospital, Huddinge, Sweden
- Fran Supek
- Genome Data Science, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Maria Eriksson
- Department of Biosciences and Nutrition, Center for Innovative Medicine, Karolinska Institutet, Huddinge, Sweden.
|
27
|
Fang L, Kao C, Gonzalez MV, Mafra FA, Pellegrino da Silva R, Li M, Wenzel SS, Wimmer K, Hakonarson H, Wang K. LinkedSV for detection of mosaic structural variants from linked-read exome and genome sequencing data. Nat Commun 2019; 10:5585. [PMID: 31811119 PMCID: PMC6898185 DOI: 10.1038/s41467-019-13397-7] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 03/12/2019] [Accepted: 11/07/2019] [Indexed: 02/01/2023] Open
Abstract
Linked-read sequencing provides long-range information on short-read sequencing data by barcoding reads originating from the same DNA molecule, and can improve detection and breakpoint identification for structural variants (SVs). Here we present LinkedSV for SV detection on linked-read sequencing data. LinkedSV considers barcode overlapping and enriched fragment endpoints as signals to detect large SVs, while it leverages read depth, paired-end signals and local assembly to detect small SVs. Benchmarking studies demonstrate that LinkedSV outperforms existing tools, especially on exome data and on somatic SVs with low variant allele frequencies. We demonstrate clinical cases where LinkedSV identifies disease-causal SVs from linked-read exome sequencing data missed by conventional exome sequencing, and show examples where LinkedSV identifies SVs missed by high-coverage long-read sequencing. In summary, LinkedSV can detect SVs missed by conventional short-read and long-read sequencing approaches, and may resolve negative cases from clinical genome/exome sequencing studies.
Affiliation(s)
- Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Charlly Kao
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Michael V Gonzalez
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Fernanda A Mafra
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Sören-Sebastian Wenzel
- Institute of Human Genetics, Department for Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
- Katharina Wimmer
- Institute of Human Genetics, Department for Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
- Hakon Hakonarson
- Department of Pediatrics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
|
28
|
Lindstrand A, Eisfeldt J, Pettersson M, Carvalho CMB, Kvarnung M, Grigelioniene G, Anderlid BM, Bjerin O, Gustavsson P, Hammarsjö A, Georgii-Hemming P, Iwarsson E, Johansson-Soller M, Lagerstedt-Robinson K, Lieden A, Magnusson M, Martin M, Malmgren H, Nordenskjöld M, Norling A, Sahlin E, Stranneheim H, Tham E, Wincent J, Ygberg S, Wedell A, Wirta V, Nordgren A, Lundin J, Nilsson D. From cytogenetics to cytogenomics: whole-genome sequencing as a first-line test comprehensively captures the diverse spectrum of disease-causing genetic variation underlying intellectual disability. Genome Med 2019; 11:68. [PMID: 31694722 PMCID: PMC6836550 DOI: 10.1186/s13073-019-0675-1] [Citation(s) in RCA: 100] [Impact Index Per Article: 16.7] [Received: 05/07/2019] [Accepted: 10/09/2019] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Since different types of genetic variants, from single nucleotide variants (SNVs) to large chromosomal rearrangements, underlie intellectual disability, we evaluated the use of whole-genome sequencing (WGS) rather than chromosomal microarray analysis (CMA) as a first-line genetic diagnostic test. METHODS We analyzed three cohorts with short-read WGS: (i) a retrospective cohort with validated copy number variants (CNVs) (cohort 1, n = 68), (ii) individuals referred for monogenic multi-gene panels (cohort 2, n = 156), and (iii) 100 prospective, consecutive cases referred to our center for CMA (cohort 3). Bioinformatic tools developed include FindSV, SVDB, Rhocall, Rhoviz, and vcf2cytosure. RESULTS First, we validated our structural variant (SV)-calling pipeline on cohort 1, consisting of three trisomies and 79 deletions and duplications with a median size of 850 kb (min 500 bp, max 155 Mb). All variants were detected. Second, we utilized the same pipeline in cohort 2 and analyzed with monogenic WGS panels, increasing the diagnostic yield to 8%. Next, cohort 3 was analyzed by both CMA and WGS. The WGS data was processed for large (> 10 kb) SVs genome-wide and for exonic SVs and SNVs in a panel of 887 genes linked to intellectual disability as well as genes matched to patient-specific Human Phenotype Ontology (HPO) phenotypes. This yielded a total of 25 pathogenic variants (SNVs or SVs), of which 12 were detected by CMA as well. We also applied short tandem repeat (STR) expansion detection and discovered one pathologic expansion in ATXN7. Finally, a case of Prader-Willi syndrome with uniparental disomy (UPD) was validated in the WGS data. Important positional information was obtained in all cohorts. Remarkably, 7% of the analyzed cases harbored complex structural variants, as exemplified by a ring chromosome and two duplications found to be an insertional translocation and part of a cryptic unbalanced translocation, respectively. CONCLUSION The overall diagnostic rate of 27% was more than doubled compared to clinical microarray (12%). Using WGS, we detected a wide range of SVs with high accuracy. Since the WGS data also allowed for analysis of SNVs, UPD, and STRs, it represents a powerful comprehensive genetic test in a clinical diagnostic laboratory setting.
Affiliation(s)
- Anna Lindstrand
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden.
- Jesper Eisfeldt
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden; Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
- Maria Pettersson
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Claudia M B Carvalho
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Malin Kvarnung
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Giedre Grigelioniene
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Britt-Marie Anderlid
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Olof Bjerin
- The Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden
- Peter Gustavsson
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Anna Hammarsjö
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Erik Iwarsson
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Maria Johansson-Soller
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Kristina Lagerstedt-Robinson
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Agne Lieden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Måns Magnusson
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden; Centre for Inherited Metabolic Diseases, Karolinska University Hospital, Stockholm, Sweden
- Marcel Martin
- Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Stockholm University, Stockholm, Sweden
- Helena Malmgren
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Magnus Nordenskjöld
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Ameli Norling
- The Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden
- Ellika Sahlin
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Henrik Stranneheim
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden; Centre for Inherited Metabolic Diseases, Karolinska University Hospital, Stockholm, Sweden
- Emma Tham
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Josephine Wincent
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Sofia Ygberg
- The Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden; Centre for Inherited Metabolic Diseases, Karolinska University Hospital, Stockholm, Sweden
- Anna Wedell
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Centre for Inherited Metabolic Diseases, Karolinska University Hospital, Stockholm, Sweden
- Valtteri Wirta
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm, Sweden; Science for Life Laboratory, Department of Microbiology, Tumor and Cell biology, Karolinska Institutet, Stockholm, Sweden
- Ann Nordgren
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Johanna Lundin
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Daniel Nilsson
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden; Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden
|
29
|
Hernández-Lemus E, Reyes-Gopar H, Espinal-Enríquez J, Ochoa S. The Many Faces of Gene Regulation in Cancer: A Computational Oncogenomics Outlook. Genes (Basel) 2019; 10:E865. [PMID: 31671657 PMCID: PMC6896122 DOI: 10.3390/genes10110865] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/31/2019] [Revised: 10/16/2019] [Accepted: 10/24/2019] [Indexed: 12/16/2022] Open
Abstract
Cancer is a complex disease at many different levels. The molecular phenomenology of cancer is also quite rich. The mutational and genomic origins of cancer and their downstream effects on processes such as the reprogramming of the gene regulatory control and the molecular pathways depending on such control have been recognized as central to the characterization of the disease. More important though is the understanding of their causes, prognosis, and therapeutics. There is a multitude of factors associated with anomalous control of gene expression in cancer. Many of these factors are now amenable to be studied comprehensively by means of experiments based on diverse omic technologies. However, characterizing each dimension of the phenomenon individually has proven to fall short in presenting a clear picture of expression regulation as a whole. In this review article, we discuss some of the more relevant factors affecting gene expression control, both under normal conditions and in tumor settings. We describe the different omic approaches that we can use as well as the computational genomic analysis needed to track down these factors. Then we present theoretical and computational frameworks developed to integrate the amount of diverse information provided by such single-omic analyses. We contextualize this within a systems biology-based multi-omic regulation setting, aimed at better understanding the complex interplay of gene expression deregulation in cancer.
Affiliation(s)
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico.
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico.
| | - Helena Reyes-Gopar
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico.
| | - Jesús Espinal-Enríquez
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico.
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico.
| | - Soledad Ochoa
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico.
|
30
|
A high-quality cucumber genome assembly enhances computational comparative genomics. Mol Genet Genomics 2019; 295:177-193. [PMID: 31620884 DOI: 10.1007/s00438-019-01614-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 02/19/2019] [Accepted: 09/30/2019] [Indexed: 01/12/2023]
Abstract
Genetic variation is expressed by the presence of polymorphisms in compared genomes of individuals that can be transferred to next generations. The aim of this work was to reveal genome dynamics by predicting polymorphisms among the genomes of three individuals of the highly inbred B10 cucumber (Cucumis sativus L.) line. In this study, bioinformatic comparative genomics was used to uncover cucumber genome dynamics (also called real-time evolution). We obtained a new genome draft assembly from long single molecule real-time (SMRT) sequencing reads and used short paired-end read data from three individuals to analyse the polymorphisms. Using this approach, we uncovered differentiation aspects in the genomes of the inbred B10 line. The newly assembled genome sequence (B10v3) has the highest contiguity and quality characteristics among the currently available cucumber genome draft sequences. Standard and newly designed approaches were used to predict single nucleotide and structural variants that were unique among the three individual genomes. Some of the variant predictions spanned protein-coding genes and their promoters, and some were in the neighbourhood of annotated interspersed repetitive elements, indicating that the highly inbred homozygous plants remained genetically dynamic. This is the first bioinformatic comparative genomics study of a single highly inbred plant line. For this project, we developed a polymorphism prediction method with optimized precision parameters, which allowed the effective detection of small nucleotide variants (SNVs). This methodology could significantly improve bioinformatic pipelines for comparative genomics and thus has great practical potential in genomic metadata handling.
|
31
|
Walker MA, Pedamallu CS, Ojesina AI, Bullman S, Sharpe T, Whelan CW, Meyerson M. GATK PathSeq: a customizable computational tool for the discovery and identification of microbial sequences in libraries from eukaryotic hosts. Bioinformatics 2019; 34:4287-4289. [PMID: 29982281 PMCID: PMC6289130 DOI: 10.1093/bioinformatics/bty501] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 12/14/2017] [Accepted: 07/02/2018] [Indexed: 12/18/2022] Open
Abstract
Summary We present an updated version of our computational pipeline, PathSeq, for the discovery and identification of microbial sequences in genomic and transcriptomic libraries from eukaryotic hosts. This pipeline is available in the Genome Analysis Toolkit (GATK) as a suite of configurable tools that can report the microbial composition of DNA or RNA short-read sequencing samples and identify unknown sequences for downstream assembly of novel organisms. GATK PathSeq enables sample analysis in minutes at low cost. In addition, these tools are built with the GATK engine and Apache Spark framework, providing robust, rapid parallelization of read quality filtering, host subtraction and microbial alignment in workstation, cluster and cloud environments. Availability and implementation These tools are available as a part of the GATK at https://github.com/broadinstitute/gatk. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mark A Walker
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
| | - Chandra Sekhar Pedamallu
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA; Department of Medical Oncology and Center for Cancer Genome Discovery, Dana-Farber Cancer Institute, Boston, MA, USA; Department of Pathology, Harvard Medical School, Boston, MA, USA
| | - Akinyemi I Ojesina
- University of Alabama at Birmingham (UAB), Birmingham, AL, USA; HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Susan Bullman
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA; Department of Medical Oncology and Center for Cancer Genome Discovery, Dana-Farber Cancer Institute, Boston, MA, USA; Department of Pathology, Harvard Medical School, Boston, MA, USA
| | - Ted Sharpe
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
| | - Christopher W Whelan
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
| | - Matthew Meyerson
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA; Department of Medical Oncology and Center for Cancer Genome Discovery, Dana-Farber Cancer Institute, Boston, MA, USA; Department of Pathology, Harvard Medical School, Boston, MA, USA
|
32
|
Wu X, Heffelfinger C, Zhao H, Dellaporta SL. Benchmarking variant identification tools for plant diversity discovery. BMC Genomics 2019; 20:701. [PMID: 31500583 PMCID: PMC6734213 DOI: 10.1186/s12864-019-6057-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 04/29/2019] [Accepted: 08/22/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The ability to accurately and comprehensively identify genomic variations is critical for plant studies utilizing high-throughput sequencing. Most bioinformatics tools for processing next-generation sequencing data were originally developed and tested in human studies, raising questions as to their efficacy for plant research. A detailed evaluation of the entire variant calling pipeline, including alignment, variant calling, variant filtering, and imputation, was performed on different programs using both simulated and real plant genomic datasets. RESULTS A comparison of SOAP2, Bowtie2, and BWA-MEM found that BWA-MEM was consistently able to align the most reads with high accuracy, whereas Bowtie2 had the highest overall accuracy. Comparative results of GATK HaplotypeCaller versus SAMtools mpileup indicated that the choice of variant caller affected precision and recall differentially depending on the levels of diversity, sequence coverage and genome complexity. A cross-reference experiment of S. lycopersicum and S. pennellii reference genomes revealed the inadequacy of a single reference genome for variant discovery that includes distantly related plant individuals. A machine-learning-based variant filtering strategy outperformed the traditional hard-cutoff strategy, resulting in a higher number of true positive variants and fewer false positive variants. A 2-step imputation method, which utilized a set of high-confidence SNPs as the reference panel, showed up to 60% higher accuracy than direct LD-based imputation. CONCLUSIONS Programs in the variant discovery pipeline perform differently on plant genomic datasets. The choice of programs depends on the goal of the study and the available resources. This study provides important guidance for plant biologists utilizing next-generation sequencing data for diversity characterization and crop improvement.
Affiliation(s)
- Xing Wu
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, 06520-8104, USA
| | - Christopher Heffelfinger
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, 06520-8104, USA
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT, 06520-8034, USA
| | - Stephen L Dellaporta
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, 06520-8104, USA.
|
33
|
Kosugi S, Momozawa Y, Liu X, Terao C, Kubo M, Kamatani Y. Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing. Genome Biol 2019; 20:117. [PMID: 31159850 PMCID: PMC6547561 DOI: 10.1186/s13059-019-1720-5] [Citation(s) in RCA: 284] [Impact Index Per Article: 47.3] [Received: 10/15/2018] [Accepted: 05/20/2019] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Structural variations (SVs) or copy number variations (CNVs) greatly impact the functions of the genes encoded in the genome and are responsible for diverse human diseases. Although a number of existing SV detection algorithms can detect many types of SVs using whole genome sequencing (WGS) data, no single algorithm can call every type of SV with high precision and high recall. RESULTS We comprehensively evaluate the performance of 69 existing SV detection algorithms using multiple simulated and real WGS datasets. The results highlight a subset of algorithms that accurately call SVs depending on specific types and size ranges of the SVs and that accurately determine breakpoints, sizes, and genotypes of the SVs. We enumerate potentially good algorithms for each SV category, among which GRIDSS, Lumpy, SVseq2, SoftSV, Manta, and Wham are better algorithms in deletion or duplication categories. To improve the accuracy of SV calling, we systematically evaluate the accuracy of overlapping calls between possible combinations of algorithms for every type and size range of SVs. The results demonstrate that both the precision and recall for overlapping calls vary depending on the combinations of specific algorithms rather than the combinations of methods used in the algorithms. CONCLUSION These results suggest that careful selection of the algorithms for each type and size range of SVs is required for accurate calling of SVs. The selection of specific pairs of algorithms for overlapping calls promises to effectively improve the SV detection accuracy.
Affiliation(s)
- Shunichi Kosugi
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
| | - Yukihide Momozawa
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
| | - Xiaoxi Liu
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
| | - Chikashi Terao
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
| | - Michiaki Kubo
- RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
| | - Yoichiro Kamatani
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
|
34
|
Tian S, Yan H, Klee EW, Kalmbach M, Slager SL. Comparative analysis of de novo assemblers for variation discovery in personal genomes. Brief Bioinform 2019; 19:893-904. [PMID: 28407084 PMCID: PMC6169673 DOI: 10.1093/bib/bbx037] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/12/2016] [Accepted: 03/08/2017] [Indexed: 12/30/2022] Open
Abstract
Current variant discovery approaches often rely on an initial read mapping to the reference sequence. Their effectiveness is limited by the presence of gaps, potential misassemblies, duplicated regions with high sequence similarity and regions of high sequence divergence in the reference. Also, mapping-based approaches are less sensitive to large INDELs and complex variations and provide little phase information in personal genomes. A few de novo assemblers have been developed to identify variants through direct variant calling from the assembly graph, micro-assembly and whole-genome assembly, but mainly for whole-genome sequencing (WGS) data. We developed SGVar, a de novo assembly workflow for haplotype-based variant discovery from whole-exome sequencing (WES) data. Using simulated human exome data, we compared SGVar with five variation-aware de novo assemblers and with BWA-MEM together with three haplotype- or local de novo assembly-based callers. SGVar outperforms the other assemblers in sensitivity and tolerance of sequencing errors. We recapitulated the findings on whole-genome and exome data from a trio of Utah residents with Northern and Western European ancestry (CEU), showing that SGVar had high sensitivity both in the highly divergent human leukocyte antigen (HLA) region and in non-HLA regions of chromosome 6. In particular, SGVar is robust to sequencing error, k-mer selection, divergence level and coverage depth. Unlike mapping-based approaches, SGVar is capable of resolving long-range phase and identifying large INDELs from WES, more prominently from WGS. We conclude that SGVar represents an ideal platform for WES-based variant discovery in highly divergent regions and across the whole genome.
Affiliation(s)
- Shulan Tian
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Huihuang Yan
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Eric W Klee
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; Center for Individualized Medicine Bioinformatics Program, Mayo Clinic, USA
| | - Michael Kalmbach
- Division of Information Management and Analytics, Department of Information Technology, Mayo Clinic, USA
| | - Susan L Slager
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
|
35
|
Monat C, Schreiber M, Stein N, Mascher M. Prospects of pan-genomics in barley. Theor Appl Genet 2019; 132:785-796. [PMID: 30446793 DOI: 10.1007/s00122-018-3234-z] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 08/31/2018] [Accepted: 11/07/2018] [Indexed: 05/10/2023]
Abstract
The concept of a pan-genome refers to intraspecific diversity in genome content and structure, encompassing both genes and intergenic space. Pan-genomic studies employ a combination of de novo sequence assembly and reference-based alignment to discover and genotype structural variants. The large size and complex structure of Triticeae genomes were for a long time an obstacle for genomic research in barley and its relatives. Now that a reference genome is available, computational pipelines for high-quality sequence assembly are in place, and sequence costs continue to drop, investigations into the structural diversity of the barley genome seem within reach. Here, we review the recent progress on pan-genomics in the model grass Brachypodium distachyon, and the cereal crops rice and maize, and devise a multi-tiered strategy for a pan-genome project in barley. Our design involves: (1) the construction of high-quality de novo sequence assemblies for a small core set of representative genotypes, (2) short-read sequencing of a large diversity panel of genebank accessions to medium coverage and (3) the use of complementary methods such as chromosome-conformation capture sequencing and k-mer-based association genetics. The in silico representation of the barley pan-genome may inform about the mechanisms of structural genome evolution in the Triticeae and supplement quantitative genetics models of crop performance for better accuracy and predictive ability.
Affiliation(s)
- Cécile Monat
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466, Seeland, Germany
| | - Mona Schreiber
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466, Seeland, Germany
| | - Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466, Seeland, Germany
- Center for Integrated Breeding Research (CiBreed), Georg-August-University Göttingen, 37075, Göttingen, Germany
| | - Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466, Seeland, Germany.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, 04103, Leipzig, Germany.
|
36
|
Li H, Bloom JM, Farjoun Y, Fleharty M, Gauthier L, Neale B, MacArthur D. A synthetic-diploid benchmark for accurate variant-calling evaluation. Nat Methods 2018; 15:595-597. [PMID: 30013044 PMCID: PMC6341484 DOI: 10.1038/s41592-018-0054-7] [Citation(s) in RCA: 138] [Impact Index Per Article: 19.7] [Received: 11/22/2017] [Accepted: 05/14/2018] [Indexed: 12/30/2022]
Abstract
Existing benchmark datasets for use in evaluating variant-calling accuracy are constructed from a consensus of known short-variant callers, and they are thus biased toward easy regions that are accessible by these algorithms. We derived a new benchmark dataset from the de novo PacBio assemblies of two fully homozygous human cell lines, which provides a relatively more accurate and less biased estimate of small-variant-calling error rates in a realistic context.
Affiliation(s)
- Heng Li
- Broad Institute of Harvard and MIT, Cambridge, MA, USA.
| | | | - Yossi Farjoun
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Mark Fleharty
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | | | - Benjamin Neale
- Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
| | - Daniel MacArthur
- Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
|
37
|
Sohn JI, Nam K, Hong H, Kim JM, Lim D, Lee KT, Do YJ, Cho CY, Kim N, Chai HH, Nam JW. Whole genome and transcriptome maps of the entirely black native Korean chicken breed Yeonsan Ogye. Gigascience 2018; 7:5052204. [PMID: 30010758 PMCID: PMC6065499 DOI: 10.1093/gigascience/giy086] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 01/15/2018] [Revised: 05/19/2018] [Accepted: 07/04/2018] [Indexed: 12/30/2022] Open
Abstract
Background Yeonsan Ogye (YO), an indigenous Korean chicken breed (Gallus gallus domesticus), has entirely black external features and internal organs. In this study, the draft genome of YO was assembled using a hybrid de novo assembly method that takes advantage of high-depth Illumina short reads (376.6X) and low-depth Pacific Biosciences (PacBio) long reads (9.7X). Findings The contig and scaffold NG50s of the hybrid de novo assembly were 362.3 Kbp and 16.8 Mbp, respectively. The completeness (97.6%) of the draft genome (Ogye_1.1) was evaluated with single-copy orthologous genes using Benchmarking Universal Single-Copy Orthologs and found to be comparable to the current chicken reference genome (galGal5; 97.4%; contigs were assembled with high-depth PacBio long reads (50X) and scaffolded with short reads) and superior to other avian genomes (92%-93%; assembled with short read-only or hybrid methods). Compared to galGal4 and galGal5, the draft genome included 551 structural variations including the fibromelanosis (FM) locus duplication, related to hyperpigmentation. To comprehensively reconstruct transcriptome maps, RNA sequencing and reduced representation bisulfite sequencing data were analyzed from 20 tissues, including 4 black tissues (skin, shank, comb, and fascia). The maps included 15,766 protein-coding and 6,900 long noncoding RNA genes, many of which were tissue-specifically expressed and displayed tissue-specific DNA methylation patterns in the promoter regions. Conclusions We expect that the resulting genome sequence and transcriptome maps will be valuable resources for studying domestic chicken breeds, including black-skinned chickens, as well as for understanding genomic differences between breeds and the evolution of hyperpigmented chickens and functional elements related to hyperpigmentation.
Affiliation(s)
- Jang-il Sohn
- Department of Life Science, Hanyang University, Seoul, 133-791, Republic of Korea
- Research Institute for Convergence of Basic Sciences, Hanyang University, Seoul, 133-791, Republic of Korea
| | - Kyoungwoo Nam
- Department of Life Science, Hanyang University, Seoul, 133-791, Republic of Korea
| | - Hyosun Hong
- Department of Life Science, Hanyang University, Seoul, 133-791, Republic of Korea
| | - Jun-Mo Kim
- Department of Animal Science and Technology, Chung-Ang University, Anseong, Gyeonggi-do, 17546, Republic of Korea
| | - Dajeong Lim
- Department of Animal Biotechnology & Environment, National Institute of Animal Science, RDA, Wanju, 55365, Republic of Korea
| | - Kyung-Tai Lee
- Department of Animal Biotechnology & Environment, National Institute of Animal Science, RDA, Wanju, 55365, Republic of Korea
| | - Yoon Jung Do
- Department of Animal Biotechnology & Environment, National Institute of Animal Science, RDA, Wanju, 55365, Republic of Korea
| | - Chang Yeon Cho
- Animal Genetic Resource Research Center, National Institute of Animal Science, RDA, Namwon, 55717, Republic of Korea
| | - Namshin Kim
- Personalized Genomic Medicine Research Center, KRIBB, Daejeon, 34141, Republic of Korea
| | - Han-Ha Chai
- Department of Animal Biotechnology & Environment, National Institute of Animal Science, RDA, Wanju, 55365, Republic of Korea
- College of Pharmacy, Chonnam National University, Kwangju, 61186, Republic of Korea
| | - Jin-Wu Nam
- Department of Life Science, Hanyang University, Seoul, 133-791, Republic of Korea
- Research Institute for Convergence of Basic Sciences, Hanyang University, Seoul, 133-791, Republic of Korea
|
38
|
Garg S, Rautiainen M, Novak AM, Garrison E, Durbin R, Marschall T. A graph-based approach to diploid genome assembly. Bioinformatics 2018; 34:i105-i114. [PMID: 29949989 PMCID: PMC6022571 DOI: 10.1093/bioinformatics/bty279] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Indexed: 11/14/2022] Open
Abstract
Motivation Constructing high-quality haplotype-resolved de novo assemblies of diploid genomes is important for revealing the full extent of structural variation and its role in health and disease. Current assembly approaches often collapse the two sequences into one haploid consensus sequence and, therefore, fail to capture the diploid nature of the organism under study. Thus, building an assembler capable of producing accurate and complete diploid assemblies, while being resource-efficient with respect to sequencing costs, is a key challenge to be addressed by the bioinformatics community. Results We present a novel graph-based approach to diploid assembly, which combines accurate Illumina data and long-read Pacific Biosciences (PacBio) data. We demonstrate the effectiveness of our method on a pseudo-diploid yeast genome and show that we require as little as 50× coverage Illumina data and 10× PacBio data to generate accurate and complete assemblies. Additionally, we show that our approach has the ability to detect and phase structural variants. Availability and implementation https://github.com/whatshap/whatshap. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shilpa Garg
- Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, Germany
- Department of Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, Germany
- Saarbrücken Graduate School of Computer Science, Saarland University, Saarbrücken, Germany
| | - Mikko Rautiainen
- Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, Germany
- Department of Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, Germany
- Saarbrücken Graduate School of Computer Science, Saarland University, Saarbrücken, Germany
| | - Adam M Novak
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Erik Garrison
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
| | - Richard Durbin
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
| | - Tobias Marschall
- Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, Germany
- Department of Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, Germany
|
39
|
Abstract
16GT is a variant caller for Illumina whole-genome and whole-exome sequencing data. It uses a new 16-genotype probabilistic model to unify single nucleotide polymorphism and insertion and deletion calling in a single variant calling algorithm. In benchmark comparisons with 5 other widely used variant callers on a modern 36-core server, 16GT demonstrated improved sensitivity in calling single nucleotide polymorphisms, and it provided comparable sensitivity and accuracy for calling insertions and deletions as compared to the GATK HaplotypeCaller. 16GT is available at https://github.com/aquaskyline/16GT.
Affiliation(s)
- Ruibang Luo
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21218, USA
- Correspondence address: Center for Computational Biology, School of Medicine, Johns Hopkins University, 1900 E. Monument St. Rm 101B, Baltimore, MD 21205. Tel: 667-234-9641; E-mail:
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21218, USA
| | - Steven L. Salzberg
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21218, USA
- Departments of Biomedical Engineering and Biostatistics, Johns Hopkins University, Baltimore, MD 21218, USA
|
40
|
Cissé OH, Ma L, Wei Huang D, Khil PP, Dekker JP, Kutty G, Bishop L, Liu Y, Deng X, Hauser PM, Pagni M, Hirsch V, Lempicki RA, Stajich JE, Cuomo CA, Kovacs JA. Comparative Population Genomics Analysis of the Mammalian Fungal Pathogen Pneumocystis. mBio 2018; 9:e00381-18. [PMID: 29739910 PMCID: PMC5941068 DOI: 10.1128/mbio.00381-18] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 02/15/2018] [Accepted: 04/19/2018] [Indexed: 01/14/2023] Open
Abstract
Pneumocystis species are opportunistic mammalian pathogens that cause severe pneumonia in immunocompromised individuals. These fungi are highly host specific and uncultivable in vitro. Human Pneumocystis infections present major challenges because of a limited therapeutic arsenal and the rise of drug resistance. To investigate the diversity and demographic history of natural populations of Pneumocystis infecting humans, rats, and mice, we performed whole-genome and large-scale multilocus sequencing of infected tissues collected in various geographic locations. Here, we detected reduced levels of recombination and variations in historical demography, which shape the global population structures. We report estimates of evolutionary rates, levels of genetic diversity, and population sizes. Molecular clock estimates indicate that Pneumocystis species diverged before their hosts, while the asynchronous timing of population declines suggests host shifts. Our results have uncovered complex patterns of genetic variation influenced by multiple factors that shaped the adaptation of Pneumocystis populations during their spread across mammals. IMPORTANCE Understanding how natural pathogen populations evolve and identifying the determinants of genetic variation are central issues in evolutionary biology. Pneumocystis, a fungal pathogen which infects mammals exclusively, provides opportunities to explore these issues. In humans, Pneumocystis can cause a life-threatening pneumonia in immunosuppressed individuals. In analysis of different Pneumocystis species infecting humans, rats, and mice, we found that there are high infection rates and that natural populations maintain a high level of genetic variation despite low levels of recombination. We found no evidence of population structuring by geography. Our comparisons of the times of divergence of these species to their respective hosts suggest that Pneumocystis may have undergone recent host shifts. The results demonstrate that Pneumocystis strains are widely disseminated geographically and provide a new understanding of the evolution of these pathogens.
Affiliation(s)
- Ousmane H Cissé
- Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
| | - Liang Ma
- Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
| | - Da Wei Huang
- Lymphoid Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Pavel P Khil
- Department of Laboratory Medicine, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
| | - John P Dekker
- Department of Laboratory Medicine, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
| | - Geetha Kutty
- Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
| | - Lisa Bishop
- Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
| | - Yueqin Liu
- Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
| | - Xilong Deng
- Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
| | - Philippe M Hauser
- Institute of Microbiology, Lausanne University Hospital, Lausanne, Switzerland
| | - Marco Pagni
- Vital-IT Group, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Vanessa Hirsch
- Laboratory of Molecular Microbiology, National Institute of Allergy and Infectious Disease, National Institutes of Health, Bethesda, Maryland, USA
| | - Richard A Lempicki
- Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, Frederick, Maryland, USA
| | - Jason E Stajich
- Department of Plant Pathology and Microbiology and Institute for Integrative Genome Biology, University of California, Riverside, Riverside, California, USA
| | - Christina A Cuomo
- Infectious Disease and Microbiome Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Joseph A Kovacs
- Critical Care Medicine Department, NIH Clinical Center, National Institutes of Health, Bethesda, Maryland, USA
| |
|
41
|
Yang R, Van Etten JL, Dehm SM. Indel detection from DNA and RNA sequencing data with transIndel. BMC Genomics 2018; 19:270. [PMID: 29673323 PMCID: PMC5909256 DOI: 10.1186/s12864-018-4671-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 10/18/2017] [Accepted: 04/13/2018] [Indexed: 12/18/2022] Open
Abstract
Background Insertions and deletions (indels) are a major class of genomic variation associated with human disease. Indels are primarily detected from DNA sequencing (DNA-seq) data but their transcriptional consequences remain unexplored due to challenges in discriminating medium-sized and large indels from splicing events in RNA-seq data. Results Here, we developed transIndel, a splice-aware algorithm that parses the chimeric alignments predicted by a short read aligner and reconstructs the mid-sized insertions and large deletions based on the linear alignments of split reads from DNA-seq or RNA-seq data. TransIndel exhibits competitive or superior performance over eight state-of-the-art indel detection tools on benchmarks using both synthetic and real DNA-seq data. Additionally, we applied transIndel to DNA-seq and RNA-seq datasets from 333 primary prostate cancer patients from The Cancer Genome Atlas (TCGA) and 59 metastatic prostate cancer patients from AACR-PCF Stand-Up-To-Cancer (SU2C) studies. TransIndel enhanced the taxonomy of DNA- and RNA-level alterations in prostate cancer by identifying recurrent FOXA1 indels as well as exitron splicing in genes implicated in disease progression. Conclusions Our study demonstrates that transIndel is a robust tool for elucidation of medium- and large-sized indels from DNA-seq and RNA-seq data. Including RNA-seq in indel discovery efforts leads to significant improvements in sensitivity for identification of medium-sized and large indels missed by DNA-seq, and reveals non-canonical RNA-splicing events in genes associated with disease pathology. Electronic supplementary material The online version of this article (10.1186/s12864-018-4671-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Rendong Yang
- The Hormel Institute, University of Minnesota, 801 16th AVE NE, Austin, MN, 55912, USA.,Masonic Cancer Center, University of Minnesota, 420 Delaware St SE, Minneapolis, MN, 55455, USA.
| | - Jamie L Van Etten
- Masonic Cancer Center, University of Minnesota, 420 Delaware St SE, Minneapolis, MN, 55455, USA
| | - Scott M Dehm
- Masonic Cancer Center, University of Minnesota, 420 Delaware St SE, Minneapolis, MN, 55455, USA.,Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, 55455, USA.
| |
|
42
|
Franco I, Johansson A, Olsson K, Vrtačnik P, Lundin P, Helgadottir HT, Larsson M, Revêchon G, Bosia C, Pagnani A, Provero P, Gustafsson T, Fischer H, Eriksson M. Somatic mutagenesis in satellite cells associates with human skeletal muscle aging. Nat Commun 2018; 9:800. [PMID: 29476074 PMCID: PMC5824957 DOI: 10.1038/s41467-018-03244-6] [Citation(s) in RCA: 90] [Impact Index Per Article: 12.9] [Received: 10/03/2017] [Accepted: 01/26/2018] [Indexed: 01/06/2023] Open
Abstract
Human aging is associated with a decline in skeletal muscle (SkM) function and a reduction in the number and activity of satellite cells (SCs), the resident stem cells. To study the connection between SC aging and muscle impairment, we analyze the whole genome of single SC clones of the leg muscle vastus lateralis from healthy individuals of different ages (21–78 years). We find an accumulation rate of 13 somatic mutations per genome per year, consistent with proliferation of SCs in the healthy adult muscle. SkM-expressed genes are protected from mutations, but aging results in an increase in mutations in exons and promoters, targeting genes involved in SC activity and muscle function. In agreement with SC mutations affecting the whole tissue, we detect a missense mutation in a SC propagating to the muscle. Our results suggest somatic mutagenesis in SCs as a driving force in the age-related decline of SkM function. Aging skeletal muscle shows declining numbers and activity of satellite cells. Here, Franco et al. show that in satellite cells of the human leg muscle vastus lateralis, somatic mutations accumulate with age and that these mutations become enriched in exons and promoters of genes involved in muscle function.
Affiliation(s)
- Irene Franco
- Department of Biosciences and Nutrition, Center for Innovative Medicine, Karolinska Institutet, 14157, Huddinge, Sweden.
| | - Anna Johansson
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, 75237, Uppsala, Sweden
| | - Karl Olsson
- Division of Clinical Physiology, Department of Laboratory Medicine, Karolinska Institutet, 14186, Huddinge, Sweden
| | - Peter Vrtačnik
- Department of Biosciences and Nutrition, Center for Innovative Medicine, Karolinska Institutet, 14157, Huddinge, Sweden
| | - Pär Lundin
- Department of Biosciences and Nutrition, Center for Innovative Medicine, Karolinska Institutet, 14157, Huddinge, Sweden.,Science for Life Laboratory, Department of Biochemistry and Biophysics (DBB), Stockholm University, 10691, Stockholm, Sweden
| | - Hafdis T Helgadottir
- Department of Biosciences and Nutrition, Center for Innovative Medicine, Karolinska Institutet, 14157, Huddinge, Sweden
| | - Malin Larsson
- Science for Life Laboratory, Department of Physics, Chemistry and Biology, Linköping University, 58183, Linköping, Sweden
| | - Gwladys Revêchon
- Department of Biosciences and Nutrition, Center for Innovative Medicine, Karolinska Institutet, 14157, Huddinge, Sweden
| | - Carla Bosia
- Italian Institute for Genomic Medicine (IIGM), 10126, Turin, Italy.,Department of Applied Science and Technology, Politecnico di Torino, 10129, Turin, Italy
| | - Andrea Pagnani
- Italian Institute for Genomic Medicine (IIGM), 10126, Turin, Italy.,Department of Applied Science and Technology, Politecnico di Torino, 10129, Turin, Italy
| | - Paolo Provero
- Department of Molecular Biotechnology and Health Sciences, Molecular Biotechnology Center, 10126, Turin, Italy.,Center for Translational Genomics and Bioinformatics, San Raffaele Scientific Institute, 20132, Milan, Italy
| | - Thomas Gustafsson
- Division of Clinical Physiology, Department of Laboratory Medicine, Karolinska Institutet, 14186, Huddinge, Sweden
| | - Helene Fischer
- Division of Clinical Physiology, Department of Laboratory Medicine, Karolinska Institutet, 14186, Huddinge, Sweden
| | - Maria Eriksson
- Department of Biosciences and Nutrition, Center for Innovative Medicine, Karolinska Institutet, 14157, Huddinge, Sweden.
| |
|
43
|
Hernandez-Rodriguez J, Arandjelovic M, Lester J, de Filippo C, Weihmann A, Meyer M, Angedakin S, Casals F, Navarro A, Vigilant L, Kühl HS, Langergraber K, Boesch C, Hughes D, Marques-Bonet T. The impact of endogenous content, replicates and pooling on genome capture from faecal samples. Mol Ecol Resour 2017; 18:319-333. [PMID: 29058768 PMCID: PMC5900898 DOI: 10.1111/1755-0998.12728] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 06/29/2017] [Revised: 10/06/2017] [Accepted: 10/16/2017] [Indexed: 12/11/2022]
Abstract
The target-capture approach has improved over the past years, proving to be a very efficient tool for selectively sequencing genetic regions of interest. These methods have also allowed noninvasive samples such as faeces (characterized by their low quantity and quality of endogenous DNA) to be used in conservation genomic, evolution and population genetic studies. Here we aim to test different protocols and strategies for exome capture using the Roche SeqCap EZ Developer kit (57.5 Mb). First, we captured a complex pool of DNA libraries. Second, we assessed the influence of using more than one faecal sample, extract and/or library from the same individual, to evaluate its effect on the molecular complexity of the experiment. We validated our experiments with 18 chimpanzee faecal samples collected from two field sites as a part of the Pan African Programme: The Cultured Chimpanzee. Those two field sites are in Kibale National Park, Uganda (N = 9) and Loango National Park, Gabon (N = 9). We demonstrate that at least 16 libraries can be pooled, target enriched through hybridization, and sequenced, allowing for the genotyping of 951,949 exome markers for population genetic analyses. Further, we observe that molecule richness, and thus data acquisition, increases when using multiple libraries from the same extract or multiple extracts from the same sample. Finally, repeated captures significantly decrease the proportion of off-target reads from 34.15% after one capture round to 7.83% after two capture rounds, supporting our conclusion that two rounds of target enrichment are advisable when using complex faecal samples.
Affiliation(s)
- Jessica Hernandez-Rodriguez
- Departament de Ciencies Experimentals i de la Salut, Institut de Biologia Evolutiva (Universitat Pompeu Fabra/CSIC), Barcelona, Spain
| | - Mimi Arandjelovic
- Department of Primatology, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Jack Lester
- Department of Primatology, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Cesare de Filippo
- Department of Primatology, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Antje Weihmann
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Matthias Meyer
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Samuel Angedakin
- Department of Primatology, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Ferran Casals
- Genomics Core Facility, Departament de Ciencies Experimentals i de la Salut, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona, Spain
| | - Arcadi Navarro
- Departament de Ciencies Experimentals i de la Salut, Institut de Biologia Evolutiva (Universitat Pompeu Fabra/CSIC), Barcelona, Spain.,Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain.,Institucio Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | - Linda Vigilant
- Department of Primatology, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Hjalmar S Kühl
- Department of Primatology, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany.,German Centre for Integrative Biodiversity Research (iDiv) Halle-Leipzig-Jena, Leipzig, Germany
| | - Kevin Langergraber
- School of Human Evolution & Social Change, Arizona State University, Tempe, AZ, USA
| | - Christophe Boesch
- Department of Primatology, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - David Hughes
- Departament de Ciencies Experimentals i de la Salut, Institut de Biologia Evolutiva (Universitat Pompeu Fabra/CSIC), Barcelona, Spain.,MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
| | - Tomas Marques-Bonet
- Departament de Ciencies Experimentals i de la Salut, Institut de Biologia Evolutiva (Universitat Pompeu Fabra/CSIC), Barcelona, Spain.,Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain.,Institucio Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| |
|
44
|
Sedlazeck FJ, Dhroso A, Bodian DL, Paschall J, Hermes F, Zook JM. Tools for annotation and comparison of structural variation. F1000Res 2017; 6:1795. [PMID: 29123647 PMCID: PMC5668921 DOI: 10.12688/f1000research.12516.1] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Accepted: 11/02/2017] [Indexed: 11/20/2022] Open
Abstract
The impact of structural variants (SVs) on a variety of organisms and diseases like cancer has become increasingly evident. Methods for SV detection when studying genomic differences across cells, individuals or populations are being actively developed. Currently, just a few methods are available to compare different SV callsets, and no specialized methods are available to annotate SVs that account for the unique characteristics of these variant types. Here, we introduce SURVIVOR_ant, a tool that compares types and breakpoints for candidate SVs from different callsets and enables fast comparison of SVs to genomic features such as genes and repetitive regions, as well as to previously established SV datasets such as from the 1000 Genomes Project. As proof of concept we compared 16 SV callsets generated by different SV calling methods on a single genome, the Genome in a Bottle sample HG002 (Ashkenazi son), and annotated the SVs with gene annotations, 1000 Genomes Project SV calls, and four different types of repetitive regions. Computation time to annotate 134,528 SVs with 33,954 annotations was 22 seconds on a laptop.
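The idea of comparing SV types and breakpoints across callsets can be illustrated with a minimal Python sketch. This is not SURVIVOR_ant's implementation: the `SV` record, the `same_event` and `shared_calls` helpers, and the 1,000 bp breakpoint tolerance are all hypothetical choices made here for illustration.

```python
from typing import List, NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal SV record (not a SURVIVOR_ant data structure)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV"


def same_event(a: SV, b: SV, max_dist: int = 1000) -> bool:
    """Treat two calls as the same event if chromosome and type agree and
    both breakpoints lie within max_dist bp (an assumed tolerance)."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= max_dist
        and abs(a.end - b.end) <= max_dist
    )


def shared_calls(set_a: List[SV], set_b: List[SV], max_dist: int = 1000) -> List[SV]:
    """Return the calls in set_a that have at least one match in set_b."""
    return [a for a in set_a if any(same_event(a, b, max_dist) for b in set_b)]


caller_1 = [SV("chr1", 10_000, 15_000, "DEL"), SV("chr2", 50_000, 52_000, "DUP")]
caller_2 = [SV("chr1", 10_200, 15_100, "DEL"), SV("chr2", 50_000, 52_000, "INV")]
overlap = shared_calls(caller_1, caller_2)
```

The pairwise scan is quadratic in the number of calls; a tool processing more than 100,000 SVs would need an indexed lookup (for example, an interval tree per chromosome) rather than this brute-force comparison.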
Affiliation(s)
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Andi Dhroso
- Worcester Polytechnic Institute, Worcester, MA, USA
| | - Dale L Bodian
- Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA
| | | | | | - Justin M Zook
- Genome-scale Measurements Group, National Institute of Standards and Technology, Gaithersburg, MD, USA
| |
|
45
|
Wu L, Yavas G, Hong H, Tong W, Xiao W. Direct comparison of performance of single nucleotide variant calling in human genome with alignment-based and assembly-based approaches. Sci Rep 2017; 7:10963. [PMID: 28887485 PMCID: PMC5591230 DOI: 10.1038/s41598-017-10826-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 08/01/2016] [Accepted: 08/15/2017] [Indexed: 12/30/2022] Open
Abstract
Complementary to reference-based variant detection, recent studies revealed that many novel variants could be detected with de novo assembled genomes. To evaluate the effect of read coverage and the accuracy of assembly-based variant calling, we simulated short reads containing more than 3 million single nucleotide variants (SNVs) from the whole human genome and compared the efficiency of SNV calling between the assembly-based and alignment-based calling approaches. We assessed the quality of the assembled contigs and found that a minimum of 30X coverage of short reads was needed to ensure reliable SNV calling and to generate assembled contigs with good coverage of the genome and genes. In addition, we observed that the assembly-based approach had a much lower recall rate and precision compared with the alignment-based approach, which would recover 99% of imputed SNVs. We observed similar results with experimental reads for NA24385, an individual whose germline variants were well characterized. Although there is additional value for SNV detection, the assembly-based approach would have a great risk of false discovery of novel SNVs. Further improvement of de novo assembly algorithms is needed to ensure good genome completeness, with haplotypes resolved, and high fidelity of assembled sequences.
Affiliation(s)
- Leihong Wu
- National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR RD, Jefferson, AR, 72079, USA
| | - Gokhan Yavas
- National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR RD, Jefferson, AR, 72079, USA
| | - Huixiao Hong
- National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR RD, Jefferson, AR, 72079, USA
| | - Weida Tong
- National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR RD, Jefferson, AR, 72079, USA
| | - Wenming Xiao
- National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR RD, Jefferson, AR, 72079, USA.
| |
|
46
|
Dolzhenko E, van Vugt JJFA, Shaw RJ, Bekritsky MA, van Blitterswijk M, Narzisi G, Ajay SS, Rajan V, Lajoie BR, Johnson NH, Kingsbury Z, Humphray SJ, Schellevis RD, Brands WJ, Baker M, Rademakers R, Kooyman M, Tazelaar GHP, van Es MA, McLaughlin R, Sproviero W, Shatunov A, Jones A, Al Khleifat A, Pittman A, Morgan S, Hardiman O, Al-Chalabi A, Shaw C, Smith B, Neo EJ, Morrison K, Shaw PJ, Reeves C, Winterkorn L, Wexler NS, Housman DE, Ng CW, Li AL, Taft RJ, van den Berg LH, Bentley DR, Veldink JH, Eberle MA. Detection of long repeat expansions from PCR-free whole-genome sequence data. Genome Res 2017; 27:1895-1903. [PMID: 28887402 PMCID: PMC5668946 DOI: 10.1101/gr.225672.117] [Citation(s) in RCA: 262] [Impact Index Per Article: 32.8] [Received: 06/01/2017] [Accepted: 08/28/2017] [Indexed: 12/14/2022]
Abstract
Identifying large expansions of short tandem repeats (STRs), such as those that cause amyotrophic lateral sclerosis (ALS) and fragile X syndrome, is challenging for short-read whole-genome sequencing (WGS) data. A solution to this problem is an important step toward integrating WGS into precision medicine. We developed a software tool called ExpansionHunter that, using PCR-free WGS short-read data, can genotype repeats at the locus of interest, even if the expanded repeat is larger than the read length. We applied our algorithm to WGS data from 3001 ALS patients who have been tested for the presence of the C9orf72 repeat expansion with repeat-primed PCR (RP-PCR). Compared against this truth data, ExpansionHunter correctly classified all (212/212, 95% CI [0.98, 1.00]) of the expanded samples as either expansions (208) or potential expansions (4). Additionally, 99.9% (2786/2789, 95% CI [0.997, 1.00]) of the wild-type samples were correctly classified as wild type by this method with the remaining three samples identified as possible expansions. We further applied our algorithm to a set of 152 samples in which every sample had one of eight different pathogenic repeat expansions, including those associated with fragile X syndrome, Friedreich's ataxia, and Huntington's disease, and correctly flagged all but one of the known repeat expansions. Thus, ExpansionHunter can be used to accurately detect known pathogenic repeat expansions and provides researchers with a tool that can be used to identify new pathogenic repeat expansions.
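The confidence intervals quoted in this abstract can be approximately reproduced with standard binomial formulas. The abstract does not say which interval method the authors used, so the sketch below is an assumption: it uses the closed-form exact (Clopper-Pearson) lower bound for the all-successes case and the Wilson score interval for the general case. The function names are hypothetical.

```python
import math


def exact_lower_bound_all_successes(n: int, alpha: float = 0.05) -> float:
    """Clopper-Pearson lower bound when all n trials succeed.

    In that special case the bound has the closed form (alpha / 2) ** (1 / n).
    """
    return (alpha / 2) ** (1.0 / n)


def wilson_interval(k: int, n: int, z: float = 1.959964):
    """Wilson score interval for k successes in n trials (z = 1.96 for 95%)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Expanded samples: 212 of 212 correctly classified.
sens_lower = exact_lower_bound_all_successes(212)

# Wild-type samples: 2786 of 2789 correctly classified.
spec_lower, spec_upper = wilson_interval(2786, 2789)

print(f"sensitivity 95% CI lower bound: {sens_lower:.2f}")
print(f"specificity 95% CI: [{spec_lower:.3f}, {spec_upper:.3f}]")
```

Under these assumed methods the lower bounds round to 0.98 and 0.997, in line with the intervals reported above; a different interval method could give slightly different values.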
Affiliation(s)
| | - Joke J F A van Vugt
- Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht University, 3584 CX Utrecht, The Netherlands
| | - Richard J Shaw
- Illumina Limited, Chesterford Research Park, Little Chesterford, Nr Saffron Walden, Essex, CB10 1XL, United Kingdom.,Repositive Limited, Future Business Centre, Cambridge CB4 2HY, United Kingdom
| | - Mitchell A Bekritsky
- Illumina Limited, Chesterford Research Park, Little Chesterford, Nr Saffron Walden, Essex, CB10 1XL, United Kingdom
| | | | | | | | - Vani Rajan
- Illumina Incorporated, San Diego, California 92122, USA
| | | | | | - Zoya Kingsbury
- Illumina Limited, Chesterford Research Park, Little Chesterford, Nr Saffron Walden, Essex, CB10 1XL, United Kingdom
| | - Sean J Humphray
- Illumina Limited, Chesterford Research Park, Little Chesterford, Nr Saffron Walden, Essex, CB10 1XL, United Kingdom
| | - Raymond D Schellevis
- Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht University, 3584 CX Utrecht, The Netherlands
| | - William J Brands
- Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht University, 3584 CX Utrecht, The Netherlands
| | - Matt Baker
- Department of Neuroscience, Mayo Clinic, Jacksonville, Florida 32224, USA
| | - Rosa Rademakers
- Department of Neuroscience, Mayo Clinic, Jacksonville, Florida 32224, USA
| | | | - Gijs H P Tazelaar
- Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht University, 3584 CX Utrecht, The Netherlands
| | - Michael A van Es
- Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht University, 3584 CX Utrecht, The Netherlands
| | - Russell McLaughlin
- Academic Unit of Neurology, Trinity College Dublin, Trinity Biomedical Sciences Institute, Dublin 2, Republic of Ireland.,Department of Neurology, Beaumont Hospital, Dublin 9, Republic of Ireland
| | - William Sproviero
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London SE5 9RX, United Kingdom
| | - Aleksey Shatunov
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London SE5 9RX, United Kingdom
| | - Ashley Jones
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London SE5 9RX, United Kingdom
| | - Ahmad Al Khleifat
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London SE5 9RX, United Kingdom
| | - Alan Pittman
- Department of Molecular Neuroscience, UCL Institute of Neurology, London WC1N 3BG, United Kingdom
| | - Sarah Morgan
- Department of Molecular Neuroscience, UCL Institute of Neurology, London WC1N 3BG, United Kingdom
| | - Orla Hardiman
- Academic Unit of Neurology, Trinity College Dublin, Trinity Biomedical Sciences Institute, Dublin 2, Republic of Ireland.,Department of Neurology, Beaumont Hospital, Dublin 9, Republic of Ireland
| | - Ammar Al-Chalabi
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London SE5 9RX, United Kingdom
| | - Chris Shaw
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London SE5 9RX, United Kingdom
| | - Bradley Smith
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London SE5 9RX, United Kingdom
| | - Edmund J Neo
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London SE5 9RX, United Kingdom
| | - Karen Morrison
- University of Southampton, Southampton SO17 1BJ, United Kingdom
| | - Pamela J Shaw
- Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield S10 2HQ, United Kingdom
| | | | | | - Nancy S Wexler
- Columbia University, New York, New York 10032, USA.,Hereditary Disease Foundation, New York, New York 10032, USA
| | | | - David E Housman
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Christopher W Ng
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Alina L Li
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Ryan J Taft
- Illumina Incorporated, San Diego, California 92122, USA
| | - Leonard H van den Berg
- Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht University, 3584 CX Utrecht, The Netherlands
| | - David R Bentley
- Illumina Limited, Chesterford Research Park, Little Chesterford, Nr Saffron Walden, Essex, CB10 1XL, United Kingdom
| | - Jan H Veldink
- Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht University, 3584 CX Utrecht, The Netherlands
| | | |
|
47
|
Choi YJ, Bisset SA, Doyle SR, Hallsworth-Pepin K, Martin J, Grant WN, Mitreva M. Genomic introgression mapping of field-derived multiple-anthelmintic resistance in Teladorsagia circumcincta. PLoS Genet 2017. [PMID: 28644839 PMCID: PMC5507320 DOI: 10.1371/journal.pgen.1006857] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Indexed: 11/18/2022] Open
Abstract
Preventive chemotherapy has long been practiced against nematode parasites of livestock, leading to widespread drug resistance, and is increasingly being adopted for eradication of human parasitic nematodes even though it is similarly likely to lead to drug resistance. Given that the genetic architecture of resistance is poorly understood for any nematode, we have analyzed multidrug resistant Teladorsagia circumcincta, a major parasite of sheep, as a model for analysis of resistance selection. We introgressed a field-derived multiresistant genotype into a partially inbred susceptible genetic background (through repeated backcrossing and drug selection) and performed genome-wide scans in the backcross progeny and drug-selected F2 populations to identify the major genes responsible for the multidrug resistance. We identified variation linking candidate resistance genes to each drug class. Putative mechanisms included target site polymorphism, changes in likely regulatory regions and copy number variation in efflux transporters. This work elucidates the genetic architecture of multiple anthelmintic resistance in a parasitic nematode for the first time and establishes a framework for future studies of anthelmintic resistance in nematode parasites of humans. Teladorsagia circumcincta is an economically significant nematode (roundworm) pathogen affecting sheep and goats in temperate regions of the world. The widespread use of prophylactic treatment has resulted in rapid selection for anthelmintic (anti-worm drug) resistance in this and other species of livestock parasites. The mechanism of resistance is not well understood because most studies have focused on the role of candidate genes using simplistic models of single gene selection, despite evidence that the evolution of resistance is more complex. Here, we report on a comprehensive whole-genome analysis that elucidated resistance-associated genes, which was facilitated by developing a pair of T. circumcincta strains sharing a largely common genetic background but differing markedly in their susceptibility to anthelmintic drugs. The results show that multiple genetic factors contribute to anthelmintic resistance in a variety of ways, including possible reduction/modulation in target site sensitivity, reduced target site expression, and increased drug efflux, to name a few. This suggests that drug resistance in these parasites is a multifactorial quantitative trait rather than a simple discrete Mendelian character. With this study, we established a genomics-based experimental paradigm for investigating anthelmintic resistance, at a time when its medical importance is rapidly increasing.
Affiliation(s)
- Young-Jun Choi
- McDonnell Genome Institute, Washington University School of Medicine, Saint Louis, Missouri, United States of America
| | - Stewart A Bisset
- AgResearch, Hopkirk Research Institute, Palmerston North, New Zealand
| | - Stephen R Doyle
- Department of Animal, Plant and Soil Sciences, La Trobe University, Melbourne, Victoria, Australia
| | - Kymberlie Hallsworth-Pepin
- McDonnell Genome Institute, Washington University School of Medicine, Saint Louis, Missouri, United States of America
| | - John Martin
- McDonnell Genome Institute, Washington University School of Medicine, Saint Louis, Missouri, United States of America
| | - Warwick N Grant
- Department of Animal, Plant and Soil Sciences, La Trobe University, Melbourne, Victoria, Australia
| | - Makedonka Mitreva
- McDonnell Genome Institute, Washington University School of Medicine, Saint Louis, Missouri, United States of America.,Department of Medicine, Washington University School of Medicine, Saint Louis, Missouri, United States of America
| |
|
48
|
Eisfeldt J, Vezzi F, Olason P, Nilsson D, Lindstrand A. TIDDIT, an efficient and comprehensive structural variant caller for massive parallel sequencing data. F1000Res 2017; 6:664. [PMID: 28781756 DOI: 10.12688/f1000research.11168.1] [Citation(s) in RCA: 68] [Impact Index Per Article: 8.5] [Accepted: 05/05/2017] [Indexed: 01/07/2023] Open
Abstract
Reliable detection of large structural variation (>1000 bp) is important in both rare and common genetic disorders. Whole genome sequencing (WGS) is a technology that may be used to identify a large proportion of the genomic structural variants (SVs) in an individual in a single experiment. Even though SV callers have been extensively used in research to detect mutations, the potential usage of SV callers within routine clinical diagnostics is still limited. One well-known but not well-addressed problem is the large number of benign variants and reference errors present in the human genome, which further complicates analysis. Even though there is a wide range of SV callers available, the number of callers that allow detection of the entire spectrum of SVs at a low computational cost is still relatively limited.
Affiliation(s)
- Jesper Eisfeldt
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden; Science for Life Laboratory, Karolinska Institutet Science Park, 171 21 Solna, Sweden
- Francesco Vezzi
- Science for Life Laboratory, Karolinska Institutet Science Park, 171 21 Solna, Sweden; Department of Biochemistry and Biophysics, Stockholm University, 171 21 Stockholm, Sweden
- Pall Olason
- Science for Life Laboratory, Dept of Cell and Molecular Biology, Uppsala University, Husargatan 3, Uppsala, SE-752 37, Sweden
- Daniel Nilsson
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden; Science for Life Laboratory, Karolinska Institutet Science Park, 171 21 Solna, Sweden; Department of Clinical Genetics, Karolinska University Hospital, 171 76 Stockholm, Sweden
- Anna Lindstrand
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden; Department of Clinical Genetics, Karolinska University Hospital, 171 76 Stockholm, Sweden
|
49
|
Eisfeldt J, Vezzi F, Olason P, Nilsson D, Lindstrand A. TIDDIT, an efficient and comprehensive structural variant caller for massive parallel sequencing data. F1000Res 2017; 6:664. [PMID: 28781756 PMCID: PMC5521161 DOI: 10.12688/f1000research.11168.2] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Accepted: 06/28/2017] [Indexed: 01/25/2023]
Abstract
Reliable detection of large structural variation (>1000 bp) is important in both rare and common genetic disorders. Whole genome sequencing (WGS) is a technology that may be used to identify a large proportion of the genomic structural variants (SVs) in an individual in a single experiment. Even though SV callers have been extensively used in research to detect mutations, the potential usage of SV callers within routine clinical diagnostics is still limited. One well known, but not well-addressed problem is the large number of benign variants and reference errors present in the human genome that further complicates analysis. Even though there is a wide range of SV-callers available, the number of callers that allow detection of the entire spectra of SV at a low computational cost is still relatively limited.
Affiliation(s)
- Jesper Eisfeldt
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden; Science for Life Laboratory, Karolinska Institutet Science Park, 171 21 Solna, Sweden
- Francesco Vezzi
- Science for Life Laboratory, Karolinska Institutet Science Park, 171 21 Solna, Sweden; Department of Biochemistry and Biophysics, Stockholm University, 171 21 Stockholm, Sweden
- Pall Olason
- Science for Life Laboratory, Dept of Cell and Molecular Biology, Uppsala University, Husargatan 3, Uppsala, SE-752 37, Sweden
- Daniel Nilsson
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden; Science for Life Laboratory, Karolinska Institutet Science Park, 171 21 Solna, Sweden; Department of Clinical Genetics, Karolinska University Hospital, 171 76 Stockholm, Sweden
- Anna Lindstrand
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden; Department of Clinical Genetics, Karolinska University Hospital, 171 76 Stockholm, Sweden
|
50
|
Jackman SD, Vandervalk BP, Mohamadi H, Chu J, Yeo S, Hammond SA, Jahesh G, Khan H, Coombe L, Warren RL, Birol I. ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter. Genome Res 2017; 27:768-777. [PMID: 28232478 PMCID: PMC5411771 DOI: 10.1101/gr.214346.116] [Citation(s) in RCA: 413] [Impact Index Per Article: 51.6] [Received: 08/07/2016] [Accepted: 02/14/2017] [Indexed: 01/19/2023]
Abstract
The assembly of DNA sequences de novo is fundamental to genomics research. It is the first of many steps toward elucidating and characterizing whole genomes. Downstream applications, including analysis of genomic variation between species, between or within individuals critically depend on robustly assembled sequences. In the span of a single decade, the sequence throughput of leading DNA sequencing instruments has increased drastically, and coupled with established and planned large-scale, personalized medicine initiatives to sequence genomes in the thousands and even millions, the development of efficient, scalable and accurate bioinformatics tools for producing high-quality reference draft genomes is timely. With ABySS 1.0, we originally showed that assembling the human genome using short 50-bp sequencing reads was possible by aggregating the half terabyte of compute memory needed over several computers using a standardized message-passing system (MPI). We present here its redesign, which departs from MPI and instead implements algorithms that employ a Bloom filter, a probabilistic data structure, to represent a de Bruijn graph and reduce memory requirements. We benchmarked ABySS 2.0 human genome assembly using a Genome in a Bottle data set of 250-bp Illumina paired-end and 6-kbp mate-pair libraries from a single individual. Our assembly yielded a NG50 (NGA50) scaffold contiguity of 3.5 (3.0) Mbp using <35 GB of RAM. This is a modest memory requirement by today's standards and is often available on a single computer. We also investigate the use of BioNano Genomics and 10x Genomics' Chromium data to further improve the scaffold NG50 (NGA50) of this assembly to 42 (15) Mbp.
Affiliation(s)
- Shaun D Jackman
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Benjamin P Vandervalk
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Hamid Mohamadi
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Justin Chu
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Sarah Yeo
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- S Austin Hammond
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Golnaz Jahesh
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Hamza Khan
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Lauren Coombe
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Rene L Warren
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
- Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
|