1
Corradi Z, Dhaenens CM, Grunewald O, Kocabaş IS, Meunier I, Banfi S, Karali M, Cremers FPM, Hitti-Malin RJ. Novel and Recurrent Copy Number Variants in ABCA4-Associated Retinopathy. Int J Mol Sci 2024; 25:5940. [PMID: 38892127 PMCID: PMC11173210 DOI: 10.3390/ijms25115940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Revised: 05/22/2024] [Accepted: 05/24/2024] [Indexed: 06/21/2024]
Abstract
ABCA4 is the most frequently mutated gene leading to inherited retinal disease (IRD) with over 2200 pathogenic variants reported to date. Of these, ~1% are copy number variants (CNVs) involving the deletion or duplication of genomic regions, typically >50 nucleotides in length. An in-depth assessment of the current literature based on the public database LOVD, regarding the presence of known CNVs and structural variants in ABCA4, and additional sequencing analysis of ABCA4 using single-molecule Molecular Inversion Probes (smMIPs) for 148 probands highlighted recurrent and novel CNVs associated with ABCA4-associated retinopathies. An analysis of the coverage depth in the sequencing data led to the identification of eleven deletions (six novel and five recurrent), three duplications (one novel and two recurrent) and one complex CNV. Of particular interest was the identification of a complex defect, i.e., a 15.3 kb duplicated segment encompassing exon 31 through intron 41 that was inserted at the junction of a downstream 2.7 kb deletion encompassing intron 44 through intron 47. In addition, we identified a 7.0 kb tandem duplication of intron 1 in three cases. The identification of CNVs in ABCA4 can provide patients and their families with a genetic diagnosis whilst expanding our understanding of the complexity of diseases caused by ABCA4 variants.
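The coverage-depth analysis mentioned in this abstract can be illustrated with a minimal sketch. It assumes per-target read depths (for example, one value per smMIP) for one proband and a set of control samples. The median normalization, the 0.7 and 1.3 ratio cut-offs, and the function names are illustrative assumptions, not the authors' pipeline.

```python
from statistics import median

def normalize(depths):
    """Scale per-target read depths by the sample's own median depth."""
    m = median(depths)  # assumes a non-zero median depth
    return [d / m for d in depths]

def call_cnv_targets(sample, controls, del_max=0.7, dup_min=1.3):
    """Label each target 'del', 'dup' or 'normal' from its depth ratio.

    The ratio compares the sample's normalized depth with the median
    normalized depth of the controls at the same target. A heterozygous
    deletion is expected near 0.5 and a heterozygous duplication near 1.5;
    the cut-offs used here are hypothetical.
    """
    s = normalize(sample)
    c = [normalize(ctrl) for ctrl in controls]
    labels = []
    for i, value in enumerate(s):
        ref = median(ctrl[i] for ctrl in c)
        ratio = value / ref
        if ratio <= del_max:
            labels.append("del")
        elif ratio >= dup_min:
            labels.append("dup")
        else:
            labels.append("normal")
    return labels

# Toy data: the third target has half the depth, the fifth has 1.5x.
sample = [100, 100, 50, 100, 150, 100]
controls = [[100] * 6, [200] * 6, [90] * 6]
print(call_cnv_targets(sample, controls))
```

In practice, runs of adjacent targets with the same label would be merged into candidate CNVs and confirmed by an orthogonal method.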
Affiliation(s)
- Zelia Corradi
  - Department of Human Genetics, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Claire-Marie Dhaenens
  - Université de Lille, Inserm, CHU Lille, U1172-LilNCog-Lille Neuroscience & Cognition, F-59000 Lille, France
- Olivier Grunewald
  - Université de Lille, Inserm, CHU Lille, U1172-LilNCog-Lille Neuroscience & Cognition, F-59000 Lille, France
- Ipek Selen Kocabaş
  - Department of Human Genetics, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Isabelle Meunier
  - Institute des Neurosciences de Montpellier, INSERM, Université de Montpellier, F-34295 Montpellier, France
- Sandro Banfi
  - Department of Precision Medicine, University of Campania “Luigi Vanvitelli”, 81031 Naples, Italy
  - Telethon Institute of Genetics and Medicine (TIGEM), 80078 Pozzuoli, Italy
- Marianthi Karali
  - Department of Precision Medicine, University of Campania “Luigi Vanvitelli”, 81031 Naples, Italy
  - Eye Clinic, Multidisciplinary Department of Medical, Surgical and Dental Sciences, University of Campania “Luigi Vanvitelli”, 81031 Naples, Italy
- Frans P. M. Cremers
  - Department of Human Genetics, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Rebekkah J. Hitti-Malin
  - Department of Human Genetics, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
2
Lesack KJ, Wasmuth JD. The impact of FASTQ and alignment read order on structural variant calling from long-read sequencing data. PeerJ 2024; 12:e17101. [PMID: 38500526 PMCID: PMC10946394 DOI: 10.7717/peerj.17101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 02/21/2024] [Indexed: 03/20/2024]
Abstract
Background Structural variant (SV) calling from DNA sequencing data has been challenging due to several factors, including the ambiguity of short-read alignments, multiple complex SVs in the same genomic region, and the lack of "truth" datasets for benchmarking. Additionally, caller choice, parameter settings, and alignment method are known to affect SV calling. However, the impact of FASTQ read order on SV calling has not been explored for long-read data. Results Here, we used PacBio DNA sequencing data from 15 Caenorhabditis elegans strains and four Arabidopsis thaliana ecotypes to evaluate the sensitivity of different SV callers to FASTQ read order. Comparisons of variant call format files generated from the original and permuted FASTQ files demonstrated that the order of input data affected the SVs predicted by each caller. In particular, pbsv was highly sensitive to the order of the input data, especially at the highest depths, where over 70% of the SV calls generated from pairs of differently ordered FASTQ files were in disagreement. These results demonstrate that read order sensitivity is a complex, multifactorial process, as the differences observed both within and between species varied considerably according to the specific combination of aligner, SV caller, and sequencing depth. In addition to the SV callers being sensitive to the input data order, the SAMtools alignment sorting algorithm was identified as a source of variability following read order randomization. Conclusion The results of this study highlight the sensitivity of SV calling to the order of reads encoded in FASTQ files, which has not been recognized in long-read approaches. These findings have implications for the replication of SV studies and the development of consistent SV calling protocols. 
Our study suggests that researchers should pay attention to the input order sensitivity of read alignment sorting methods when analyzing long-read sequencing data for SV calling, as mitigating a source of variability could facilitate future replication work. These results also raise important questions surrounding the relationship between SV caller read order sensitivity and tool performance. Therefore, tool developers should also consider input order sensitivity as a potential source of variability during the development and benchmarking of new and improved methods for SV calling.
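The kind of read-order experiment described in this abstract can be sketched in a few lines. The example assumes unwrapped four-line FASTQ records and represents each SV call as a hashable tuple. The seeded shuffle, the function names, and the disagreement measure (calls not shared divided by the union of both call sets) are illustrative assumptions; the abstract does not state how the authors computed disagreement.

```python
import random

def read_fastq_records(lines):
    """Group FASTQ lines into 4-line records (header, sequence, '+', qualities)."""
    lines = [ln.rstrip("\n") for ln in lines]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]

def permute_records(records, seed):
    """Return a reproducibly shuffled copy of the records (input is left untouched)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled

def disagreement(calls_a, calls_b):
    """Fraction of calls in the union of two call sets that are not shared."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        return 0.0
    return len(a ^ b) / len(union)

# Toy FASTQ with three reads; the content is made up for illustration.
fastq = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "GGCA", "+", "IIII",
    "@read3", "TTAC", "+", "IIII",
]
records = read_fastq_records(fastq)
shuffled = permute_records(records, seed=1)

# Hypothetical call sets from the original and permuted inputs.
original_calls = {("chrI", 100, "DEL"), ("chrI", 500, "INS")}
permuted_calls = {("chrI", 100, "DEL")}
print(disagreement(original_calls, permuted_calls))
```

Fixing the seed makes each permutation reproducible, which matters when the aim is to separate read-order effects from other sources of run-to-run variability.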
Affiliation(s)
- Kyle J. Lesack
  - Faculty of Veterinary Medicine, University of Calgary, Calgary, Alberta, Canada
  - Host-Parasite Interactions Research Training Network, University of Calgary, Calgary, Alberta, Canada
- James D. Wasmuth
  - Faculty of Veterinary Medicine, University of Calgary, Calgary, Alberta, Canada
  - Host-Parasite Interactions Research Training Network, University of Calgary, Calgary, Alberta, Canada
3
Salava H, Deák T, Czepe C, Maghuly F. Sample and Library Preparation for PacBio Long-Read Sequencing in Grapevine. Methods Mol Biol 2024; 2787:183-197. [PMID: 38656490 DOI: 10.1007/978-1-0716-3778-4_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
PacBio long-read sequencing is a third-generation technology that generates long reads of up to 20 kilobases (kb), unlike short-read sequencing instruments, which produce reads of up to 600 bases. Long-read sequencing is particularly advantageous in higher organisms, such as humans and plants, where repetitive regions in the genome are more abundant. PacBio long-read sequencing uses a single-molecule, real-time (SMRT) approach in which the SMRT cells contain several zero-mode waveguides (ZMWs). Each ZMW contains a single DNA molecule bound by a DNA polymerase. All ZMWs are flushed with deoxynucleotides, each labeled with a nucleotide-specific fluorophore. As sequencing proceeds, the detector registers the wavelength of the fluorescence and the nucleotides are read in real time. This chapter describes the sample and library preparation for PacBio long-read sequencing in grapevine.
Affiliation(s)
- Hymavathi Salava
  - Plant Functional Genomics Lab, Institute of Molecular Biotechnology, Department of Biotechnology, University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
- Tamás Deák
  - Institute of Viticulture and Oenology, Hungarian University of Agriculture and Life Sciences (MATE), Budapest, Hungary
- Carmen Czepe
  - Next Generation Sequencing Unit, Vienna Biocenter Core Facilities (VBCF), Vienna, Austria
- Fatemeh Maghuly
  - Plant Functional Genomics Lab, Institute of Molecular Biotechnology, Department of Biotechnology, University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
4
Lei YQ, Xu LP, Cao H, Wang XR. A method of large DNA fragment enrichment for nanopore sequencing in region 22q11.2. Front Genet 2022; 13:959883. [DOI: 10.3389/fgene.2022.959883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Accepted: 10/11/2022] [Indexed: 11/13/2022]
Abstract
Background: 22q11.2 deletion syndrome (22q11.2DS) is a disorder caused when a small part of chromosome 22 is missing. Diagnosis is currently established by the identification of a heterozygous deletion at chromosome 22q11.2 through chromosomal microarray analysis or other genomic analyses. However, more accurate identification of the breakpoint contributes to a clearer understanding of the 22q11.2 deletion syndrome. Methods: In this study, we present a feasible nanopore sequencing method for the 22q11.2 deletion. This DNA enrichment method, region-specific amplification (RSA), is able to analyze the 22q11.2 deletion by specific amplification of an approximately 1-Mb region where the breakpoint might exist. RSA introduces universal primers into the target region DNA by a Y-shaped adaptor ligation and a single primer extension. The enriched products, completed by amplification with universal primers, are then processed by standard ONT ligation sequencing protocols. Results: RSA is able to deliver adequate coverage (>98%) and comparable long reads (average length >1 kb) throughout the 22q11.2 region. The long nanopore sequencing reads, derived from three umbilical cord blood samples, have facilitated the identification of the breakpoint of the 22q11.2 deletion, as well as by Sanger sequencing. Conclusion: The Oxford Nanopore MinION sequencer can use RSA to sequence the target region 22q11.2; this method could also be used for other hard-to-sequence parts of the genome.
5
Uppuluri L, Wang Y, Young E, Wong JS, Abid HZ, Xiao M. Multiplex structural variant detection by whole-genome mapping and nanopore sequencing. Sci Rep 2022; 12:6512. [PMID: 35444207 PMCID: PMC9021263 DOI: 10.1038/s41598-022-10483-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/25/2021] [Accepted: 04/08/2022] [Indexed: 11/26/2022]
Abstract
Identification of structural variant (SV) breakpoints is important in studying mutations, mutagenic causes, and functional impacts. Next-generation sequencing and whole-genome optical mapping are extensively used in SV discovery and characterization. However, multiple platforms and computational approaches are needed for comprehensive analysis, making it resource-intensive and expensive. Here, we propose a strategy combining optical mapping and Cas9-assisted targeted nanopore sequencing to analyze SVs. Optical mapping can economically and quickly detect SVs across a whole genome but does not provide sequence-level information or precisely resolve breakpoints. Furthermore, since only a subset of all SVs is known to affect biology, we attempted to type a subset of all SVs using targeted nanopore sequencing. Using our approach, we resolved the breakpoints of five deletions, five insertions, and an inversion in a single experiment.
Affiliation(s)
- Lahari Uppuluri
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
  - Department of Mechanical Engineering and Mechanics, Drexel University, Philadelphia, PA, USA
- Yilin Wang
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Eleanor Young
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Jessica S Wong
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Heba Z Abid
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
- Ming Xiao
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA
  - Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University, Philadelphia, PA, USA
6
Chen Z, He X. Application of third-generation sequencing in cancer research. Medical Review (Berlin, Germany) 2021; 1:150-171. [PMID: 37724303 PMCID: PMC10388785 DOI: 10.1515/mr-2021-0013] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/14/2021] [Accepted: 08/09/2021] [Indexed: 09/20/2023]
Abstract
In the past several years, nanopore sequencing technology from Oxford Nanopore Technologies (ONT) and single-molecule real-time (SMRT) sequencing technology from Pacific Biosciences (PacBio) have become available to researchers and are currently being tested for cancer research. These methods offer many advantages over the most widely used high-throughput short-read sequencing approaches and allow the comprehensive analysis of transcriptomes by identifying full-length splice isoforms and several other posttranscriptional events. In addition, these platforms enable structural variation characterization at a previously unparalleled resolution and direct detection of epigenetic marks in native DNA and RNA. Here, we present a comprehensive summary of important applications of these technologies in cancer research, including the identification of complex structural variants, alternatively spliced isoforms, fusion transcript events, and exogenous RNA. Furthermore, we discuss the impact of the newly developed nanopore direct RNA sequencing (RNA-Seq) approach in advancing epitranscriptome research in cancer. Although unique challenges remain for these new single-molecule long-read methods, they will unravel many aspects of cancer genome complexity in unprecedented ways and present an encouraging outlook for continued application in an increasing number of different cancer research settings.
Affiliation(s)
- Zhiao Chen
  - Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
  - Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Xianghuo He
  - Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
  - Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
  - Key Laboratory of Breast Cancer in Shanghai, Fudan University Shanghai Cancer Center, Fudan University, Shanghai, China
7
Savara J, Novosád T, Gajdoš P, Kriegová E. Comparison of structural variants detected by optical mapping with long-read next-generation sequencing. Bioinformatics 2021; 37:3398-3404. [PMID: 33983367 DOI: 10.1093/bioinformatics/btab359] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/11/2021] [Revised: 04/21/2021] [Accepted: 05/08/2021] [Indexed: 12/29/2022]
Abstract
MOTIVATION Recent studies have shown the potential of using long-read whole-genome sequencing (WGS) approaches and optical mapping (OM) for the detection of clinically relevant structural variants (SVs) in cancer research. Three main long-read WGS platforms are currently in use: Pacific Biosciences (PacBio), Oxford Nanopore Technologies (ONT) and 10x Genomics. Recently, whole-genome OM technology (Bionano Genomics) has been introduced into human diagnostics. Questions remain about the accuracy of these long-read sequencing platforms, how comparable/interchangeable they are when searching for SVs and to what extent they can be replaced or supplemented by OM. Moreover, no tool can effectively compare SVs obtained by OM and WGS. RESULTS This study compared optical maps of the breast cancer cell line SKBR3 with AnnotSV outputs from WGS platforms. For this purpose, a software tool with comparative and filtering features was developed. The majority of SVs up to a 50 kbp distance variance threshold found by OM were confirmed by all WGS platforms, and 99% of translocations and 80% of deletions found by OM were confirmed by both PacBio and ONT, with ∼70% being confirmed by 10x Genomics in combination with PacBio and/or ONT. Interestingly, long deletions (>100 kbp) were detected only by 10x Genomics. Regarding insertions, ∼72% was confirmed by PacBio and ONT, but none by 10x Genomics. Inversions and duplications detected by OM were not detected by WGS. Moreover, the tool enabled the confirmation of SVs that overlapped in the same gene(s) and was applied to the filtering of disease-associated SVs. AVAILABILITY https://github.com/novosadt/om-annotsv-svc.
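The idea of confirming optical-mapping (OM) calls against sequencing calls within a distance threshold can be sketched as follows. This is a minimal illustration, not the logic of the om-annotsv-svc tool: the `SV` record, the rule that both breakpoints must lie within `max_distance` of each other, and the requirement of matching chromosome and SV type are all assumptions made for the example. Only the 50 kbp default is taken from the threshold mentioned in the abstract.

```python
from typing import NamedTuple

class SV(NamedTuple):
    """Hypothetical minimal SV record used only for this sketch."""
    chrom: str
    start: int
    end: int
    svtype: str

def is_match(a, b, max_distance=50_000):
    """True if two calls share chromosome and type and both breakpoints are close."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= max_distance
        and abs(a.end - b.end) <= max_distance
    )

def confirmed_fraction(om_calls, wgs_calls, max_distance=50_000):
    """Fraction of OM calls that have at least one matching WGS call."""
    if not om_calls:
        return 0.0
    confirmed = sum(
        any(is_match(om, wgs, max_distance) for wgs in wgs_calls)
        for om in om_calls
    )
    return confirmed / len(om_calls)

# Toy calls: the deletion is within 50 kbp at both ends, the insertion is not.
om_calls = [
    SV("chr1", 1_000_000, 1_200_000, "DEL"),
    SV("chr2", 5_000_000, 5_000_000, "INS"),
]
wgs_calls = [
    SV("chr1", 1_020_000, 1_190_000, "DEL"),
    SV("chr2", 5_100_000, 5_100_000, "INS"),
]
print(confirmed_fraction(om_calls, wgs_calls))
```

A generous threshold suits OM because its breakpoints are resolved only to the nearest labeled sites, so exact coordinate matching against sequencing calls would understate agreement.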
Affiliation(s)
- Jakub Savara
  - Department of Computer Science, VSB-Technical University of Ostrava, Ostrava, 708 00, Czech Republic
  - Department of Immunology, Faculty of Medicine and Dentistry, Palacký University in Olomouc and University Hospital Olomouc, 779 00, Olomouc, Czech Republic
- Tomáš Novosád
  - Department of Computer Science, VSB-Technical University of Ostrava, Ostrava, 708 00, Czech Republic
- Petr Gajdoš
  - Department of Computer Science, VSB-Technical University of Ostrava, Ostrava, 708 00, Czech Republic
- Eva Kriegová
  - Department of Immunology, Faculty of Medicine and Dentistry, Palacký University in Olomouc and University Hospital Olomouc, 779 00, Olomouc, Czech Republic
8
Chen L, Pryce JE, Hayes BJ, Daetwyler HD. Investigating the Effect of Imputed Structural Variants from Whole-Genome Sequence on Genome-Wide Association and Genomic Prediction in Dairy Cattle. Animals (Basel) 2021; 11:ani11020541. [PMID: 33669735 PMCID: PMC7922624 DOI: 10.3390/ani11020541] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/19/2021] [Revised: 02/09/2021] [Accepted: 02/12/2021] [Indexed: 02/06/2023]
Abstract
Simple Summary Structural variants are large changes to the DNA sequences that differ from individual to individual. We discovered and quality-controlled a set of 24,908 structural variants and used a technique called imputation to infer them into 35,568 Holstein and Jersey cattle. We then investigated whether the structural variants affected key dairy cattle traits such as milk production, fertility and overall conformation. Structural variants explained generally less than 10 percent of the phenotypic variation in these traits. Four of the structural variants were significantly associated with dairy cattle production traits. However, the inclusion of the structural variants in the genomic prediction model did not increase genomic prediction accuracy. Abstract Structural variations (SVs) are large DNA segments of deletions, duplications, copy number variations, inversions and translocations in a re-sequenced genome compared to a reference genome. They have been found to be associated with several complex traits in dairy cattle and could potentially help to improve genomic prediction accuracy of dairy traits. Imputation of SVs was performed in individuals genotyped with single-nucleotide polymorphism (SNP) panels without the expense of sequencing them. In this study, we generated 24,908 high-quality SVs in a total of 478 whole-genome sequenced Holstein and Jersey cattle. We imputed 4489 SVs with R² > 0.5 into 35,568 Holstein and Jersey dairy cattle with 578,999 SNPs with two pipelines, FImpute and Eagle2.3-Minimac3. Genome-wide association studies for production, fertility and overall type with these 4489 SVs revealed four significant SVs, of which two were highly linked to significant SNPs. We also estimated the variance components for SNP and SV models for these traits using genomic best linear unbiased prediction (GBLUP). Furthermore, we assessed the effect on genomic prediction accuracy of adding SVs to GBLUP models. 
The estimated percentage of genetic variance captured by SVs for production traits was up to 4.57% for milk yield in bulls and 3.53% for protein yield in cows. Finally, no consistent increase in genomic prediction accuracy was observed when including SVs in GBLUP.
Affiliation(s)
- Long Chen
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
- Jennie E. Pryce
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
- Ben J. Hayes
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
  - Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, The University of Queensland, St. Lucia, QLD 4067, Australia
- Hans D. Daetwyler
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
9
Blondal T, Gamba C, Møller Jagd L, Su L, Demirov D, Guo S, Johnston CM, Riising EM, Wu X, Mikkelsen MJ, Szabova L, Mouritzen P. Verification of CRISPR editing and finding transgenic inserts by Xdrop indirect sequence capture followed by short- and long-read sequencing. Methods 2021; 191:68-77. [PMID: 33582298 DOI: 10.1016/j.ymeth.2021.02.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 05/14/2020] [Revised: 11/12/2020] [Accepted: 02/02/2021] [Indexed: 01/02/2023]
Abstract
Validation of CRISPR-Cas9 editing typically explores the immediate vicinity of the gene editing site and distal off-target sequences, which has led to the conclusion that CRISPR-Cas9 editing is very specific. However, an increasing number of studies suggest that on-target unintended editing events like deletions and insertions are relatively frequent but unfortunately often missed in the validation of CRISPR-Cas9 editing. The deletions may be several kilobases long and affect only one allele. The gold standard in molecular validation of gene editing is direct sequencing of relatively short PCR amplicons. This approach allows the detection of small editing events but fails to detect large rearrangements, in particular when only one allele is affected. Detection of large rearrangements requires that an extended region is analyzed, and the characterization of events may benefit from long-read sequencing. Here we implemented Xdrop™, a new microfluidic technology that allows targeted enrichment of long regions (~100 kb) using just a single standard PCR primer set. Sequencing of the enriched CRISPR-Cas9 gene-edited region in four cell lines on long- and short-read sequencing platforms unravelled unknown and unintended genome editing events. The analysis revealed accidental kilobase-scale insertions in three of the cell lines, which remained undetected using standard procedures. We also applied the targeted enrichment approach to identify the integration site of a transgene in a mouse line. The results demonstrate the potential of this technology in gene editing validation as well as in more classic transgenics.
Affiliation(s)
- Ling Su
  - Cancer Research Technology Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Dimiter Demirov
  - Cancer Research Technology Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Shuang Guo
  - Cancer Research Technology Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Xiaolin Wu
  - Cancer Research Technology Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Ludmila Szabova
  - Center for Advanced Preclinical Research, Frederick National Laboratory for Cancer Research at the National Cancer Institute-Frederick, Frederick, MD, USA
10
Aganezov S, Raphael BJ. Reconstruction of clone- and haplotype-specific cancer genome karyotypes from bulk tumor samples. Genome Res 2020; 30:1274-1290. [PMID: 32887685 PMCID: PMC7545144 DOI: 10.1101/gr.256701.119] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/03/2019] [Accepted: 08/07/2020] [Indexed: 12/25/2022]
Abstract
Many cancer genomes are extensively rearranged with aberrant chromosomal karyotypes. Deriving these karyotypes from high-throughput DNA sequencing of bulk tumor samples is complicated because most tumors are a heterogeneous mixture of normal cells and subpopulations of cancer cells, or clones, that harbor distinct somatic mutations. We introduce a new algorithm, Reconstructing Cancer Karyotypes (RCK), to reconstruct haplotype-specific karyotypes of one or more rearranged cancer genomes from DNA sequencing data from a bulk tumor sample. RCK leverages evolutionary constraints on the somatic mutational process in cancer to reduce ambiguity in the deconvolution of admixed sequencing data into multiple haplotype-specific cancer karyotypes. RCK models mixtures containing an arbitrary number of derived genomes and allows the incorporation of information both from short-read and long-read DNA sequencing technologies. We compare RCK to existing approaches on 17 primary and metastatic prostate cancer samples. We find that RCK infers cancer karyotypes that better explain the DNA sequencing data and conform to a reasonable evolutionary model. RCK's reconstructions of clone- and haplotype-specific karyotypes will aid further studies of the role of intra-tumor heterogeneity in cancer development and response to treatment. RCK is freely available as open source software.
Affiliation(s)
- Sergey Aganezov
  - Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA
11
Yang L. A Practical Guide for Structural Variation Detection in the Human Genome. Current Protocols in Human Genetics 2020; 107:e103. [PMID: 32813322 PMCID: PMC7738216 DOI: 10.1002/cphg.103] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 12/12/2022]
Abstract
Profiling genetic variants-including single nucleotide variants, small insertions and deletions, copy number variations, and structural variations (SVs)-from both healthy individuals and individuals with disease is a key component of genetic and biomedical research. SVs are large-scale changes in the genome and involve breakage and rejoining of DNA fragments. They may affect thousands to millions of nucleotides and can lead to loss, gain, and reshuffling of genes and regulatory elements. SVs are known to impact gene expression and potentially result in altered phenotypes and diseases. Therefore, identifying SVs from the human genomes is particularly important. In this review, I describe advantages and disadvantages of the available high-throughput assays for the discovery of SVs, which are the most challenging genetic alterations to detect. A practical guide is offered to suggest the most suitable strategies for discovering different types of SVs including common germline, rare, somatic, and complex variants. I also discuss factors to be considered, such as cost and performance, for different strategies when designing experiments. Last, I present several approaches to identify potential SV artifacts caused by samples, experimental procedures, and computational analysis. © 2020 Wiley Periodicals LLC.
Affiliation(s)
- Lixing Yang
  - Ben May Department for Cancer Research, Department of Human Genetics, University of Chicago, Chicago, Illinois
12
Karaoğlanoğlu F, Ricketts C, Ebren E, Rasekh ME, Hajirasouliha I, Alkan C. VALOR2: characterization of large-scale structural variants using linked-reads. Genome Biol 2020; 21:72. [PMID: 32192518 PMCID: PMC7083023 DOI: 10.1186/s13059-020-01975-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/17/2019] [Accepted: 02/24/2020] [Indexed: 12/31/2022]
Abstract
Most existing methods for structural variant detection focus on discovery and genotyping of deletions, insertions, and mobile elements. Detection of balanced structural variants with no gain or loss of genomic segments, for example, inversions and translocations, is a particularly challenging task. Furthermore, there are very few algorithms to predict the insertion locus of large interspersed segmental duplications and characterize translocations. Here, we propose novel algorithms to characterize large interspersed segmental duplications, inversions, deletions, and translocations using linked-read sequencing data. We redesign our earlier algorithm, VALOR, and implement our new algorithms in a new software package, called VALOR2.
Affiliation(s)
- Fatih Karaoğlanoğlu
  - Department of Computer Engineering, Bilkent University, Ankara, 06800 Turkey
- Camir Ricketts
  - Tri-Institutional Computational Biology & Medicine Program, Cornell University, 1300 York Ave, New York, 10065 NY USA
  - Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine, 1300 York Ave, New York, 10065 NY USA
- Ezgi Ebren
  - Department of Computer Engineering, Bilkent University, Ankara, 06800 Turkey
- Marzieh Eslami Rasekh
  - Graduate Program in Bioinformatics, Boston University, 24 Cummington Mall, Boston, 02215 MA USA
- Iman Hajirasouliha
  - Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine, 1300 York Ave, New York, 10065 NY USA
  - Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, 1300 York Ave, New York, 10065 NY USA
- Can Alkan
  - Department of Computer Engineering, Bilkent University, Ankara, 06800 Turkey
  - Bilkent-Hacettepe Health Sciences and Technologies Program, Bilkent University, Ankara, 06800 Turkey
13
Abstract
Identifying structural variation (SV) is essential for genome interpretation but has been historically difficult due to limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and proffer that, moving forwards, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome.
Affiliation(s)
- Steve S Ho
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Alexander E Urban
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Ryan E Mills
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
14
|
Aganezov S, Zban I, Aksenov V, Alexeev N, Schatz MC. Recovering rearranged cancer chromosomes from karyotype graphs. BMC Bioinformatics 2019; 20:641. [PMID: 31842730 PMCID: PMC6915857 DOI: 10.1186/s12859-019-3208-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND: Many cancer genomes are extensively rearranged with highly aberrant chromosomal karyotypes. Structural and copy number variations in cancer genomes can be determined via abnormal mapping of sequenced reads to the reference genome. Recently it became possible to reconcile both of these types of large-scale variations into a karyotype graph representation of the rearranged cancer genomes. Such a representation, however, does not directly describe the linear and/or circular structure of the underlying rearranged cancer chromosomes, thus limiting analysis of the somatic evolutionary process of cancer genomes as well as of the functional genomic changes brought about by large-scale genome rearrangements. RESULTS: Here we address this limitation by introducing a novel methodological framework for recovering rearranged cancer chromosomes from karyotype graphs. For a cancer karyotype graph we formulate an Eulerian Decomposition Problem (EDP) of finding a collection of linear and/or circular rearranged cancer chromosomes that are determined by the graph. We derive and prove computational complexities for several variations of the EDP. We then demonstrate that Eulerian decomposition of cancer karyotype graphs is not always unique, present the Consistent Contig Covering Problem (CCCP) of recovering unambiguous cancer contigs from the cancer karyotype graph, and describe a novel algorithm, CCR, capable of solving CCCP in polynomial time. We apply CCR to a prostate cancer dataset and demonstrate that it is capable of consistently recovering large cancer contigs even when the underlying cancer genomes are highly rearranged. CONCLUSIONS: CCR can recover rearranged cancer contigs from karyotype graphs, thereby addressing an existing limitation in inferring chromosomal structures of rearranged cancer genomes and advancing our understanding of both patient/cancer-specific and overall genetic instability in cancer.
Affiliation(s)
- Sergey Aganezov
  - Department of Computer Science, Johns Hopkins University, 3400 N. Charles St., Baltimore, 21210 MD USA
- Ilya Zban
  - Computer Technologies Laboratory “Computer technology”, ITMO University, Kronverkskiy Prospekt, 49, Saint Petersburg, 197101 Russia
- Vitaly Aksenov
  - Computer Technologies Laboratory “Computer technology”, ITMO University, Kronverkskiy Prospekt, 49, Saint Petersburg, 197101 Russia
  - IST Austria, Am Campus 1, Klosterneuburg, 3400 Austria
- Nikita Alexeev
  - Computer Technologies Laboratory “Computer technology”, ITMO University, Kronverkskiy Prospekt, 49, Saint Petersburg, 197101 Russia
- Michael C. Schatz
  - Department of Computer Science, Johns Hopkins University, 3400 N. Charles St., Baltimore, 21210 MD USA
  - Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, 11724 NY USA
15
|
Frith MC, Khan S. A survey of localized sequence rearrangements in human DNA. Nucleic Acids Res 2018; 46:1661-1673. [PMID: 29272440 PMCID: PMC5829575 DOI: 10.1093/nar/gkx1266] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/30/2017] [Accepted: 12/07/2017] [Indexed: 01/29/2023] Open
Abstract
Genomes mutate and evolve in ways simple (substitution or deletion of bases) and complex (e.g. chromosome shattering). We do not fully understand what types of complex mutation occur, and we cannot routinely characterize arbitrarily-complex mutations in a high-throughput, genome-wide manner. Long-read DNA sequencing methods (e.g. PacBio, nanopore) are promising for this task, because one read may encompass a whole complex mutation. We describe an analysis pipeline to characterize arbitrarily-complex 'local' mutations, i.e. intrachromosomal mutations encompassed by one DNA read. We apply it to nanopore and PacBio reads from one human cell line (NA12878), and survey sequence rearrangements, both real and artifactual. Almost all the real rearrangements belong to recurring patterns or motifs: the most common is tandem multiplication (e.g. heptuplication), but there are also complex patterns such as localized shattering, which resembles DNA damage by radiation. Gene conversions are identified, including one between hemoglobin gamma genes. This study demonstrates a way to find intricate rearrangements with any number of duplications, deletions, and repositionings. It demonstrates a probability-based method to resolve ambiguous rearrangements involving highly similar sequences, as occurs in gene conversion. We present a catalog of local rearrangements in one human cell line, and show which rearrangement patterns occur.
Affiliation(s)
- Martin C Frith
  - Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
  - Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-8562, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Tokyo 169-8555, Japan
- Sofia Khan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Tokyo 169-8555, Japan
16
|
Rajaby R, Sung WK. TranSurVeyor: an improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data. Nucleic Acids Res 2018; 46:e122. [PMID: 30137425 PMCID: PMC6237741 DOI: 10.1093/nar/gky685] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 02/09/2018] [Accepted: 07/19/2018] [Indexed: 01/21/2023] Open
Abstract
Transpositions transfer DNA segments between different loci within a genome; in particular, when a transposition is found in a sample but not in a reference genome, it is called a non-reference transposition. They are important structural variations that have clinical impact. Transpositions can be called by analyzing second generation high-throughput sequencing datasets. Current methods follow either a database-based or a database-free approach. Database-based methods require a database of transposable elements. Some of them have good specificity; however this approach cannot detect novel transpositions, and it requires a good database of transposable elements, which is not yet available for many species. Database-free methods perform de novo calling of transpositions, but their accuracy is low. We observe that this is due to the misalignment of the reads; since reads are short and the human genome has many repeats, false alignments create false positive predictions while missing alignments reduce the true positive rate. This paper proposes new techniques to improve database-free non-reference transposition calling: first, we propose a realignment strategy called one-end remapping that corrects the alignments of reads in interspersed repeats; second, we propose a SNV-aware filter that removes some incorrectly aligned reads. By combining these two techniques and other techniques like clustering and positive-to-negative ratio filter, our proposed transposition caller TranSurVeyor shows at least 3.1-fold improvement in terms of F1-score over existing database-free methods. More importantly, even though TranSurVeyor does not use databases of prior information, its performance is at least as good as existing database-based methods such as MELT, Mobster and Retroseq. We also illustrate that TranSurVeyor can discover transpositions that are not known in the current database.
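For reference, the F1-score used in this comparison is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation, assuming hypothetical counts of true-positive, false-positive and false-negative calls; the function name and the numbers are illustrative and do not come from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the harmonic mean of precision and recall from raw call counts."""
    if tp == 0:
        # No correct calls: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # fraction of predicted calls that are correct
    recall = tp / (tp + fn)     # fraction of true events that were called
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
print(f1_score(tp=80, fp=20, fn=20))
```

Because the harmonic mean is dominated by the smaller of the two terms, a caller cannot score well by trading away either precision or recall, which is why false alignments and missing alignments both matter in the argument above.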
Affiliation(s)
- Ramesh Rajaby
  - School of Computing, National University of Singapore, 13 Computing Drive, 117417, Singapore
  - NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, 28 Medical Drive, 117456, Singapore
- Wing-Kin Sung
  - School of Computing, National University of Singapore, 13 Computing Drive, 117417, Singapore
  - Genome Institute of Singapore, 60 Biopolis Street, Genome, 138672, Singapore
17
|
Zhu S, Emrich SJ, Chen DZ. Predicting Local Inversions Using Rectangle Clustering and Representative Rectangle Prediction. IEEE Trans Nanobioscience 2019; 18:316-323. [PMID: 31180865 PMCID: PMC6606370 DOI: 10.1109/tnb.2019.2915060] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/15/2023]
Abstract
As a specific type of structural variation, inversions are enjoying particular traction as a result of their established role in evolution. Using third-generation sequencing technology to predict inversions is growing in interest, but many such methods focus on improving sensitivity, giving rise to either too many false positives or very long running times. In this paper, we propose a new framework for inversion detection based on a combination of two novel theoretical models: rectangle clustering and representative rectangle prediction. This combination can automatically filter out false positive inversion predictions while retaining correct ones, leading to a method that has both high sensitivity and high positive predictive value (PPV). Further, this new framework can run very fast on available data. Our software can be freely obtained at https://github.com/UTbioinf/RigInv.
18
|
Characterization and evolutionary dynamics of complex regions in eukaryotic genomes. SCIENCE CHINA-LIFE SCIENCES 2019; 62:467-488. [PMID: 30810961 DOI: 10.1007/s11427-018-9458-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/01/2018] [Accepted: 11/05/2018] [Indexed: 01/07/2023]
Abstract
Complex regions in eukaryotic genomes are typically characterized by duplications of chromosomal stretches that often include one or more genes repeated in a tandem array or in relatively close proximity. Nevertheless, the repetitive nature of these regions, together with the often high sequence identity among repeats, has made complex regions particularly recalcitrant to proper molecular characterization, often being misassembled or completely absent in genome assemblies. This limitation has prevented accurate functional and evolutionary analyses of these regions. This is becoming increasingly relevant as evidence continues to support a central role for complex genomic regions in explaining human disease, developmental innovations, and ecological adaptations across phyla. With the advent of long-read sequencing technologies and suitable assemblers, the development of algorithms that can accommodate sample heterozygosity, and the adoption of a pangenomic-like view of these regions, accurate reconstructions of complex regions are now within reach. These reconstructions will finally allow for accurate functional and evolutionary studies of complex genomic regions, underlying the generation of genotype-phenotype maps of unprecedented resolution.
19
|
Abstract
Affordable, high-throughput DNA sequencing has accelerated the pace of genome assembly over the past decade. Genome assemblies from high-throughput, short-read sequencing, however, are often not as contiguous as the first generation of genome assemblies. Whereas early genome assembly projects were often aided by clone maps or other mapping data, many current assembly projects forego these scaffolding data and only assemble genomes into smaller segments. Recently, new technologies have been invented that allow chromosome-scale assembly at a lower cost and faster speed than traditional methods. Here, we give an overview of the problem of chromosome-scale assembly and traditional methods for tackling this problem. We then review new technologies for chromosome-scale assembly and recent genome projects that used these technologies to create highly contiguous genome assemblies at low cost.
Affiliation(s)
- Edward S. Rice
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Richard E. Green
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
  - Dovetail Genomics, LLC, Santa Cruz, California 95060, USA
20
|
Wee Y, Bhyan SB, Liu Y, Lu J, Li X, Zhao M. The bioinformatics tools for the genome assembly and analysis based on third-generation sequencing. Brief Funct Genomics 2018; 18:1-12. [DOI: 10.1093/bfgp/ely037] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 08/17/2018] [Revised: 10/03/2018] [Accepted: 10/19/2018] [Indexed: 02/06/2023] Open
Affiliation(s)
- YongKiat Wee
  - School of Science and Engineering, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Queensland, Australia
- Salma Begum Bhyan
  - School of Science and Engineering, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Queensland, Australia
- Yining Liu
  - The School of Public Health, Institute for Chemical Carcinogenesis, Guangzhou Medical University, Dongfengxi Road, Guangzhou, China
- Jiachun Lu
  - The School of Public Health, Institute for Chemical Carcinogenesis, Guangzhou Medical University, Dongfengxi Road, Guangzhou, China
  - The School of Public Health, The First Affiliated Hospital, Guangzhou Medical University, Guangzhou, China
- Xiaoyan Li
  - Beijing Anzhen Hospital, Capital Medical University, Beijing Institute of Heart, Lung & Blood Vessel Disease, Beijing, China
- Min Zhao
  - School of Science and Engineering, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Queensland, Australia
21
|
Hehir-Kwa JY, Tops BBJ, Kemmeren P. The clinical implementation of copy number detection in the age of next-generation sequencing. Expert Rev Mol Diagn 2018; 18:907-915. [PMID: 30221560 DOI: 10.1080/14737159.2018.1523723] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/18/2022]
Abstract
INTRODUCTION: The role of copy number variants (CNVs) in disease is now well established. In parallel, NGS technologies, such as long-read technologies, are under continual development and data analysis methods continue to be refined. Clinical exome sequencing data is now a reality for many diagnostic laboratories in both congenital genetics and oncology. This provides the ability to detect and report both SNVs and structural variants, including CNVs, using a single assay for a wide range of patient cohorts. Areas covered: Currently, whole-genome sequencing is mainly restricted to research applications and clinical utility studies. Furthermore, detecting the full size spectrum of CNVs as well as somatic events remains difficult for both exome and whole-genome sequencing. As a result, the full extent of genomic variants in an individual's genome is still largely unknown. Recently, new sequencing technologies have been introduced which maintain the long-range genomic context, aiding the detection of CNVs and structural variants. Expert commentary: The development of long-read sequencing promises to resolve many CNV and SV detection issues but is yet to become established. The current challenge for clinical CNV detection is how to fully exploit all the data generated by high-throughput sequencing technologies.
Affiliation(s)
- Jayne Y Hehir-Kwa
  - Princess Máxima Center for Pediatric Oncology, Utrecht, Netherlands
- Bastiaan B J Tops
  - Princess Máxima Center for Pediatric Oncology, Utrecht, Netherlands
- Patrick Kemmeren
  - Princess Máxima Center for Pediatric Oncology, Utrecht, Netherlands
22
|
Liao X, Zhao Y, Kong X, Khan A, Zhou B, Liu D, Kashif MH, Chen P, Wang H, Zhou R. Complete sequence of kenaf (Hibiscus cannabinus) mitochondrial genome and comparative analysis with the mitochondrial genomes of other plants. Sci Rep 2018; 8:12714. [PMID: 30143661 PMCID: PMC6109132 DOI: 10.1038/s41598-018-30297-w] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 11/02/2017] [Accepted: 07/27/2018] [Indexed: 01/01/2023] Open
Abstract
Plant mitochondrial (mt) genomes are species specific due to the vast amount of foreign DNA migration and frequent recombination of repeated sequences. Sequencing of the mt genome of kenaf (Hibiscus cannabinus) is essential for elucidating its evolutionary characteristics. In the present study, single-molecule real-time (SMRT) sequencing technology was used to sequence the complete mt genome of kenaf. Results showed that the complete kenaf mt genome was 569,915 bp long and consisted of 62 genes, including 36 protein-coding, 3 rRNA and 23 tRNA genes. Twenty-five introns were found among nine of the 36 protein-coding genes, and five introns were trans-spliced. A comparative analysis with other plant mt genomes showed that four syntenic gene clusters were conserved in all plant mtDNAs. Fifteen chloroplast-derived fragments were strongly associated with mt genes, including the intact sequences of the chloroplast genes psaA, ndhB and rps7. According to the plant mt genome evolution analysis, some ribosomal protein genes and succinate dehydrogenase genes were frequently lost during the evolution of angiosperms. Our data suggest that the kenaf mt genome retained evolutionarily conserved characteristics. Overall, the complete sequencing of the kenaf mt genome provides additional information and enhances our understanding of mt genomic evolution across angiosperms.
Affiliation(s)
- Xiaofang Liao
  - College of Life Sciences and Technology, Guangxi University, Nanning, 530005, China
  - Key Laboratory of Plant Genetic and Breeding, College of Agriculture, Guangxi University, Nanning, 530005, China
  - Cash Crop Institute of Guangxi Academy of Agricultural Sciences, Nanning, 530007, China
- Yanhong Zhao
  - Cash Crop Institute of Guangxi Academy of Agricultural Sciences, Nanning, 530007, China
- Xiangjun Kong
  - Key Laboratory of Plant Genetic and Breeding, College of Agriculture, Guangxi University, Nanning, 530005, China
- Aziz Khan
  - Key Laboratory of Plant Genetic and Breeding, College of Agriculture, Guangxi University, Nanning, 530005, China
- Bujin Zhou
  - Key Laboratory of Plant Genetic and Breeding, College of Agriculture, Guangxi University, Nanning, 530005, China
- Dongmei Liu
  - Key Laboratory of Plant-Microbe Interactions, Department of Life Science and Food, Shangqiu Normal University, Shangqiu, 476000, China
- Muhammad Haneef Kashif
  - Key Laboratory of Plant Genetic and Breeding, College of Agriculture, Guangxi University, Nanning, 530005, China
- Peng Chen
  - Key Laboratory of Plant Genetic and Breeding, College of Agriculture, Guangxi University, Nanning, 530005, China
- Hong Wang
  - Department of Biochemistry, University of Saskatchewan, Saskatoon, SK, S7N5E5, Canada
- Ruiyang Zhou
  - Key Laboratory of Plant Genetic and Breeding, College of Agriculture, Guangxi University, Nanning, 530005, China
23
|
Wala JA, Bandopadhayay P, Greenwald NF, O'Rourke R, Sharpe T, Stewart C, Schumacher S, Li Y, Weischenfeldt J, Yao X, Nusbaum C, Campbell P, Getz G, Meyerson M, Zhang CZ, Imielinski M, Beroukhim R. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res 2018. [PMID: 29535149 PMCID: PMC5880247 DOI: 10.1101/gr.221028.117] [Citation(s) in RCA: 274] [Impact Index Per Article: 39.1] [Indexed: 12/30/2022]
Abstract
Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect through standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs, but is difficult to apply at scale genome-wide for SV detection due to its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. We evaluated SvABA's performance on the NA12878 human genome and in simulated and real cancer genomes. SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs and substantially improves detection performance for variants in the 20–300 bp range, compared with existing methods. SvABA also identifies complex somatic rearrangements with chains of short (<1000 bp) templated-sequence insertions copied from distant genomic regions. We applied SvABA to 344 cancer genomes from 11 cancer types and found that short templated-sequence insertions occur in ∼4% of all somatic rearrangements. Finally, we demonstrate that SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized (50–300 bp) SVs.
Affiliation(s)
- Jeremiah A Wala
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
  - Bioinformatics and Integrative Genomics, Harvard University, Cambridge, Massachusetts 02138, USA
  - Harvard Medical School, Boston, Massachusetts 02115, USA
- Pratiti Bandopadhayay
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
- Noah F Greenwald
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
- Ryan O'Rourke
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
- Ted Sharpe
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
- Chip Stewart
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
- Steve Schumacher
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
- Yilong Li
  - Seven Bridges Genomics, Cambridge, Massachusetts 02142, USA
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom
- Joachim Weischenfeldt
  - The Finsen Laboratory, Rigshospitalet, University of Copenhagen, DK-2200 Copenhagen, Denmark
- Xiaotong Yao
  - Tri-Institutional PhD Program in Computational Biology and Medicine, New York, New York 10065, USA
  - New York Genome Center, New York, New York 10013, USA
- Chad Nusbaum
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
- Peter Campbell
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom
  - Department of Haematology, University of Cambridge, Cambridge CB2 2XY, United Kingdom
- Gad Getz
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
  - Bioinformatics and Integrative Genomics, Harvard University, Cambridge, Massachusetts 02138, USA
  - Harvard Medical School, Boston, Massachusetts 02115, USA
  - Department of Pathology and Cancer Center, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Matthew Meyerson
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
  - Bioinformatics and Integrative Genomics, Harvard University, Cambridge, Massachusetts 02138, USA
  - Harvard Medical School, Boston, Massachusetts 02115, USA
- Cheng-Zhong Zhang
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA
- Marcin Imielinski
  - New York Genome Center, New York, New York 10013, USA
  - Department of Pathology and Laboratory Medicine, Englander Institute for Precision Medicine, Institute for Computational Biomedicine, and Meyer Cancer Center, Weill Cornell Medicine, New York, New York 10065, USA
- Rameen Beroukhim
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
  - Bioinformatics and Integrative Genomics, Harvard University, Cambridge, Massachusetts 02138, USA
  - Harvard Medical School, Boston, Massachusetts 02115, USA
24
|
Elyanow R, Wu HT, Raphael BJ. Identifying structural variants using linked-read sequencing data. Bioinformatics 2017; 34:353-360. [PMID: 29112732 DOI: 10.1093/bioinformatics/btx712] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Received: 08/11/2017] [Revised: 10/24/2017] [Accepted: 11/02/2017] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION: Structural variation, including large deletions, duplications, inversions, translocations and other rearrangements, is common in human and cancer genomes. A number of methods have been developed to identify structural variants from Illumina short-read sequencing data. However, reliable identification of structural variants remains challenging because many variants have breakpoints in repetitive regions of the genome and thus are difficult to identify with short reads. The recently developed linked-read sequencing technology from 10X Genomics combines a novel barcoding strategy with Illumina sequencing. This technology labels all reads that originate from a small number (∼5 to 10) of DNA molecules ∼50 Kbp in length with the same molecular barcode. These barcoded reads contain long-range sequence information that is advantageous for identification of structural variants. RESULTS: We present Novel Adjacency Identification with Barcoded Reads (NAIBR), an algorithm to identify structural variants in linked-read sequencing data. NAIBR predicts novel adjacencies in an individual genome resulting from structural variants using a probabilistic model that combines multiple signals in barcoded reads. We show that NAIBR outperforms several existing methods for structural variant identification, including two recent methods that also analyze linked reads, on simulated sequencing data and 10X whole-genome sequencing data from the NA12878 human genome and the HCC1954 breast cancer cell line. Several of the novel somatic structural variants identified in HCC1954 overlap known cancer genes. AVAILABILITY AND IMPLEMENTATION: Software is available at compbio.cs.brown.edu/software. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rebecca Elyanow
  - Center for Computational Molecular Biology, Brown University, Providence, RI, USA
- Hsin-Ta Wu
  - Center for Computational Molecular Biology, Brown University, Providence, RI, USA
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, Princeton, NJ, USA
25
|
Sedlazeck FJ, Dhroso A, Bodian DL, Paschall J, Hermes F, Zook JM. Tools for annotation and comparison of structural variation. F1000Res 2017; 6:1795. [PMID: 29123647 PMCID: PMC5668921 DOI: 10.12688/f1000research.12516.1] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Accepted: 11/02/2017] [Indexed: 11/20/2022] Open
Abstract
The impact of structural variants (SVs) on a variety of organisms and diseases like cancer has become increasingly evident. Methods for SV detection when studying genomic differences across cells, individuals or populations are being actively developed. Currently, just a few methods are available to compare different SV callsets, and no specialized methods are available to annotate SVs that account for the unique characteristics of these variant types. Here, we introduce SURVIVOR_ant, a tool that compares types and breakpoints for candidate SVs from different callsets and enables fast comparison of SVs to genomic features such as genes and repetitive regions, as well as to previously established SV datasets such as those from the 1000 Genomes Project. As proof of concept we compared 16 SV callsets generated by different SV calling methods on a single genome, the Genome in a Bottle sample HG002 (Ashkenazi son), and annotated the SVs with gene annotations, 1000 Genomes Project SV calls, and four different types of repetitive regions. Computation time to annotate 134,528 SVs with 33,954 annotations was 22 seconds on a laptop.
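Comparing callsets by SV type and breakpoint position, as the abstract describes, can be sketched as follows. This is an illustrative simplification and not SURVIVOR_ant's implementation: the `SV` tuple, the `max_dist` breakpoint tolerance and the example coordinates are all assumptions made for the sketch.

```python
from typing import List, NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal SV record: chromosome, breakpoints and type."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV"


def same_event(a: SV, b: SV, max_dist: int = 1000) -> bool:
    """Treat two calls as the same event if chromosome and type match
    and both breakpoints lie within max_dist bases of each other."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= max_dist
        and abs(a.end - b.end) <= max_dist
    )


def shared_calls(set_a: List[SV], set_b: List[SV], max_dist: int = 1000) -> List[SV]:
    """Return the calls in set_a that have at least one match in set_b."""
    return [a for a in set_a if any(same_event(a, b, max_dist) for b in set_b)]


# Made-up callsets, for illustration only.
callset_a = [SV("chr1", 10_000, 12_000, "DEL"), SV("chr2", 5_000, 9_000, "DUP")]
callset_b = [SV("chr1", 10_150, 11_900, "DEL"), SV("chr2", 5_000, 9_000, "INV")]
print(shared_calls(callset_a, callset_b))
```

The tolerance matters because different callers rarely report identical breakpoints for the same event; the pairwise scan here is quadratic and is only meant to show the matching rule, not to scale to full callsets.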
Affiliation(s)
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Andi Dhroso
  - Worcester Polytechnic Institute, Worcester, MA, USA
- Dale L Bodian
  - Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA
- Justin M Zook
  - Genome-scale Measurements Group, National Institute of Standards and Technology, Gaithersburg, MD, USA
26
|
Goodwin S, McPherson JD, McCombie WR. Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet 2016; 17:333-51. [PMID: 27184599 PMCID: PMC10373632 DOI: 10.1038/nrg.2016.49] [Citation(s) in RCA: 2404] [Impact Index Per Article: 300.5] [Indexed: 12/11/2022]
Abstract
Since the completion of the human genome project in 2003, extraordinary progress has been made in genome sequencing technologies, which has led to a decreased cost per megabase and an increase in the number and diversity of sequenced genomes. An astonishing complexity of genome architecture has been revealed, bringing these sequencing technologies to even greater advancements. Some approaches maximize the number of bases sequenced in the least amount of time, generating a wealth of data that can be used to understand increasingly complex phenotypes. Alternatively, other approaches now aim to sequence longer contiguous pieces of DNA, which are essential for resolving structurally complex regions. These and other strategies are providing researchers and clinicians a variety of tools to probe genomes in greater depth, leading to an enhanced understanding of how genome sequence variants underlie phenotype and disease.
Affiliation(s)
- Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- John D McPherson
- Department of Biochemistry and Molecular Medicine; and the Comprehensive Cancer Center, University of California, Davis, California 95817, USA
|
27
|
Yang Y, Ye Q, Li K, Li Z, Bo X, Li Z, Xu Y, Wang S, Wang P, Chen H, Wang J. Genomics and Comparative Genomic Analyses Provide Insight into the Taxonomy and Pathogenic Potential of Novel Emmonsia Pathogens. Front Cell Infect Microbiol 2017; 7:105. [PMID: 28409126 PMCID: PMC5374152 DOI: 10.3389/fcimb.2017.00105] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/07/2017] [Accepted: 03/16/2017] [Indexed: 12/14/2022] Open
Abstract
Over the last 50 years, newly described species of Emmonsia-like fungi have been implicated globally as sources of systemic human mycosis (emmonsiosis). Their ability to convert into yeast-like cells capable of replication and extra-pulmonary dissemination during the course of infection differentiates them from classical Emmonsia species. Immunocompromised patients are at highest risk of emmonsiosis and exhibit high mortality rates. In order to investigate the molecular basis for pathogenicity of the newly described Emmonsia species, genomic sequencing and comparative genomic analyses of Emmonsia sp. 5z489, which was isolated from a non-deliberately immunosuppressed diabetic patient in China and represents a novel seventh isolate of Emmonsia-like fungi, were performed. The genome size of 5z489 was 35.5 Mbp in length, which is ~5 Mbp larger than other Emmonsia strains. Further, 9,188 protein genes were predicted in the 5z489 genome and 16% of the assembly was identified as repetitive elements, which is the largest abundance in Emmonsia species. Phylogenetic analyses based on whole genome data classified 5z489 and CAC-2015a, another novel isolate, as members of the genus Emmonsia. Our analyses showed that divergences among Emmonsia occurred much earlier than other genera within the family Ajellomycetaceae, suggesting relatively distant evolutionary relationships among the genus. Through comparisons of Emmonsia species, we discovered significant pathogenicity characteristics within the genus as well as putative virulence factors that may play a role in the infection and pathogenicity of the novel Emmonsia strains. Moreover, our analyses revealed a novel distribution mode of DNA methylation patterns across the genome of 5z489, with >50% of methylated bases located in intergenic regions. These methylation patterns differ considerably from other reported fungi, where most methylation occurs in repetitive loci. It is unclear if this difference is related to physiological adaptations of new Emmonsia, but this question warrants further investigation. Overall, our analyses provide a framework from which to further study the evolutionary dynamics of Emmonsia strains and identify the underlying molecular mechanisms that determine the infectious and pathogenic potency of these fungal pathogens, and also provide insight into potential targets for therapeutic intervention of emmonsiosis and further research.
Affiliation(s)
- Ying Yang
- Academy of Military Medical Sciences, Beijing, China; Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing, China; Department of Biological Product Control, National Institutes for Food and Drug Control, Beijing, China
- Qiang Ye
- Department of Biological Product Control, National Institutes for Food and Drug Control, Beijing, China; Key Laboratory of the Ministry of Health for Research on Quality and Standardization of Biotech Products, Beijing, China
- Kang Li
- Department of Biological Product Control, National Institutes for Food and Drug Control, Beijing, China; Key Laboratory of the Ministry of Health for Research on Quality and Standardization of Biotech Products, Beijing, China
- Zongwei Li
- Center for Hospital Infection Control, Chinese PLA Institute for Disease Control and Prevention, Beijing, China
- Xiaochen Bo
- Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing, China
- Zhen Li
- Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing, China
- Yingchun Xu
- Division of Medical Microbiology, Peking Union Medical College Hospital, Beijing, China
- Shengqi Wang
- Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing, China
- Peng Wang
- Division of Medical Microbiology, Peking Union Medical College Hospital, Beijing, China
- Huipeng Chen
- Academy of Military Medical Sciences, Beijing, China
- Junzhi Wang
- Department of Biological Product Control, National Institutes for Food and Drug Control, Beijing, China
|
28
|
Fan X, Chaisson M, Nakhleh L, Chen K. HySA: a Hybrid Structural variant Assembly approach using next-generation and single-molecule sequencing technologies. Genome Res 2017; 27:793-800. [PMID: 28104618 PMCID: PMC5411774 DOI: 10.1101/gr.214767.116] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 08/15/2016] [Accepted: 12/19/2016] [Indexed: 12/29/2022]
Abstract
Achieving complete, accurate, and cost-effective assembly of human genomes is of great importance for realizing the promise of precision medicine. The abundance of repeats and genetic variations in human genomes and the limitations of existing sequencing technologies call for the development of novel assembly methods that can leverage the complementary strengths of multiple technologies. We propose a Hybrid Structural variant Assembly (HySA) approach that integrates sequencing reads from next-generation sequencing and single-molecule sequencing technologies to accurately assemble and detect structural variants (SVs) in human genomes. By identifying homologous SV-containing reads from different technologies through a bipartite-graph-based clustering algorithm, our approach turns a whole genome assembly problem into a set of independent SV assembly problems, each of which can be effectively solved to enhance the assembly of structurally altered regions in human genomes. We used data generated from a haploid hydatidiform mole genome (CHM1) and a diploid human genome (NA12878) to test our approach. The result showed that, compared with existing methods, our approach had a low false discovery rate and substantially improved the detection of many types of SVs, particularly novel large insertions, small indels (10–50 bp), and short tandem repeat expansions and contractions. Our work highlights the strengths and limitations of current approaches and provides an effective solution for extending the power of existing sequencing technologies for SV discovery.
Affiliation(s)
- Xian Fan
- Department of Computer Science, Rice University, Houston, Texas 77005, USA; Department of Bioinformatics and Computational Biology, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Mark Chaisson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Ken Chen
- Department of Bioinformatics and Computational Biology, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
|
29
|
Vu T, Davidson SL, Borgesi J, Maksudul M, Jeon TJ, Shim J. Piecing together the puzzle: nanopore technology in detection and quantification of cancer biomarkers. RSC Adv 2017. [DOI: 10.1039/c7ra08063h] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/13/2022] Open
Abstract
This mini-review paper is a comprehensive outline of nanopore technology applications in the detection and study of various cancer causal factors.
Affiliation(s)
- Trang Vu
- Department of Biomedical Engineering, Henry M. Rowan College of Engineering, Rowan University, Glassboro, USA
- Shanna-Leigh Davidson
- Department of Biomedical Engineering, Henry M. Rowan College of Engineering, Rowan University, Glassboro, USA
- Julia Borgesi
- Department of Biomedical Engineering, Henry M. Rowan College of Engineering, Rowan University, Glassboro, USA
- Mowla Maksudul
- Department of Biomedical Engineering, Henry M. Rowan College of Engineering, Rowan University, Glassboro, USA
- Tae-Joon Jeon
- Department of Biological Engineering, Inha University, Incheon 22212, Republic of Korea
- Jiwook Shim
- Department of Biomedical Engineering, Henry M. Rowan College of Engineering, Rowan University, Glassboro, USA
|
30
|
Acuna-Hidalgo R, Veltman JA, Hoischen A. New insights into the generation and role of de novo mutations in health and disease. Genome Biol 2016; 17:241. [PMID: 27894357 PMCID: PMC5125044 DOI: 10.1186/s13059-016-1110-1] [Citation(s) in RCA: 297] [Impact Index Per Article: 33.0] [Indexed: 12/12/2022] Open
Abstract
Aside from inheriting half of the genome of each of our parents, we are born with a small number of novel mutations that occurred during gametogenesis and postzygotically. Recent genome and exome sequencing studies of parent-offspring trios have provided the first insights into the number and distribution of these de novo mutations in health and disease, pointing to risk factors that increase their number in the offspring. De novo mutations have been shown to be a major cause of severe early-onset genetic disorders such as intellectual disability, autism spectrum disorder, and other developmental diseases. In fact, the occurrence of novel mutations in each generation explains why these reproductively lethal disorders continue to occur in our population. Recent studies have also shown that de novo mutations are predominantly of paternal origin and that their number increases with advanced paternal age. Here, we review the recent literature on de novo mutations, covering their detection, biological characterization, and medical impact.
Affiliation(s)
- Rocio Acuna-Hidalgo
- Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA, Nijmegen, The Netherlands
- Joris A Veltman
- Department of Human Genetics, Donders Institute of Neuroscience, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA, Nijmegen, The Netherlands
- Department of Clinical Genetics, GROW - School for Oncology and Developmental Biology, Maastricht University Medical Centre, Universiteitssingel 50, 6229 ER, Maastricht, The Netherlands
- Alexander Hoischen
- Department of Human Genetics, Donders Institute of Neuroscience, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA, Nijmegen, The Netherlands
|
31
|
Lavezzo E, Barzon L, Toppo S, Palù G. Third generation sequencing technologies applied to diagnostic microbiology: benefits and challenges in applications and data analysis. Expert Rev Mol Diagn 2016; 16:1011-23. [PMID: 27453996 DOI: 10.1080/14737159.2016.1217158] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 01/07/2023]
Abstract
INTRODUCTION The diagnosis of infectious diseases is among the most successful areas of application of new generation sequencing technologies. The field has seen the development of numerous experimental and analytical approaches for the detection and the fine description of pathogenic and non-pathogenic microorganisms. AREAS COVERED Without claiming to be exhaustive with respect to all applications and methods developed over the years, this review focuses on the advantages and the issues brought by the new technologies, with an eye in particular to third generation sequencing methods. Both experimental procedures and algorithmic strategies are presented, following the most relevant publications which have led to progress in our ability to detect infectious agents. EXPERT COMMENTARY The technical advance brought by third generation sequencing platforms has the potential to significantly expand the range of diagnostic tools that will be available to clinicians. Nonetheless, the implementation of these technologies in clinical practice is still far from being actionable and will temporally follow the path undertaken by second generation methods, which still require the setup of standardized pipelines in both wet and dry laboratory procedures.
Affiliation(s)
- Enrico Lavezzo
- Department of Molecular Medicine, University of Padova, Padova, Italy
- Luisa Barzon
- Department of Molecular Medicine, University of Padova, Padova, Italy
- Stefano Toppo
- Department of Molecular Medicine, University of Padova, Padova, Italy
- Giorgio Palù
- Department of Molecular Medicine, University of Padova, Padova, Italy
|
32
|
Sanders AD, Hills M, Porubský D, Guryev V, Falconer E, Lansdorp PM. Characterizing polymorphic inversions in human genomes by single-cell sequencing. Genome Res 2016; 26:1575-1587. [PMID: 27472961 PMCID: PMC5088599 DOI: 10.1101/gr.201160.115] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 10/22/2015] [Accepted: 06/13/2016] [Indexed: 12/23/2022]
Abstract
Identifying genomic features that differ between individuals and cells can help uncover the functional variants that drive phenotypes and disease susceptibilities. For this, single-cell studies are paramount, as it becomes increasingly clear that the contribution of rare but functional cellular subpopulations is important for disease prognosis, management, and progression. Until now, studying these associations has been challenged by our inability to map structural rearrangements accurately and comprehensively. To overcome this, we coupled single-cell sequencing of DNA template strands (Strand-seq) with custom analysis software to rapidly discover, map, and genotype genomic rearrangements at high resolution. This allowed us to explore the distribution and frequency of inversions in a heterogeneous cell population, identify several polymorphic domains in complex regions of the genome, and locate rare alleles in the reference assembly. We then mapped the entire genomic complement of inversions within two unrelated individuals to characterize their distinct inversion profiles and built a nonredundant global reference of structural rearrangements in the human genome. The work described here provides a powerful new framework to study structural variation and genomic heterogeneity in single-cell samples, whether from individuals for population studies or tissue types for biomarker discovery.
Affiliation(s)
- Ashley D Sanders
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 1L3, Canada
- Mark Hills
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 1L3, Canada
- David Porubský
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Centre Groningen, NL-9713 AV Groningen, The Netherlands
- Victor Guryev
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Centre Groningen, NL-9713 AV Groningen, The Netherlands
- Ester Falconer
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 1L3, Canada
- Peter M Lansdorp
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 1L3, Canada; European Research Institute for the Biology of Ageing, University of Groningen, University Medical Centre Groningen, NL-9713 AV Groningen, The Netherlands; Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
|
33
|
Dapprich J, Ferriola D, Mackiewicz K, Clark PM, Rappaport E, D’Arcy M, Sasson A, Gai X, Schug J, Kaestner KH, Monos D. The next generation of target capture technologies - large DNA fragment enrichment and sequencing determines regional genomic variation of high complexity. BMC Genomics 2016; 17:486. [PMID: 27393338 PMCID: PMC4938946 DOI: 10.1186/s12864-016-2836-6] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 08/04/2015] [Accepted: 06/15/2016] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND The ability to capture and sequence large contiguous DNA fragments represents a significant advancement towards the comprehensive characterization of complex genomic regions. While emerging sequencing platforms are capable of producing several kilobases-long reads, the fragment sizes generated by current DNA target enrichment technologies remain a limiting factor, producing DNA fragments generally shorter than 1 kbp. The DNA enrichment methodology described herein, Region-Specific Extraction (RSE), produces DNA segments in excess of 20 kbp in length. Coupling this enrichment method to appropriate sequencing platforms will significantly enhance the ability to generate complete and accurate sequence characterization of any genomic region without the need for reference-based assembly. RESULTS RSE is a long-range DNA target capture methodology that relies on the specific hybridization of short (20-25 base) oligonucleotide primers to selected sequence motifs within the DNA target region. These capture primers are then enzymatically extended on the 3'-end, incorporating biotinylated nucleotides into the DNA. Streptavidin-coated beads are subsequently used to pull down the original, long DNA template molecules via the newly synthesized, biotinylated DNA that is bound to them. We demonstrate the accuracy, simplicity and utility of the RSE method by capturing and sequencing a 4 Mbp stretch of the major histocompatibility complex (MHC). Our results show an average depth of coverage of 164X for the entire MHC. This depth of coverage contributes significantly to a 99.94% total coverage of the targeted region and to an accuracy that is over 99.99%. CONCLUSIONS RSE represents a cost-effective target enrichment method capable of producing sequencing templates in excess of 20 kbp in length. The utility of our method has been proven to generate superior coverage across the MHC as compared to other commercially available methodologies, with the added advantage of producing longer sequencing templates amenable to DNA sequencing on recently developed platforms. Although our demonstration of the method does not utilize these DNA sequencing platforms directly, our results indicate that the capture of long DNA fragments produces superior coverage of the targeted region.
Affiliation(s)
- Deborah Ferriola
- Generation Biotech, Lawrenceville, NJ 08648 USA
- Department of Pathology and Laboratory Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Kate Mackiewicz
- Department of Pathology and Laboratory Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Peter M. Clark
- Department of Pathology and Laboratory Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Eric Rappaport
- Nucleic Acids & Protein Core Facility, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Monica D’Arcy
- The Center for Biomedical Informatics, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Ariella Sasson
- The Center for Biomedical Informatics, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Xiaowu Gai
- The Center for Biomedical Informatics, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Jonathan Schug
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104 USA
- Klaus H. Kaestner
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104 USA
- Dimitri Monos
- Department of Pathology and Laboratory Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- The Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA
|
34
|
Norris AL, Workman RE, Fan Y, Eshleman JR, Timp W. Nanopore sequencing detects structural variants in cancer. Cancer Biol Ther 2016; 17:246-53. [PMID: 26787508 PMCID: PMC4848001 DOI: 10.1080/15384047.2016.1139236] [Citation(s) in RCA: 107] [Impact Index Per Article: 11.9] [Received: 08/12/2015] [Revised: 12/08/2015] [Accepted: 01/01/2016] [Indexed: 11/21/2022] Open
Abstract
Despite advances in sequencing, structural variants (SVs) remain difficult to reliably detect due to the short read length (<300 bp) of 2nd generation sequencing. Not only do the reads (or paired-end reads) need to straddle a breakpoint, but repetitive elements often lead to ambiguities in the alignment of short reads. We propose to use the long-reads (up to 20 kb) possible with 3rd generation sequencing, specifically nanopore sequencing on the MinION. Nanopore sequencing relies on a similar concept to a Coulter counter, reading the DNA sequence from the change in electrical current resulting from a DNA strand being forced through a nanometer-sized pore embedded in a membrane. Though nanopore sequencing currently has a relatively high mismatch rate that precludes base substitution and small frameshift mutation detection, its accuracy is sufficient for SV detection because of its long reads. In fact, long reads in some cases may improve SV detection efficiency. We have tested nanopore sequencing to detect a series of well-characterized SVs, including large deletions, inversions, and translocations that inactivate the CDKN2A/p16 and SMAD4/DPC4 tumor suppressor genes in pancreatic cancer. Using PCR amplicon mixes, we have demonstrated that nanopore sequencing can detect large deletions, translocations and inversions at dilutions as low as 1:100, with as few as 500 reads per sample. Given the speed, small footprint, and low capital cost, nanopore sequencing could become the ideal tool for the low-level detection of cancer-associated SVs needed for molecular relapse, early detection, or therapeutic monitoring.
Affiliation(s)
- Alexis L. Norris
- Departments of Pathology and Oncology, The Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Rachael E. Workman
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Yunfan Fan
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- James R. Eshleman
- Departments of Pathology and Oncology, The Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
|
35
|
Zhao F, Bajic VB. The Value and Significance of Metagenomics of Marine Environments. Preface. GENOMICS PROTEOMICS & BIOINFORMATICS 2015; 13:271-4. [PMID: 26607677 PMCID: PMC4678774 DOI: 10.1016/j.gpb.2015.10.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 10/25/2015] [Accepted: 10/31/2015] [Indexed: 02/02/2023]
Affiliation(s)
- Fangqing Zhao
- Computational Genomics Lab, Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China.
- Vladimir B Bajic
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia.
|
36
|
Rhoads A, Au KF. PacBio Sequencing and Its Applications. GENOMICS PROTEOMICS & BIOINFORMATICS 2015; 13:278-89. [PMID: 26542840 PMCID: PMC4678779 DOI: 10.1016/j.gpb.2015.08.002] [Citation(s) in RCA: 1323] [Impact Index Per Article: 132.3] [Received: 05/28/2015] [Revised: 08/06/2015] [Accepted: 08/11/2015] [Indexed: 12/15/2022]
Abstract
Single-molecule, real-time sequencing developed by Pacific BioSciences offers longer read lengths than the second-generation sequencing (SGS) technologies, making it well-suited for unsolved problems in genome, transcriptome, and epigenetics research. The highly-contiguous de novo assemblies using PacBio sequencing can close gaps in current reference assemblies and characterize structural variation (SV) in personal genomes. With longer reads, we can sequence through extended repetitive regions and detect mutations, many of which are associated with diseases. Moreover, PacBio transcriptome sequencing is advantageous for the identification of gene isoforms and facilitates reliable discoveries of novel genes and novel isoforms of annotated genes, due to its ability to sequence full-length transcripts or fragments with significant lengths. Additionally, PacBio’s sequencing technique provides information that is useful for the direct detection of base modifications, such as methylation. In addition to using PacBio sequencing alone, many hybrid sequencing strategies have been developed to make use of more accurate short reads in conjunction with PacBio long reads. In general, hybrid sequencing strategies are more affordable and scalable especially for small-size laboratories than using PacBio Sequencing alone. The advent of PacBio sequencing has made available much information that could not be obtained via SGS alone.
Affiliation(s)
- Anthony Rhoads
- Department of Biostatistics, University of Iowa, Iowa City, IA 52242, USA
- Kin Fai Au
- Department of Biostatistics, University of Iowa, Iowa City, IA 52242, USA; Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, USA.
|
37
|
Next-Generation Sequencing Approaches in Cancer: Where Have They Brought Us and Where Will They Take Us? Cancers (Basel) 2015; 7:1925-58. [PMID: 26404381 PMCID: PMC4586802 DOI: 10.3390/cancers7030869] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 07/08/2015] [Accepted: 09/15/2015] [Indexed: 12/20/2022] Open
Abstract
Next-generation sequencing (NGS) technologies and data have revolutionized cancer research and are increasingly being deployed to guide clinicians in treatment decision-making. NGS technologies have allowed us to take an “omics” approach to cancer in order to reveal genomic, transcriptomic, and epigenomic landscapes of individual malignancies. Integrative multi-platform analyses are increasingly used in large-scale projects that aim to fully characterize individual tumours as well as general cancer types and subtypes. In this review, we examine how NGS technologies in particular have contributed to “omics” approaches in cancer research, allowing for large-scale integrative analyses that consider hundreds of tumour samples. These types of studies have provided us with an unprecedented wealth of information, providing the background knowledge needed to make small-scale (including “N of 1”) studies informative and relevant. We also take a look at emerging opportunities provided by NGS and state-of-the-art third-generation sequencing technologies, particularly in the context of translational research. Cancer research and care are currently poised to experience significant progress catalyzed by accessible sequencing technologies that will benefit both clinical- and research-based efforts.
|
38
|
Wang Q, Lu Q, Zhao H. A review of study designs and statistical methods for genomic epidemiology studies using next generation sequencing. Front Genet 2015; 6:149. [PMID: 25941534 PMCID: PMC4403555 DOI: 10.3389/fgene.2015.00149] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 01/25/2015] [Accepted: 03/30/2015] [Indexed: 12/22/2022] Open
Abstract
Results from numerous linkage and association studies have greatly deepened scientists’ understanding of the genetic basis of many human diseases, yet some important questions remain unanswered. For example, although a large number of disease-associated loci have been identified from genome-wide association studies in the past 10 years, it is challenging to interpret these results as most disease-associated markers have no clear functional roles in disease etiology, and all the identified genomic factors only explain a small portion of disease heritability. With the help of next-generation sequencing (NGS), diverse types of genomic and epigenetic variations can be detected with high accuracy. More importantly, instead of using linkage disequilibrium to detect association signals based on a set of pre-set probes, NGS allows researchers to directly study all the variants in each individual, therefore promises opportunities for identifying functional variants and a more comprehensive dissection of disease heritability. Although the current scale of NGS studies is still limited due to the high cost, the success of several recent studies suggests the great potential for applying NGS in genomic epidemiology, especially as the cost of sequencing continues to drop. In this review, we discuss several pioneer applications of NGS, summarize scientific discoveries for rare and complex diseases, and compare various study designs including targeted sequencing and whole-genome sequencing using population-based and family-based cohorts. Finally, we highlight recent advancements in statistical methods proposed for sequencing analysis, including group-based association tests, meta-analysis techniques, and annotation tools for variant prioritization.
Affiliation(s)
- Qian Wang
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Qiongshi Lu
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA; Veterans Affairs Cooperative Studies Program Coordinating Center, West Haven, CT, USA
|
39
|
Parks MM, Lawrence CE, Raphael BJ. Detecting non-allelic homologous recombination from high-throughput sequencing data. Genome Biol 2015; 16:72. [PMID: 25886137 PMCID: PMC4425883 DOI: 10.1186/s13059-015-0633-1] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 01/23/2015] [Accepted: 03/16/2015] [Indexed: 12/27/2022] Open
Abstract
Non-allelic homologous recombination (NAHR) is a common mechanism for generating genome rearrangements and is implicated in numerous genetic disorders, but its detection in high-throughput sequencing data poses a serious challenge. We present a probabilistic model of NAHR and demonstrate its ability to find NAHR in low-coverage sequencing data from 44 individuals. We identify NAHR-mediated deletions or duplications in 109 of 324 potential NAHR loci in at least one of the individuals. These calls segregate by ancestry, are more common in closely spaced repeats, often result in duplicated genes or pseudogenes, and affect highly studied genes such as GBA and CYP2E1.
Affiliation(s)
- Matthew M Parks
- Division of Applied Mathematics, Brown University, Providence, USA
- Charles E Lawrence
- Division of Applied Mathematics, Brown University, Providence, USA; Center for Computational Molecular Biology, Brown University, Providence, USA
- Benjamin J Raphael
- Center for Computational Molecular Biology, Brown University, Providence, USA; Department of Computer Science, Brown University, Providence, USA
|