1
Celus CS, Ahmad SF, Gangwar M, Kumar S, Kumar A. Deciphering new insights into copy number variations as drivers of genomic diversity and adaptation in farm animal species. Gene 2025; 939:149159. [PMID: 39672215 DOI: 10.1016/j.gene.2024.149159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2024] [Revised: 11/15/2024] [Accepted: 12/09/2024] [Indexed: 12/15/2024]
Abstract
The basis of all improvement in (re)production performance of animals and plants lies in the genetic variation. The underlying genetic variation can be further explored through investigations using molecular markers including single nucleotide polymorphism (SNP) and microsatellite, and more recently structural variants like copy number variations (CNVs). Unlike SNPs, CNVs affect a larger proportion of the genome, making them more impactful vis-à-vis variation at the phenotype level. They significantly contribute to genetic variation and provide raw material for natural and artificial selection for improved performance. CNVs are characterized as unbalanced structural variations that arise from four major mechanisms viz., non-homologous end joining (NHEJ), non-allelic homologous recombination (NAHR), fork stalling and template switching (FoSTeS), and retrotransposition. Various detection methods have been developed to identify CNVs, including molecular techniques and massively parallel sequencing. Next-generation sequencing (NGS)/high-throughput sequencing offers higher resolution and sensitivity, but challenges remain in delineating CNVs in regions with repetitive sequences or high GC content. High-throughput sequencing technologies utilize different methods based on read-pair, split-read, read depth, and assembly approaches (or their combination) to detect CNVs. Read-pair based methods work by mapping discordant reads, while the read-depth approach works on detecting the correlation between read depth and copy number of genetic segments or a gene. Split-read methods involve mapping segments of reads to different locations on the genome, while assembly methods involve comparing contigs to a reference or de novo sequencing. Similar to other marker-trait association studies, CNV-association studies are not uncommon in humans and farm animals. 
Extensive studies will soon be needed to deduce the unique evolutionary trajectories and underlying molecular mechanisms for targeted genetic improvements in different farm animal species. The present review delineates the importance of CNVs in genetic studies, describes how they arise along with the programs and principles used to identify them efficiently, and finally sheds light on the existing literature on CNVs in farm animal species.
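The read-depth principle described in this abstract (read depth correlating with the copy number of a segment) can be illustrated with a minimal sketch. It assumes fixed-size windows, a diploid baseline, and depth that scales linearly with copy number; the function name and the example depths are hypothetical and do not correspond to any tool discussed in the review.

```python
from statistics import median


def estimate_copy_number(window_depths, ploidy=2):
    """Estimate an integer copy number per window from mean read depth.

    Assumption (not from the source): most windows sit at the baseline
    ploidy, so the median depth is a usable baseline, and depth scales
    linearly with copy number.
    """
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    # Scale each window against the baseline and round to the nearest integer.
    return [round(ploidy * depth / baseline) for depth in window_depths]


# Hypothetical per-window mean depths along one chromosome.
depths = [30, 31, 29, 15, 14, 30, 46, 44, 30, 29]
copy_numbers = estimate_copy_number(depths)
print(copy_numbers)
```

Real read-depth callers additionally correct for GC content and mappability, which is where the repetitive and GC-rich regions mentioned above cause trouble; this sketch omits both corrections.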
Affiliation(s)
- C S Celus
  - Division of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly, Uttar Pradesh 243122, India
- Sheikh Firdous Ahmad
  - Division of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly, Uttar Pradesh 243122, India
  - Livestock Production and Management Section, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly, Uttar Pradesh 243122, India
- Munish Gangwar
  - Division of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly, Uttar Pradesh 243122, India
- Subodh Kumar
  - Division of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly, Uttar Pradesh 243122, India
- Amit Kumar
  - Division of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly, Uttar Pradesh 243122, India
2
Louw N, Carstens N, Lombard Z, for DDD-Africa as members of the H3Africa Consortium. Incorporating CNV analysis improves the yield of exome sequencing for rare monogenic disorders-an important consideration for resource-constrained settings. Front Genet 2023; 14:1277784. [PMID: 38155715 PMCID: PMC10753787 DOI: 10.3389/fgene.2023.1277784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 11/22/2023] [Indexed: 12/30/2023]
Abstract
Exome sequencing (ES) is a recommended first-tier diagnostic test for many rare monogenic diseases. It allows for the detection of both single-nucleotide variants (SNVs) and copy number variants (CNVs) in coding exonic regions of the genome in a single test, and this dual analysis is a valuable approach, especially in limited resource settings. Single-nucleotide variants are well studied; however, the incorporation of copy number variant analysis tools into variant calling pipelines has not yet been implemented as a routine diagnostic test, and chromosomal microarray is still more widely used to detect copy number variants. Research shows that combined single and copy number variant analysis can lead to a diagnostic yield of up to 58%, increasing the yield by as much as 18% over the single-nucleotide variant only pipeline. Importantly, this is achieved with the consideration of computational costs only, without incurring any additional sequencing costs. This mini review provides an overview of copy number variant analysis from exome data and of the current recommendations for this type of analysis. We also present an overview of standard practices in rare monogenic disease research in resource-limited settings. We present evidence that integrating copy number variant detection tools into a standard exome sequencing analysis pipeline improves diagnostic yield and should be considered a significantly beneficial addition, with relatively low-cost implications. Routine implementation in underrepresented populations and limited resource settings will promote generation and sharing of CNV datasets and provide momentum to build core centers for this niche within genomic medicine.
Affiliation(s)
- Nadja Louw
  - Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Nadia Carstens
  - Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
  - Genomics Platform, South African Medical Research Council, Cape Town, South Africa
- Zané Lombard
  - Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
3
Kosugi S, Kamatani Y, Harada K, Tomizuka K, Momozawa Y, Morisaki T, The BioBank Japan Project, Terao C. Detection of trait-associated structural variations using short-read sequencing. Cell Genomics 2023; 3:100328. [PMID: 37388916 PMCID: PMC10300613 DOI: 10.1016/j.xgen.2023.100328] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/19/2022] [Revised: 02/17/2023] [Accepted: 04/25/2023] [Indexed: 07/01/2023]
Abstract
Genomic structural variation (SV) affects genetic and phenotypic characteristics in diverse organisms, but the lack of reliable methods to detect SV has hindered genetic analysis. We developed a computational algorithm (MOPline) that includes missing call recovery combined with high-confidence SV call selection and genotyping using short-read whole-genome sequencing (WGS) data. Using 3,672 high-coverage WGS datasets, MOPline stably detected ∼16,000 SVs per individual, which is over ∼1.7-3.3-fold higher than previous large-scale projects while exhibiting a comparable level of statistical quality metrics. We imputed SVs from 181,622 Japanese individuals for 42 diseases and 60 quantitative traits. A genome-wide association study with the imputed SVs revealed 41 top-ranked or nearly top-ranked genome-wide significant SVs, including 8 exonic SVs with 5 novel associations and enriched mobile element insertions. This study demonstrates that short-read WGS data can be used to identify rare and common SVs associated with a variety of traits.
Affiliation(s)
- Shunichi Kosugi
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Yoichiro Kamatani
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba 277-8562, Japan
- Katsutoshi Harada
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Kohei Tomizuka
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Yukihide Momozawa
  - Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa 230-0045, Japan
- Takayuki Morisaki
  - Division of Molecular Pathology, Institute of Medical Science, The University of Tokyo, 4-6-1, Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan
- Chikashi Terao
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
  - The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
4
Yuan Y, Bayer PE, Batley J, Edwards D. Current status of structural variation studies in plants. Plant Biotechnology Journal 2021; 19:2153-2163. [PMID: 34101329 PMCID: PMC8541774 DOI: 10.1111/pbi.13646] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 02/18/2021] [Revised: 05/31/2021] [Accepted: 06/03/2021] [Indexed: 05/23/2023]
Abstract
Structural variations (SVs) including gene presence/absence variations and copy number variations are a common feature of genomes in plants and, together with single nucleotide polymorphisms and epigenetic differences, are responsible for the heritable phenotypic diversity observed within and between species. Understanding the contribution of SVs to plant phenotypic variation is important for plant breeders to assist in producing improved varieties. The low resolution of early genetic technologies and inefficient methods have previously limited our understanding of SVs in plants. However, with the rapid expansion in genomic technologies, it is possible to assess SVs with an ever-greater resolution and accuracy. Here, we review the current status of SV studies in plants, examine the roles that SVs play in phenotypic traits, compare current technologies and assess future challenges for SV studies.
Affiliation(s)
- Yuxuan Yuan
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
  - School of Life Sciences and State Key Laboratory for Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong SAR, China
- Philipp E. Bayer
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
- Jacqueline Batley
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
- David Edwards
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
5
Next Generation Sequencing Technology in the Clinic and Its Challenges. Cancers (Basel) 2021; 13:1751. [PMID: 33916923 PMCID: PMC8067551 DOI: 10.3390/cancers13081751] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/09/2021] [Revised: 03/30/2021] [Accepted: 04/05/2021] [Indexed: 12/12/2022]
Abstract
Simple Summary: Precise identification and annotation of mutations are of utmost importance in clinical oncology. Insights of the DNA sequence can provide meaningful knowledge to unravel the underlying genetics of disease. Hence, tailoring of personalized medicine often relies on specific genomic alteration for treatment efficacy. The aim of this review is to highlight that sequencing harbors much more than just four nucleotides. Moreover, the gradual transition from first to second generation sequencing technologies has led to awareness for choosing the most appropriate bioinformatic analytic tools based on the aim, quality and demand for a specific purpose. Thus, the same raw data can lead to various results reflecting the intrinsic features of different datamining pipelines.
Abstract: Data analysis has become a crucial aspect in clinical oncology to interpret output from next-generation sequencing-based testing. NGS being able to resolve billions of sequencing reactions in a few days has consequently increased the demand for tools to handle and analyze such large data sets. Many tools have been developed since the advent of NGS, featuring their own peculiarities. Increased awareness when interpreting alterations in the genome is therefore of utmost importance, as the same data using different tools can provide diverse outcomes. Hence, it is crucial to evaluate and validate bioinformatic pipelines in clinical settings. Moreover, personalized medicine implies treatment targeting efficacy of biological drugs for specific genomic alterations. Here, we focused on different sequencing technologies, features underlying the genome complexity, and bioinformatic tools that can impact the final annotation. Additionally, we discuss the clinical demand and design for implementing NGS.
6
Viailly PJ, Sater V, Viennot M, Bohers E, Vergne N, Berard C, Dauchel H, Lecroq T, Celebi A, Ruminy P, Marchand V, Lanic MD, Dubois S, Penther D, Tilly H, Mareschal S, Jardin F. Improving high-resolution copy number variation analysis from next generation sequencing using unique molecular identifiers. BMC Bioinformatics 2021; 22:120. [PMID: 33711922 PMCID: PMC7971104 DOI: 10.1186/s12859-021-04060-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/10/2020] [Accepted: 03/02/2021] [Indexed: 02/06/2023]
Abstract
BACKGROUND: Recently, copy number variations (CNV) impacting genes involved in oncogenic pathways have attracted increasing attention for managing disease susceptibility. CNV is one of the most important somatic aberrations in the genome of tumor cells. Oncogene activation and tumor suppressor gene inactivation are often attributed to copy number gain/amplification or deletion, respectively, in many cancer types and stages. Recent advances in next generation sequencing protocols allow for the addition of unique molecular identifiers (UMI) to each read. Each targeted DNA fragment is labeled with a unique random nucleotide sequence added to sequencing primers. UMI are especially useful for CNV detection by making each DNA molecule in a population of reads distinct. RESULTS: Here, we present molecular Copy Number Alteration (mCNA), a new methodology allowing the detection of copy number changes using UMI. The algorithm is composed of four main steps: the construction of UMI count matrices, the use of control samples to construct a pseudo-reference, the computation of log-ratios, the segmentation and finally the statistical inference of abnormal segmented breaks. We demonstrate the success of mCNA on a dataset of patients suffering from Diffuse Large B-cell Lymphoma and we highlight that mCNA results have a strong correlation with comparative genomic hybridization. CONCLUSION: We provide mCNA, a new approach for CNV detection, freely available at https://gitlab.com/pierrejulien.viailly/mcna/ under MIT license. mCNA can significantly improve detection accuracy of CNV changes by using UMI.
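The pseudo-reference and log-ratio steps named in this abstract can be sketched in a few lines. This is an illustrative assumption, not the mCNA implementation: it takes the per-region median of control UMI counts as the pseudo-reference, assumes counts are already normalized for library size, and adds a pseudocount of 1 to avoid dividing by zero. All function names and numbers are hypothetical.

```python
import math
from statistics import median


def pseudo_reference(control_counts):
    """Per-region median of UMI counts across control samples.

    control_counts is a list of samples, each a list of counts per region.
    Using the median is an assumption made for this sketch.
    """
    n_regions = len(control_counts[0])
    return [median(sample[i] for sample in control_counts) for i in range(n_regions)]


def log_ratios(sample_counts, reference, pseudocount=1.0):
    """log2 ratio of a test sample against the pseudo-reference, per region."""
    return [
        math.log2((s + pseudocount) / (r + pseudocount))
        for s, r in zip(sample_counts, reference)
    ]


# Hypothetical UMI counts for three regions in three control samples.
controls = [[100, 200, 150], [110, 190, 150], [90, 210, 150]]
reference = pseudo_reference(controls)
ratios = log_ratios([99, 401, 150], reference)
print(reference, [round(r, 3) for r in ratios])
```

A log-ratio near 0 suggests no change relative to the controls, while a value near 1 suggests roughly a doubling of the signal. Segmentation and statistical inference would then operate on these per-region values.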
Affiliation(s)
- Pierre-Julien Viailly
  - INSERM U1245, Team Genomics and Biomarkers of Lymphoma and Solid Tumors, Normandie Univ, UNIROUEN, Rouen, France
  - Centre Henri Becquerel, Rouen, France
- Vincent Sater
  - INSERM U1245, Team Genomics and Biomarkers of Lymphoma and Solid Tumors, Normandie Univ, UNIROUEN, Rouen, France
  - Centre Henri Becquerel, Rouen, France
  - LITIS EA 4108, Normandie Univ, UNIROUEN, Rouen, France
- Mathieu Viennot
  - INSERM U1245, Team Genomics and Biomarkers of Lymphoma and Solid Tumors, Normandie Univ, UNIROUEN, Rouen, France
  - Centre Henri Becquerel, Rouen, France
- Elodie Bohers
  - INSERM U1245, Team Genomics and Biomarkers of Lymphoma and Solid Tumors, Normandie Univ, UNIROUEN, Rouen, France
  - Centre Henri Becquerel, Rouen, France
- Alison Celebi
  - INSERM U1245, Team Genomics and Biomarkers of Lymphoma and Solid Tumors, Normandie Univ, UNIROUEN, Rouen, France
  - Centre Henri Becquerel, Rouen, France
  - Master Bioinformatique BIM, Normandie Univ, UNIROUEN, Rouen, France
- Philippe Ruminy
  - INSERM U1245, Team Genomics and Biomarkers of Lymphoma and Solid Tumors, Normandie Univ, UNIROUEN, Rouen, France
  - Centre Henri Becquerel, Rouen, France
- Vinciane Marchand
  - INSERM U1245, Team Genomics and Biomarkers of Lymphoma and Solid Tumors, Normandie Univ, UNIROUEN, Rouen, France
  - Centre Henri Becquerel, Rouen, France
- Marie-Delphine Lanic
  - INSERM U1245, Team Genomics and Biomarkers of Lymphoma and Solid Tumors, Normandie Univ, UNIROUEN, Rouen, France
  - Centre Henri Becquerel, Rouen, France
- Sydney Dubois
  - INSERM U1245, Team Genomics and Biomarkers of Lymphoma and Solid Tumors, Normandie Univ, UNIROUEN, Rouen, France
  - Centre Henri Becquerel, Rouen, France
- Dominique Penther
  - INSERM U1245, Team Genomics and Biomarkers of Lymphoma and Solid Tumors, Normandie Univ, UNIROUEN, Rouen, France
  - Centre Henri Becquerel, Rouen, France
- Hervé Tilly
  - INSERM U1245, Team Genomics and Biomarkers of Lymphoma and Solid Tumors, Normandie Univ, UNIROUEN, Rouen, France
  - Centre Henri Becquerel, Rouen, France
- Sylvain Mareschal
  - INSERM U1052 UMR CNRS 5286, Cancer Research Center of Lyon, Lyon, France
- Fabrice Jardin
  - INSERM U1245, Team Genomics and Biomarkers of Lymphoma and Solid Tumors, Normandie Univ, UNIROUEN, Rouen, France
  - Centre Henri Becquerel, Rouen, France
7
Mahmoud M, Gobet N, Cruz-Dávalos DI, Mounier N, Dessimoz C, Sedlazeck FJ. Structural variant calling: the long and the short of it. Genome Biol 2019; 20:246. [PMID: 31747936 PMCID: PMC6868818 DOI: 10.1186/s13059-019-1828-7] [Citation(s) in RCA: 378] [Impact Index Per Article: 63.0] [Received: 04/11/2019] [Accepted: 09/19/2019] [Indexed: 02/08/2023]
Abstract
Recent research into structural variants (SVs) has established their importance to medicine and molecular biology, elucidating their role in various diseases, regulation of gene expression, ethnic diversity, and large-scale chromosome evolution, giving rise to the differences within populations and among species. Nevertheless, characterizing SVs and determining the optimal approach for a given experimental design remains a computational and scientific challenge. Multiple approaches have emerged to target various SV classes, zygosities, and size ranges. Here, we review these approaches with respect to their ability to infer SVs across the full spectrum of large, complex variations and present computational methods for each approach.
Affiliation(s)
- Medhat Mahmoud
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, USA
- Nastassia Gobet
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Diana Ivette Cruz-Dávalos
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Ninon Mounier
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - University Center for Primary Care and Public Health, Lausanne, Switzerland
- Christophe Dessimoz
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution & Environment, University College London, London, UK
  - Department of Computer Science, University College London, London, UK
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, USA
8
Roca I, González-Castro L, Fernández H, Couce ML, Fernández-Marmiesse A. Free-access copy-number variant detection tools for targeted next-generation sequencing data. Mutation Research-Reviews in Mutation Research 2019; 779:114-125. [DOI: 10.1016/j.mrrev.2019.02.005] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 04/13/2018] [Revised: 12/25/2018] [Accepted: 02/22/2019] [Indexed: 01/23/2023]
9
Dolatabadian A, Patel DA, Edwards D, Batley J. Copy number variation and disease resistance in plants. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2017; 130:2479-2490. [PMID: 29043379 DOI: 10.1007/s00122-017-2993-2] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 02/09/2017] [Accepted: 09/27/2017] [Indexed: 05/06/2023]
Abstract
Plant genome diversity varies from single nucleotide polymorphisms to large-scale deletions, insertions, duplications, or re-arrangements. These re-arrangements of sequences resulting from duplication, gains or losses of DNA segments are termed copy number variations (CNVs). During the last decade, numerous studies have emphasized the importance of CNVs as a factor affecting human phenotype; in particular, CNVs have been associated with risks for several severe diseases. In plants, the exploration of the extent and role of CNVs in resistance against pathogens and pests is just beginning. Since CNVs are likely to be associated with disease resistance in plants, an understanding of the distribution of CNVs could assist in the identification of novel plant disease-resistance genes. In this paper, we review existing information about CNVs; their importance, role and function, as well as their association with disease resistance in plants.
Affiliation(s)
- Aria Dolatabadian
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Crawley, WA, 6009, Australia
- Dhwani Apurva Patel
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Crawley, WA, 6009, Australia
- David Edwards
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Crawley, WA, 6009, Australia
- Jacqueline Batley
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Crawley, WA, 6009, Australia
10
Sahlin K, Frånberg M, Arvestad L. Structural Variation Detection with Read Pair Information: An Improved Null Hypothesis Reduces Bias. J Comput Biol 2016; 24:581-589. [PMID: 27681236 DOI: 10.1089/cmb.2016.0124] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
Reads from paired-end and mate-pair libraries are often utilized to find structural variation in genomes, and one common approach is to use their fragment length for detection. After aligning read pairs to the reference, read pair distances are analyzed for statistically significant deviations. However, previously proposed methods are based on a simplified model of observed fragment lengths that does not agree with data. We show how this model limits statistical analysis of identifying variants and propose a new model by adapting a model we have previously introduced for contig scaffolding, which agrees with data. From this model, we derive an improved null hypothesis that when applied in the variant caller CLEVER, reduces the number of false positives and corrects a bias that contributes to more deletion calls than insertion calls. We advise developers of variant callers with statistical fragment length-based methods to adapt the concepts in our proposed model and null hypothesis.
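For orientation, the kind of fragment-length test this abstract refers to can be sketched as a plain z-test. This deliberately shows the simplified model that the authors argue does not agree with data, not their improved null hypothesis, and it is not the code used in CLEVER. The function name and the numbers are hypothetical.

```python
import math


def insert_size_z(observed_sizes, library_mean, library_sd):
    """z-score of the mean apparent fragment length of read pairs at one locus.

    Simplified model (assumption): fragment lengths are independent draws
    from the library-wide distribution, ignoring the sampling bias that
    the abstract describes.
    """
    n = len(observed_sizes)
    if n == 0:
        raise ValueError("need at least one read pair")
    sample_mean = sum(observed_sizes) / n
    # Standard error of the mean under the simplified model.
    return (sample_mean - library_mean) / (library_sd / math.sqrt(n))


# Hypothetical library: mean 300 bp, standard deviation 30 bp.
z = insert_size_z([400, 410, 390, 400], library_mean=300, library_sd=30)
print(round(z, 2))
```

Under this naive model, a large positive z (pairs mapping farther apart than expected) points toward a deletion and a large negative z toward an insertion.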
Affiliation(s)
- Kristoffer Sahlin
  - Science for Life Laboratory, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Mattias Frånberg
  - Cardiovascular Unit, Department of Medicine, Karolinska Institutet, Stockholm, Sweden
- Lars Arvestad
  - Swedish e-Science Research Centre (SeRC) and Department of Numerical Analysis and Computer Science, Stockholm University, Solna, Sweden
11
Guan P, Sung WK. Structural variation detection using next-generation sequencing data: A comparative technical review. Methods 2016; 102:36-49. [PMID: 26845461 DOI: 10.1016/j.ymeth.2016.01.020] [Citation(s) in RCA: 108] [Impact Index Per Article: 12.0] [Received: 10/18/2015] [Revised: 01/09/2016] [Accepted: 01/31/2016] [Indexed: 12/11/2022]
Abstract
Structural variations (SVs) are mutations in the genome of size at least fifty nucleotides. They contribute to the phenotypic differences among healthy individuals, cause severe diseases and even cancers by breaking or linking genes. Thus, it is crucial to systematically profile SVs in the genome. In the past decade, many next-generation sequencing (NGS)-based SV detection methods have been proposed due to the significant cost reduction of NGS experiments and their ability to unbiasedly detect SVs to the base-pair resolution. These SV detection methods vary in both sensitivity and specificity, since they use different SV-property-dependent and library-property-dependent features. As a result, predictions from different SV callers are often inconsistent. Besides, the noises in the data (both platform-specific sequencing error and artificial chimeric reads) impede the specificity of SV detection. Poorly characterized regions in the human genome (e.g., repeat regions) greatly impact the reads mapping and in turn affect the SV calling accuracy. Calling of complex SVs requires specialized SV callers. Apart from accuracy, processing speed of SV caller is another factor deciding its usability. Knowing the pros and cons of different SV calling techniques and the objectives of the biological study are essential for biologists and bioinformaticians to make informed decisions. This paper describes different components in the SV calling pipeline and reviews the techniques used by existing SV callers. Through simulation study, we also demonstrate that library properties, especially insert size, greatly impact the sensitivity of different SV callers. We hope the community can benefit from this work both in designing new SV calling methods and in selecting the appropriate SV caller for specific biological studies.
Affiliation(s)
- Peiyong Guan
  - School of Computing, National University of Singapore, 117543, Singapore
- Wing-Kin Sung
  - School of Computing, National University of Singapore, 117543, Singapore
  - Computational & Mathematical Biology Group, Genome Institute of Singapore, 138672, Singapore
12
Bartenhagen C, Dugas M. Robust and exact structural variation detection with paired-end and soft-clipped alignments: SoftSV compared with eight algorithms. Brief Bioinform 2015; 17:51-62. [PMID: 25998133 DOI: 10.1093/bib/bbv028] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 01/05/2015] [Indexed: 11/14/2022]
Abstract
Structural variation (SV) plays an important role in genetic diversity among the population in general and specifically in diseases such as cancer. Modern next-generation sequencing (NGS) technologies provide paired-end sequencing data at high depth with increasing read lengths. This development enabled the analysis of split-reads to detect SV breakpoints with single-nucleotide resolution. But ambiguous mappings and breakpoint sequences with further co-occurring mutations hamper split-read alignments against a reference sequence. The trade-off between high sensitivity and low false-positive rate is problematic and often requires a lot of fine-tuning of the analysis method based on knowledge about its algorithm and the characteristics of the data set. We present SoftSV, a method for exact breakpoint detection for small and large deletions, inversions, tandem duplications and inter-chromosomal translocations, which relies solely on the mutual alignment of soft-clipped reads within the neighborhood of discordantly mapped paired-end reads. Unlike other SV detection algorithms, our approach does not require thresholds regarding sequencing coverage or mapping quality. We evaluate SoftSV together with eight approaches (Breakdancer, Clever, CREST, Delly, GASVPro, Pindel, Socrates and SoftSearch) on simulated and real data sets. Our results show that sensitive and reliable SV detection is subject to many different factors like read length, sequence coverage and SV type. While most programs have their individual drawbacks, our greedy approach turns out to be the most robust and sensitive on many experimental setups. Sensitivities above 85% and positive predictive values between 80 and 100% could be achieved consistently for all SV types on simulated data sets starting at relatively short 75 bp reads and low 10-15× sequence coverage.
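To make the notion of a soft-clipped read concrete, the sketch below extracts soft-clip lengths from a SAM CIGAR string and reports the reference coordinate where the clipping starts. It relies only on standard CIGAR semantics (M, D, N, = and X consume reference bases; S marks soft clipping). It is not the SoftSV algorithm, and the function names and the `min_clip` threshold are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples, dropping hard clips."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]


def candidate_breakpoints(pos, cigar, min_clip=10):
    """Reference positions where a sufficiently long soft clip begins.

    pos is the 1-based leftmost aligned position, as in the SAM POS field.
    A left clip is reported at pos, a right clip at the first reference
    base past the aligned segment.
    """
    ops = parse_cigar(cigar)
    if not ops:
        return []
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    found = []
    if left >= min_clip:
        found.append(pos)
    if right >= min_clip:
        found.append(pos + ref_len)
    return found


print(candidate_breakpoints(1000, "60M40S"))
print(candidate_breakpoints(2000, "35S65M"))
```

Several reads whose clips start at the same coordinate give the single-nucleotide breakpoint resolution the abstract attributes to split-read analysis.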
13
Pirooznia M, Goes FS, Zandi PP. Whole-genome CNV analysis: advances in computational approaches. Front Genet 2015; 6:138. [PMID: 25918519 PMCID: PMC4394692 DOI: 10.3389/fgene.2015.00138] [Citation(s) in RCA: 123] [Impact Index Per Article: 12.3] [Received: 01/26/2015] [Accepted: 03/23/2015] [Indexed: 01/04/2023]
Abstract
Accumulating evidence indicates that DNA copy number variation (CNV) is likely to make a significant contribution to human diversity and also play an important role in disease susceptibility. Recent advances in genome sequencing technologies have enabled the characterization of a variety of genomic features, including CNVs. This has led to the development of several bioinformatics approaches to detect CNVs from next-generation sequencing data. Here, we review recent advances in CNV detection from whole genome sequencing. We discuss the informatics approaches and current computational tools that have been developed as well as their strengths and limitations. This review will assist researchers and analysts in choosing the most suitable tools for CNV analysis as well as provide suggestions for new directions in future development.
Affiliation(s)
- Mehdi Pirooznia
  - Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Fernando S Goes
  - Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Peter P Zandi
  - Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
  - Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA