1
|
Liu X, Liu S, Wang H, Hu T. Potentials and challenges of chromosomal microarray analysis in prenatal diagnosis. Front Genet 2022; 13:938183. [PMID: 35957681 PMCID: PMC9360565 DOI: 10.3389/fgene.2022.938183] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2022] [Accepted: 07/11/2022] [Indexed: 11/19/2022] Open
Abstract
Introduction: For decades, conventional karyotyping analysis has been the gold standard for detecting chromosomal abnormalities during prenatal diagnosis. With the development of molecular cytogenetic methods, this situation has dramatically changed. Chromosomal microarray analysis (CMA), a method of genome-wide detection with high resolution, has been recommended as a first-tier test for prenatal diagnosis, especially for fetuses with structural abnormalities. Methods: Based on the primary literature, this review provides an updated summary of the application of CMA for prenatal diagnosis. In addition, this review addresses the challenges that CMA faces with the emergence of genome sequencing techniques, such as copy number variation sequencing, genome-wide cell-free DNA testing, and whole exome sequencing. Conclusion: The CMA platform is still suggested as priority testing methodology in the prenatal setting currently. However, pregnant women may benefit from genome sequencing, which enables the simultaneous detection of copy number variations, regions of homozygosity and single-nucleotide variations, in near future.
Collapse
Affiliation(s)
- Xijing Liu
- Department of Medical Genetics, West China Second University Hospital, Sichuan University, Chengdu, China
- Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, China
- Key Laboratory of Birth Defects and Related Diseases of Women and Children (Sichuan University), Ministry of Education, Chengdu, China
| | - Shanling Liu
- Department of Medical Genetics, West China Second University Hospital, Sichuan University, Chengdu, China
- Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, China
- Key Laboratory of Birth Defects and Related Diseases of Women and Children (Sichuan University), Ministry of Education, Chengdu, China
| | - He Wang
- Department of Medical Genetics, West China Second University Hospital, Sichuan University, Chengdu, China
- Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, China
- Key Laboratory of Birth Defects and Related Diseases of Women and Children (Sichuan University), Ministry of Education, Chengdu, China
| | - Ting Hu
- Department of Medical Genetics, West China Second University Hospital, Sichuan University, Chengdu, China
- Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, China
- Key Laboratory of Birth Defects and Related Diseases of Women and Children (Sichuan University), Ministry of Education, Chengdu, China
- *Correspondence: Ting Hu,
| |
Collapse
|
2
|
Wang X, Zheng Z, Cai Y, Chen T, Li C, Fu W, Jiang Y. CNVcaller: highly efficient and widely applicable software for detecting copy number variations in large populations. Gigascience 2018; 6:1-12. [PMID: 29220491 PMCID: PMC5751039 DOI: 10.1093/gigascience/gix115] [Citation(s) in RCA: 80] [Impact Index Per Article: 11.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2017] [Accepted: 11/17/2017] [Indexed: 01/01/2023] Open
Abstract
Background The increasing amount of sequencing data available for a wide variety of species can be theoretically used for detecting copy number variations (CNVs) at the population level. However, the growing sample sizes and the divergent complexity of nonhuman genomes challenge the efficiency and robustness of current human-oriented CNV detection methods. Results Here, we present CNVcaller, a read-depth method for discovering CNVs in population sequencing data. The computational speed of CNVcaller was 1-2 orders of magnitude faster than CNVnator and Genome STRiP for complex genomes with thousands of unmapped scaffolds. CNV detection of 232 goats required only 1.4 days on a single compute node. Additionally, the Mendelian consistency of sheep trios indicated that CNVcaller mitigated the influence of high proportions of gaps and misassembled duplications in the nonhuman reference genome assembly. Furthermore, multiple evaluations using real sheep and human data indicated that CNVcaller achieved the best accuracy and sensitivity for detecting duplications. Conclusions The fast generalized detection algorithms included in CNVcaller overcome prior computational barriers for detecting CNVs in large-scale sequencing data with complex genomic structures. Therefore, CNVcaller promotes population genetic analyses of functional CNVs in more species.
Collapse
Affiliation(s)
- Xihong Wang
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Zhuqing Zheng
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Yudong Cai
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Ting Chen
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Chao Li
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Weiwei Fu
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Yu Jiang
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
| |
Collapse
|
3
|
|
4
|
Monlong J, Cossette P, Meloche C, Rouleau G, Girard SL, Bourque G. Human copy number variants are enriched in regions of low mappability. Nucleic Acids Res 2018; 46:7236-7249. [PMID: 30137632 PMCID: PMC6101599 DOI: 10.1093/nar/gky538] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2018] [Revised: 05/04/2018] [Accepted: 06/12/2018] [Indexed: 12/18/2022] Open
Abstract
Copy number variants (CNVs) are known to affect a large portion of the human genome and have been implicated in many diseases. Although whole-genome sequencing (WGS) can help identify CNVs, most analytical methods suffer from limited sensitivity and specificity, especially in regions of low mappability. To address this, we use PopSV, a CNV caller that relies on multiple samples to control for technical variation. We demonstrate that our calls are stable across different types of repeat-rich regions and validate the accuracy of our predictions using orthogonal approaches. Applying PopSV to 640 human genomes, we find that low-mappability regions are approximately 5 times more likely to harbor germline CNVs, in stark contrast to the nearly uniform distribution observed for somatic CNVs in 95 cancer genomes. In addition to known enrichments in segmental duplication and near centromeres and telomeres, we also report that CNVs are enriched in specific types of satellite and in some of the most recent families of transposable elements. Finally, using this comprehensive approach, we identify 3455 regions with recurrent CNVs that were missing from existing catalogs. In particular, we identify 347 genes with a novel exonic CNV in low-mappability regions, including 29 genes previously associated with disease.
Collapse
Affiliation(s)
- Jean Monlong
- Department of Human Genetics, McGill University, Montréal H3A 1B1, Canada
- Canadian Center for Computational Genomics, Montréal H3A 1A4, Canada
| | - Patrick Cossette
- Centre de Recherche du Centre Hospitalier de l’Universite de Montréal, Montréal H2X 0A9, Canada
| | - Caroline Meloche
- Centre de Recherche du Centre Hospitalier de l’Universite de Montréal, Montréal H2X 0A9, Canada
| | - Guy Rouleau
- Montreal Neurological Institute, McGill University, Montréal H3A 2B4, Canada
| | - Simon L Girard
- Department of Human Genetics, McGill University, Montréal H3A 1B1, Canada
- Centre de Recherche du Centre Hospitalier de l’Universite de Montréal, Montréal H2X 0A9, Canada
- Département des sciences fondamentales, Université du Québec à Chicoutimi, Chicoutimi G7H 2B1, Canada
| | - Guillaume Bourque
- Department of Human Genetics, McGill University, Montréal H3A 1B1, Canada
- Canadian Center for Computational Genomics, Montréal H3A 1A4, Canada
- McGill University and Génome Québec Innovation Center, Montréal H3A 1A4, Canada
| |
Collapse
|
5
|
Genovese LM, Geraci F, Corrado L, Mangano E, D'Aurizio R, Bordoni R, Severgnini M, Manzini G, De Bellis G, D'Alfonso S, Pellegrini M. A Census of Tandemly Repeated Polymorphic Loci in Genic Regions Through the Comparative Integration of Human Genome Assemblies. Front Genet 2018; 9:155. [PMID: 29770143 PMCID: PMC5941971 DOI: 10.3389/fgene.2018.00155] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2017] [Accepted: 04/13/2018] [Indexed: 11/29/2022] Open
Abstract
Polymorphic Tandem Repeat (PTR) is a common form of polymorphism in the human genome. A PTR consists in a variation found in an individual (or in a population) of the number of repeating units of a Tandem Repeat (TR) locus of the genome with respect to the reference genome. Several phenotypic traits and diseases have been discovered to be strongly associated with or caused by specific PTR loci. PTR are further distinguished in two main classes: Short Tandem Repeats (STR) when the repeating unit has size up to 6 base pairs, and Variable Number Tandem Repeats (VNTR) for repeating units of size above 6 base pairs. As larger and larger populations are screened via high throughput sequencing projects, it becomes technically feasible and desirable to explore the association between PTR and a panoply of such traits and conditions. In order to facilitate these studies, we have devised a method for compiling catalogs of PTR from assembled genomes, and we have produced a catalog of PTR for genic regions (exons, introns, UTR and adjacent regions) of the human genome (GRCh38). We applied four different TR discovery software tools to uncover in the first phase 55,223,485 TR (after duplicate removal) in GRCh38, of which 373,173 were determined to be PTR in the second phase by comparison with five assembled human genomes. Of these, 263,266 are not included by state-of-the-art PTR catalogs. The new methodology is mainly based on a hierarchical and systematic application of alignment-based sequence comparisons to identify and measure the polymorphism of TR. While previous catalogs focus on the class of STR of small total size, we remove any size restrictions, aiming at the more general class of PTR, and we also target fuzzy TR by using specific detection tools. Similarly to other previous catalogs of human polymorphic loci, we focus our catalog toward applications in the discovery of disease-associated loci. Validation by cross-referencing with existing catalogs on common clinically-relevant loci shows good concordance. Overall, this proposed census of human PTR in genic regions is a shared resource (web accessible), complementary to existing catalogs, facilitating future genome-wide studies involving PTR.
Collapse
Affiliation(s)
| | - Filippo Geraci
- Institute for Informatics and Telematics of CNR, Pisa, Italy
| | - Lucia Corrado
- Department of Health Sciences, University of Eastern Piedmont Amedeo Avogadro, Novara, Italy
| | | | | | - Roberta Bordoni
- Institute for Biomedical Technologies of CNR, Segrate, Italy
| | | | - Giovanni Manzini
- Institute for Informatics and Telematics of CNR, Pisa, Italy.,Department of Science and Technological Innovation, University of Eastern Piedmont Amedeo Avogadro, Novara, Italy
| | | | - Sandra D'Alfonso
- Department of Health Sciences, University of Eastern Piedmont Amedeo Avogadro, Novara, Italy
| | | |
Collapse
|
6
|
Wang W, Wang W, Sun W, Crowley JJ, Szatkiewicz JP. Allele-specific copy-number discovery from whole-genome and whole-exome sequencing. Nucleic Acids Res 2015; 43:e90. [PMID: 25883151 PMCID: PMC4538801 DOI: 10.1093/nar/gkv319] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2014] [Accepted: 03/27/2015] [Indexed: 11/14/2022] Open
Abstract
Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol is freely available at https://sourceforge.net/projects/asgenseng/.
Collapse
Affiliation(s)
- WeiBo Wang
- Department of Computer Science, University of North Carolina at Chapel Hill, NC 27599-3175, USA
| | - Wei Wang
- Department of Computer Science, University of California, Los Angeles, CA 90095, USA
| | - Wei Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, NC 27599-7400, USA
| | - James J Crowley
- Department of Genetics, University of North Carolina at Chapel Hill, NC 27599-7264, USA
| | - Jin P Szatkiewicz
- Department of Genetics, University of North Carolina at Chapel Hill, NC 27599-7264, USA
| |
Collapse
|
7
|
Gelfand Y, Hernandez Y, Loving J, Benson G. VNTRseek-a computational tool to detect tandem repeat variants in high-throughput sequencing data. Nucleic Acids Res 2014; 42:8884-94. [PMID: 25056320 PMCID: PMC4132751 DOI: 10.1093/nar/gku642] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022] Open
Abstract
DNA tandem repeats (TRs) are ubiquitous genomic features which consist of two or more adjacent copies of an underlying pattern sequence. The copies may be identical or approximate. Variable number of tandem repeats or VNTRs are polymorphic TR loci in which the number of pattern copies is variable. In this paper we describe VNTRseek, our software for discovery of minisatellite VNTRs (pattern size ≥ 7 nucleotides) using whole genome sequencing data. VNTRseek maps sequencing reads to a set of reference TRs and then identifies putative VNTRs based on a discrepancy between the copy number of a reference and its mapped reads. VNTRseek was used to analyze the Watson and Khoisan genomes (454 technology) and two 1000 Genomes family trios (Illumina). In the Watson genome, we identified 752 VNTRs with pattern sizes ranging from 7 to 84 nt. In the Khoisan genome, we identified 2572 VNTRs with pattern sizes ranging from 7 to 105 nt. In the trios, we identified between 2660 and 3822 VNTRs per individual and found nearly 100% consistency with Mendelian inheritance. VNTRseek is, to the best of our knowledge, the first software for genome-wide detection of minisatellite VNTRs. It is available at http://orca.bu.edu/vntrseek/.
Collapse
Affiliation(s)
- Yevgeniy Gelfand
- Laboratory for Biocomputing and Informatics, Boston University, Boston, MA 02215, USA
| | - Yozen Hernandez
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
| | - Joshua Loving
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
| | - Gary Benson
- Laboratory for Biocomputing and Informatics, Boston University, Boston, MA 02215, USA Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA Department of Computer Science, Boston University, Boston, MA 02215, USA
| |
Collapse
|
8
|
Duan J, Zhang JG, Deng HW, Wang YP. CNV-TV: a robust method to discover copy number variation from short sequencing reads. BMC Bioinformatics 2013; 14:150. [PMID: 23634703 PMCID: PMC3679874 DOI: 10.1186/1471-2105-14-150] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2012] [Accepted: 04/19/2013] [Indexed: 11/29/2022] Open
Abstract
Background Copy number variation (CNV) is an important structural variation (SV) in human genome. Various studies have shown that CNVs are associated with complex diseases. Traditional CNV detection methods such as fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (aCGH) suffer from low resolution. The next generation sequencing (NGS) technique promises a higher resolution detection of CNVs and several methods were recently proposed for realizing such a promise. However, the performances of these methods are not robust under some conditions, e.g., some of them may fail to detect CNVs of short sizes. There has been a strong demand for reliable detection of CNVs from high resolution NGS data. Results A novel and robust method to detect CNV from short sequencing reads is proposed in this study. The detection of CNV is modeled as a change-point detection from the read depth (RD) signal derived from the NGS, which is fitted with a total variation (TV) penalized least squares model. The performance (e.g., sensitivity and specificity) of the proposed approach are evaluated by comparison with several recently published methods on both simulated and real data from the 1000 Genomes Project. Conclusion The experimental results showed that both the true positive rate and false positive rate of the proposed detection method do not change significantly for CNVs with different copy numbers and lengthes, when compared with several existing methods. Therefore, our proposed approach results in a more reliable detection of CNVs than the existing methods.
Collapse
Affiliation(s)
- Junbo Duan
- Department of Biomedical Engineering, Tulane University, New Orleans, LA, USA
| | | | | | | |
Collapse
|
9
|
Brookes K. The VNTR in complex disorders: The forgotten polymorphisms? A functional way forward? Genomics 2013; 101:273-81. [DOI: 10.1016/j.ygeno.2013.03.003] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2013] [Revised: 03/08/2013] [Accepted: 03/11/2013] [Indexed: 12/16/2022]
|
10
|
Ragheb MN, Ford CB, Chase MR, Lin PL, Flynn JL, Fortune SM. The mutation rate of mycobacterial repetitive unit loci in strains of M. tuberculosis from cynomolgus macaque infection. BMC Genomics 2013; 14:145. [PMID: 23496945 PMCID: PMC3635867 DOI: 10.1186/1471-2164-14-145] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2012] [Accepted: 02/26/2013] [Indexed: 11/25/2022] Open
Abstract
Background Mycobacterial interspersed repetitive units (MIRUs) are minisatellites within the Mycobacterium tuberculosis (Mtb) genome. Copy number variation (CNV) in MIRU loci is used for epidemiological typing, making the rate of variation important for tracking the transmission of Mtb strains. In this study, we developed and assessed a whole-genome sequencing (WGS) approach to detect MIRU CNV in Mtb. We applied this methodology to a panel of Mtb strains isolated from the macaque model of tuberculosis (TB), the animal model that best mimics human disease. From these data, we have estimated the rate of MIRU variation in the host environment, providing a benchmark rate for future epidemiologic work. Results We assessed variation at the 24 MIRU loci used for typing in a set of Mtb strains isolated from infected cynomolgus macaques. We previously performed WGS of these strains and here have applied both read depth (RD) and paired-end mapping (PEM) metrics to identify putative copy number variants. To assess the relative power of these approaches, all MIRU loci were resequenced using Sanger sequencing. We detected two insertion/deletion events both of which could be identified as candidates by PEM criteria. With these data, we estimate a MIRU mutation rate of 2.70 × 10-03 (95% CI: 3.30 × 10-04- 9.80 × 10-03) per locus, per year. Conclusion Our results represent the first experimental estimate of the MIRU mutation rate in Mtb. This rate is comparable to the highest previous estimates gathered from epidemiologic data and meta-analyses. Our findings allow for a more rigorous interpretation of data gathered from MIRU typing.
Collapse
Affiliation(s)
- Mark N Ragheb
- Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, MA, USA
| | | | | | | | | | | |
Collapse
|
11
|
Wang Z, Hormozdiari F, Yang WY, Halperin E, Eskin E. CNVeM: copy number variation detection using uncertainty of read mapping. J Comput Biol 2013; 20:224-36. [PMID: 23421794 DOI: 10.1089/cmb.2012.0258] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Copy number variations (CNVs) are widely known to be an important mediator for diseases and traits. The development of high-throughput sequencing (HTS) technologies has provided great opportunities to identify CNV regions in mammalian genomes. In a typical experiment, millions of short reads obtained from a genome of interest are mapped to a reference genome. The mapping information can be used to identify CNV regions. One important challenge in analyzing the mapping information is the large fraction of reads that can be mapped to multiple positions. Most existing methods either only consider reads that can be uniquely mapped to the reference genome or randomly place a read to one of its mapping positions. Therefore, these methods have low power to detect CNVs located within repeated sequences. In this study, we propose a probabilistic model, CNVeM, that utilizes the inherent uncertainty of read mapping. We use maximum likelihood to estimate locations and copy numbers of copied regions and implement an expectation-maximization (EM) algorithm. One important contribution of our model is that we can distinguish between regions in the reference genome that differ from each other by as little as 0.1%. As our model aims to predict the copy number of each nucleotide, we can predict the CNV boundaries with high resolution. We apply our method to simulated datasets and achieve higher accuracy compared to CNVnator. Moreover, we apply our method to real data from which we detected known CNVs. To our knowledge, this is the first attempt to predict CNVs at nucleotide resolution and to utilize uncertainty of read mapping.
Collapse
Affiliation(s)
- Zhanyong Wang
- Computer Science Department, University of California Los Angeles, Los Angeles, CA 90095-1596, USA
| | | | | | | | | |
Collapse
|
12
|
Szatkiewicz JP, Wang W, Sullivan PF, Wang W, Sun W. Improving detection of copy-number variation by simultaneous bias correction and read-depth segmentation. Nucleic Acids Res 2012; 41:1519-32. [PMID: 23275535 PMCID: PMC3561969 DOI: 10.1093/nar/gks1363] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023] Open
Abstract
Structural variation is an important class of genetic variation in mammals. High-throughput sequencing (HTS) technologies promise to revolutionize copy-number variation (CNV) detection but present substantial analytic challenges. Converging evidence suggests that multiple types of CNV-informative data (e.g. read-depth, read-pair, split-read) need be considered, and that sophisticated methods are needed for more accurate CNV detection. We observed that various sources of experimental biases in HTS confound read-depth estimation, and note that bias correction has not been adequately addressed by existing methods. We present a novel read-depth-based method, GENSENG, which uses a hidden Markov model and negative binomial regression framework to identify regions of discrete copy-number changes while simultaneously accounting for the effects of multiple confounders. Based on extensive calibration using multiple HTS data sets, we conclude that our method outperforms existing read-depth-based CNV detection algorithms. The concept of simultaneous bias correction and CNV detection can serve as a basis for combining read-depth with other types of information such as read-pair or split-read in a single analysis. A user-friendly and computationally efficient implementation of our method is freely available.
Collapse
Affiliation(s)
- Jin P Szatkiewicz
- Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599-7264, USA.
| | | | | | | | | |
Collapse
|
13
|
Teo SM, Pawitan Y, Ku CS, Chia KS, Salim A. Statistical challenges associated with detecting copy number variations with next-generation sequencing. ACTA ACUST UNITED AC 2012; 28:2711-8. [PMID: 22942022 DOI: 10.1093/bioinformatics/bts535] [Citation(s) in RCA: 153] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]
Abstract
MOTIVATION Analysing next-generation sequencing (NGS) data for copy number variations (CNVs) detection is a relatively new and challenging field, with no accepted standard protocols or quality control measures so far. There are by now several algorithms developed for each of the four broad methods for CNV detection using NGS, namely the depth of coverage (DOC), read-pair, split-read and assembly-based methods. However, because of the complexity of the genome and the short read lengths from NGS technology, there are still many challenges associated with the analysis of NGS data for CNVs, no matter which method or algorithm is used. RESULTS In this review, we describe and discuss areas of potential biases in CNV detection for each of the four methods. In particular, we focus on issues pertaining to (i) mappability, (ii) GC-content bias, (iii) quality control measures of reads and (iv) difficulty in identifying duplications. To gain insights to some of the issues discussed, we also download real data from the 1000 Genomes Project and analyse its DOC data. We show examples of how reads in repeated regions can affect CNV detection, demonstrate current GC-correction algorithms, investigate sensitivity of DOC algorithm before and after quality control of reads and discuss reasons for which duplications are harder to detect than deletions.
Collapse
Affiliation(s)
- Shu Mei Teo
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore 117597
| | | | | | | | | |
Collapse
|
14
|
Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat Rev Genet 2011; 13:36-46. [PMID: 22124482 DOI: 10.1038/nrg3117] [Citation(s) in RCA: 1122] [Impact Index Per Article: 80.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
Abstract
Repetitive DNA sequences are abundant in a broad range of species, from bacteria to mammals, and they cover nearly half of the human genome. Repeats have always presented technical challenges for sequence alignment and assembly programs. Next-generation sequencing projects, with their short read lengths and high data volumes, have made these challenges more difficult. From a computational perspective, repeats create ambiguities in alignment and assembly, which, in turn, can produce biases and errors when interpreting results. Simply ignoring repeats is not an option, as this creates problems of its own and may mean that important biological phenomena are missed. We discuss the computational problems surrounding repeats and describe strategies used by current bioinformatics systems to solve them.
Collapse
|