101
Young ND, Zhou P, Silverstein KA. Exploring structural variants in environmentally sensitive gene families. Current Opinion in Plant Biology 2016; 30:19-24. [PMID: 26855303 DOI: 10.1016/j.pbi.2015.12.012] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 10/16/2015] [Revised: 12/22/2015] [Accepted: 12/28/2015] [Indexed: 06/05/2023]
Abstract
Environmentally sensitive plant gene families, such as NBS-LRRs, receptor kinases, and defensins, are known to be highly variable. However, most existing strategies for discovering and describing structural variation in complex gene families provide incomplete and imperfect results. The move to de novo genome assemblies for multiple accessions or individuals within a species is enabling more comprehensive and accurate insights about gene family variation. Earlier array-based genome hybridization and sequence-based read mapping methods were limited by their reliance on a reference genome and by misplacement of paralogous sequences. Variant discovery based on de novo genome assemblies overcomes the problems arising from a reference genome and reduces sequence misplacement. As de novo genome sequencing moves to the use of longer reads, artifacts will be minimized, intact tandem gene clusters will be constructed accurately, and insights into rapid evolution will become feasible.
Affiliation(s)
- Nevin Dale Young
- Department of Plant Pathology, 495 Borlaug Hall, University of Minnesota, St. Paul, MN 55108, USA; Department of Plant Biology, 220 BioScience Building, University of Minnesota, St. Paul, MN 55108, USA.
- Peng Zhou
- Department of Plant Pathology, 495 Borlaug Hall, University of Minnesota, St. Paul, MN 55108, USA; Minnesota Supercomputer Institute, University of Minnesota, Minneapolis, MN 55455, USA
- Kevin AT Silverstein
- Minnesota Supercomputer Institute, University of Minnesota, Minneapolis, MN 55455, USA
102
One CNV Discordance in NRXN1 Observed Upon Genome-wide Screening in 38 Pairs of Adult Healthy Monozygotic Twins. Twin Res Hum Genet 2016; 19:97-103. [DOI: 10.1017/thg.2016.5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/28/2022]
Abstract
Monozygotic (MZ) twins stem from the same single fertilized egg and therefore share all their inherited genetic variation. This is one of the unequivocal facts on which genetic epidemiology and twin studies are based. To what extent this also implies that MZ twins share genotypes in adult tissues is not precisely established, but a common pragmatic assumption is that MZ twins are 100% genetically identical also in adult tissues. During the past decade, this view has been challenged by several reports, with observations of differences in post-zygotic copy number variations (CNVs) between members of the same MZ pair. In this study, we performed a systematic search for differences of CNVs within 38 adult MZ pairs who had been misclassified as dizygotic (DZ) twins by questionnaire-based assessment. Initial scoring by PennCNV suggested a total of 967 CNV discordances. The within-pair correlation in number of CNVs detected was strongly dependent on confidence score filtering and reached a plateau of r = 0.8 when restricting to CNVs detected with confidence score larger than 50. The top-ranked discordances were subsequently selected for validation by quantitative polymerase chain reaction (qPCR), from which one single ~120kb deletion in NRXN1 on chromosome 2 (bp 51017111–51136802) was validated. Despite involving an exon, no sign of cognitive/mental consequences was apparent in the affected twin pair, potentially reflecting limited or lack of expression of the transcripts containing this exon in nerve/brain.
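The dependence of the within-pair correlation on confidence filtering can be illustrated with a minimal Python sketch. It assumes each twin's calls are reduced to a plain list of confidence scores; the function names and the toy scores are hypothetical and are not taken from the study or from PennCNV.

```python
from math import sqrt

def count_cnvs(scores, min_conf):
    """Count calls whose confidence score is larger than min_conf."""
    return sum(1 for score in scores if score > min_conf)

def pearson(xs, ys):
    """Plain Pearson correlation; returns NaN if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)

def within_pair_correlation(pairs, min_conf):
    """Correlate per-twin CNV counts across pairs after score filtering."""
    twin_a = [count_cnvs(a, min_conf) for a, _ in pairs]
    twin_b = [count_cnvs(b, min_conf) for _, b in pairs]
    return pearson(twin_a, twin_b)

# Toy data (hypothetical): each pair holds the confidence scores of the
# CNV calls made in twin A and twin B.
pairs = [
    ([60, 70, 10], [55, 80]),
    ([90], [65, 5, 5]),
    ([51, 52, 53, 20], [70, 71, 72]),
]
r_unfiltered = within_pair_correlation(pairs, min_conf=0)
r_filtered = within_pair_correlation(pairs, min_conf=50)
```

In this toy example the low-scoring calls are the ones that differ between co-twins, so removing them raises the correlation, which is the qualitative pattern the abstract describes.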
103
García-Chequer AJ, Méndez-Tenorio A, Olguín-Ruiz G, Sánchez-Vallejo C, Isa P, Arias CF, Torres J, Hernández-Angeles A, Ramírez-Ortiz MA, Lara C, Cabrera-Muñoz ML, Sadowinski-Pine S, Bravo-Ortiz JC, Ramón-García G, Diegopérez-Ramírez J, Ramírez-Reyes G, Casarrubias-Islas R, Ramírez J, Orjuela MA, Ponce-Castañeda MV. Overview of recurrent chromosomal losses in retinoblastoma detected by low coverage next generation sequencing. Cancer Genet 2015; 209:57-69. [PMID: 26883451 DOI: 10.1016/j.cancergen.2015.12.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 06/25/2015] [Revised: 09/01/2015] [Accepted: 12/03/2015] [Indexed: 12/12/2022]
Abstract
Genes are frequently lost or gained in malignant tumors, and the analysis of these changes can be informative about the underlying tumor biology. Retinoblastoma is a pediatric intraocular malignancy, and since deletions in chromosome 13 have been described in this tumor, we performed genome-wide sequencing with the Illumina platform to test whether recurrent losses could be detected in low coverage data from DNA pools of Rb cases. An in silico reference profile for each pool was created from the human genome sequence GRCh37.p5; a chromosome integrity score and a graphical 40 kb window analysis approach allowed us to identify, with high resolution, previously reported non-random recurrent losses in all chromosomes of these tumors. We also found a pattern of gains and losses associated with clear and dark cytogenetic bands, respectively. We further analyzed a pool of medulloblastoma and found a more stable genomic profile and previously reported losses in this tumor. This approach facilitates identification of recurrent deletions from many patients that may be biologically relevant for tumor development.
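A generic version of a fixed-window depth comparison is sketched below in Python. It is only an illustration under stated assumptions: reads are reduced to start positions on one chromosome, the reference profile is another list of positions, and the pseudocount and function names are hypothetical. The abstract does not define the chromosome integrity score, so it is not reproduced here.

```python
from collections import Counter
from math import log2

WINDOW = 40_000  # 40 kb windows, as mentioned in the abstract

def window_counts(positions, window=WINDOW):
    """Count read start positions per fixed-size window index."""
    return Counter(pos // window for pos in positions)

def log2_ratios(sample_positions, reference_positions,
                window=WINDOW, pseudocount=0.5):
    """Log2 ratio of sample to reference read fraction per window.

    The pseudocount (an assumption of this sketch) avoids division by
    zero in windows that are empty in one of the two profiles.
    """
    sample = window_counts(sample_positions, window)
    reference = window_counts(reference_positions, window)
    sample_total = sum(sample.values()) or 1
    reference_total = sum(reference.values()) or 1
    ratios = {}
    for idx in sorted(set(sample) | set(reference)):
        sample_frac = (sample[idx] + pseudocount) / sample_total
        reference_frac = (reference[idx] + pseudocount) / reference_total
        ratios[idx] = log2(sample_frac / reference_frac)
    return ratios

# Toy data (hypothetical): the reference has 10 reads in each of two
# windows; the sample has 10 reads in the first and 2 in the second.
reference_reads = [i * 1000 for i in range(10)] + [40_000 + i * 1000 for i in range(10)]
sample_reads = [i * 1000 for i in range(10)] + [40_000, 41_000]
ratios = log2_ratios(sample_reads, reference_reads)
```

Windows with a clearly negative ratio are candidate losses; with pooled low-coverage data, recurrent losses would show up as negative ratios shared across pools.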
Affiliation(s)
- A J García-Chequer
- Unidad de Investigación Médica en Enfermedades Infecciosas, Centro Médico Nacional SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico
- A Méndez-Tenorio
- Lab. Bioinformática Genómica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, México D.F., Mexico
- G Olguín-Ruiz
- Lab. Bioinformática Genómica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, México D.F., Mexico
- C Sánchez-Vallejo
- Lab. Bioinformática Genómica, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, México D.F., Mexico
- P Isa
- Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- C F Arias
- Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- J Torres
- Unidad de Investigación Médica en Enfermedades Infecciosas, Centro Médico Nacional SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico
- A Hernández-Angeles
- Unidad de Investigación Médica en Enfermedades Infecciosas, Centro Médico Nacional SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico
- C Lara
- Hospital Infantil de México Federico Gómez, México D.F., Mexico
- J C Bravo-Ortiz
- Hospital de Pediatría, CMN SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico
- G Ramón-García
- Hospital de Pediatría, CMN SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico
- J Diegopérez-Ramírez
- Hospital de Pediatría, CMN SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico
- G Ramírez-Reyes
- Hospital de Pediatría, CMN SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico
- R Casarrubias-Islas
- Hospital de Pediatría, CMN SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico
- J Ramírez
- Unidad de Microarreglos, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, México D.F., Mexico
- M V Ponce-Castañeda
- Unidad de Investigación Médica en Enfermedades Infecciosas, Centro Médico Nacional SXXI, Instituto Mexicano del Seguro Social, México D.F., Mexico.
104
Liu Y, Liu J, Lu J, Peng J, Juan L, Zhu X, Li B, Wang Y. Joint detection of copy number variations in parent-offspring trios. Bioinformatics 2015; 32:1130-7. [PMID: 26644415 DOI: 10.1093/bioinformatics/btv707] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 04/19/2015] [Accepted: 11/27/2015] [Indexed: 12/15/2022] Open
Abstract
MOTIVATION Whole genome sequencing (WGS) of parent-offspring trios is a powerful approach for identifying disease-associated genes via detecting copy number variations (CNVs). Existing approaches, which detect CNVs for each individual in a trio independently, usually yield low detection accuracy. Joint modeling approaches leveraging Mendelian transmission within the parent-offspring trio can be an efficient strategy to improve CNV detection accuracy. RESULTS In this study, we developed TrioCNV, a novel approach for jointly detecting CNVs in parent-offspring trios from WGS data. Using negative binomial regression, we modeled the read depth signal while considering both GC content bias and mappability bias. Moreover, we incorporated the family relationship and used a hidden Markov model to jointly infer CNVs for the three samples of a parent-offspring trio. Through application to both simulated data and a trio from the 1000 Genomes Project, we showed that TrioCNV achieved better performance than existing approaches. AVAILABILITY AND IMPLEMENTATION The software TrioCNV, implemented using a combination of Java and R, is freely available from the website at https://github.com/yongzhuang/TrioCNV. CONTACT: ydwang@hit.edu.cn SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
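To make the read-depth modeling step more concrete, here is a minimal Python sketch of a negative binomial emission model for a single genomic window. It is not TrioCNV's implementation: the GC and mappability covariates, the trio structure, and the hidden Markov model are all omitted, and the dispersion value, the copy number range, and the function names are assumptions chosen for illustration.

```python
from math import lgamma, log

def nb_logpmf(k, mu, size):
    """Log-probability of count k under a negative binomial
    parameterised by its mean (mu) and dispersion (size)."""
    return (lgamma(k + size) - lgamma(size) - lgamma(k + 1)
            + size * log(size / (size + mu))
            + k * log(mu / (size + mu)))

def most_likely_copy_number(depth, diploid_mean, size=20.0,
                            max_cn=4, floor=0.01):
    """Pick the copy number whose expected depth best explains the count.

    Expected depth scales as cn / 2 times the diploid mean; `floor`
    keeps the mean positive for copy number 0 (an assumption standing
    in for stray mismapped reads).
    """
    best_cn, best_ll = None, float("-inf")
    for cn in range(max_cn + 1):
        mu = max(cn / 2.0, floor) * diploid_mean
        ll = nb_logpmf(depth, mu, size)
        if ll > best_ll:
            best_cn, best_ll = cn, ll
    return best_cn

# Hypothetical windows with a diploid mean depth of 100 reads.
calls = [most_likely_copy_number(d, 100) for d in (0, 50, 100, 150)]
```

In a joint model of the kind the abstract describes, per-window likelihoods like these would act as emission probabilities, with the trio's Mendelian constraints entering through the hidden state space.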
Affiliation(s)
- Yongzhuang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Jian Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Jianguo Lu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Jiajie Peng
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Liran Juan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Xiaolin Zhu
- Institute for Genomic Medicine, Columbia University, New York, NY 10032, University Program in Genetics and Genomics, Duke University Medical School, Durham, NC 27708
- Bingshan Li
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235 and Center for Quantitative Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
105
Huse JT, Rosenblum MK. The Emerging Molecular Foundations of Pediatric Brain Tumors. J Child Neurol 2015; 30:1838-50. [PMID: 25873586 DOI: 10.1177/0883073815579709] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 02/03/2015] [Accepted: 03/10/2015] [Indexed: 01/23/2023]
Abstract
Recent years have witnessed extensive molecular characterization of several pediatric brain tumor variants. These studies have dramatically shifted notions of disease classification and are likely to have similarly profound effects on patient management in the near future. In this review, we cover the molecular foundations of low-grade glial and glioneuronal neoplasms, high-grade glioma, ependymoma, and medulloblastoma, the details of which have only been recently elucidated in many cases. In doing so, we describe an array of biomarkers likely to play a major role in clinically relevant molecular stratification moving forward. We also discuss strategies for robust and efficient biomarker assessment in the clinical environment.
Affiliation(s)
- Jason T Huse
- Department of Pathology and Memorial Sloan-Kettering Cancer Center, New York, NY, USA; Human Oncology and Pathogenesis Program, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
- Marc K Rosenblum
- Department of Pathology and Memorial Sloan-Kettering Cancer Center, New York, NY, USA
106
Liu Y, Li A, Feng H, Wang M. TAFFYS: An Integrated Tool for Comprehensive Analysis of Genomic Aberrations in Tumor Samples. PLoS One 2015; 10:e0129835. [PMID: 26111017 PMCID: PMC4482394 DOI: 10.1371/journal.pone.0129835] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/19/2014] [Accepted: 05/13/2015] [Indexed: 01/13/2023] Open
Abstract
Background Tumor single nucleotide polymorphism (SNP) arrays are a common platform for investigating cancer genomic aberrations and functionally important altered genes. Original SNP array signals are usually corrupted by noise and need to be de-convoluted into an absolute copy number profile by analytical methods. Unfortunately, in contrast with the popularity of tumor Affymetrix SNP arrays, the methods that are specifically designed for this platform are still limited. The complicated characteristics of noise in the signals are one of the difficulties in dissecting tumor Affymetrix SNP array data, as they inevitably blur the distinction between aberrations and create an obstacle for copy number aberration (CNA) identification. Results We propose a tool named TAFFYS for comprehensive analysis of tumor Affymetrix SNP array data. TAFFYS introduces a wavelet-based de-noising approach and a copy number-specific signal variance model for suppressing and modelling the noise in signals. Then a hidden Markov model is employed for copy number inference. Finally, by using the absolute copy number profile, the statistical significance of each aberration region is calculated in terms of different aberration types, including amplification, deletion and loss of heterozygosity (LOH). The results show that the copy number-specific variance model and the wavelet de-noising algorithm fit well with the Affymetrix SNP array signals, leading to more accurate estimation for diluted tumor samples (even with only 30% of cancer cells) than other existing methods. Results of examinations also demonstrate good compatibility and extensibility for different Affymetrix SNP array platforms. Application to 35 breast tumor samples shows that TAFFYS can automatically dissect the tumor samples and reveal statistically significant aberration regions where cancer-related genes are located.
Conclusions TAFFYS provides an efficient and convenient tool for identifying copy number alterations and allelic imbalance and for assessing recurrent aberrations in tumor Affymetrix SNP array data.
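The idea of wavelet-based de-noising can be illustrated with a one-level Haar transform and soft thresholding, sketched below in Python. This is a generic textbook construction, not the TAFFYS algorithm: the abstract does not specify the wavelet family, the number of decomposition levels, or the threshold rule, so all of those choices here are assumptions.

```python
from math import sqrt

def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero by `threshold`."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def haar_denoise(signal, threshold):
    """One-level Haar transform, soft-threshold the detail
    coefficients, then invert. An odd trailing sample is passed
    through unchanged."""
    root2 = sqrt(2.0)
    out = []
    for i in range(0, len(signal) - 1, 2):
        approx = (signal[i] + signal[i + 1]) / root2
        detail = soft_threshold((signal[i] - signal[i + 1]) / root2, threshold)
        out.append((approx + detail) / root2)
        out.append((approx - detail) / root2)
    if len(signal) % 2:
        out.append(signal[-1])
    return out

# Hypothetical noisy probe intensities around a flat copy number level.
noisy = [1.0, 1.1, 1.0, 0.9, 2.0]
smoothed = haar_denoise(noisy, threshold=0.5)
```

With a threshold larger than the noise amplitude, each pair of neighbouring probes collapses to its mean, while a threshold of zero returns the input unchanged. A practical implementation would typically use several decomposition levels and a data-driven threshold.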
Affiliation(s)
- Yuanning Liu
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
- Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
- Research centres for Biomedical Engineering, University of Science and Technology of China, Hefei, AH230027, China
- Huanqing Feng
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
- Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
- Research centres for Biomedical Engineering, University of Science and Technology of China, Hefei, AH230027, China
107
Tattini L, D'Aurizio R, Magi A. Detection of Genomic Structural Variants from Next-Generation Sequencing Data. Front Bioeng Biotechnol 2015; 3:92. [PMID: 26161383 PMCID: PMC4479793 DOI: 10.3389/fbioe.2015.00092] [Citation(s) in RCA: 169] [Impact Index Per Article: 16.9] [Received: 12/10/2014] [Accepted: 06/10/2015] [Indexed: 01/16/2023] Open
Abstract
Structural variants are genomic rearrangements larger than 50 bp accounting for around 1% of the variation among human genomes. They impact on phenotypic diversity and play a role in various diseases including neurological/neurocognitive disorders and cancer development and progression. Dissecting structural variants from next-generation sequencing data presents several challenges and a number of approaches have been proposed in the literature. In this mini review, we describe and summarize the latest tools – and their underlying algorithms – designed for the analysis of whole-genome sequencing, whole-exome sequencing, custom captures, and amplicon sequencing data, pointing out the major advantages/drawbacks. We also report a summary of the most recent applications of third-generation sequencing platforms. This assessment provides a guided indication – with particular emphasis on human genetics and copy number variants – for researchers involved in the investigation of these genomic events.
Affiliation(s)
- Lorenzo Tattini
- Department of Neurosciences, Psychology, Pharmacology and Child Health, University of Florence, Florence, Italy
- Romina D'Aurizio
- Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, National Research Council, Pisa, Italy
- Alberto Magi
- Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy
108
Combined Analysis of SNP Array Data Identifies Novel CNV Candidates and Pathways in Ependymoma and Mesothelioma. BioMed Research International 2015; 2015:902419. [PMID: 26185765 PMCID: PMC4491549 DOI: 10.1155/2015/902419] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/16/2015] [Accepted: 05/26/2015] [Indexed: 01/21/2023]
Abstract
Copy number variation is a class of structural genomic modifications that includes the gain and loss of a specific genomic region, which may include an entire gene. Many studies have used low-resolution techniques to identify regions that are frequently lost or amplified in cancer. Usually, researchers choose to use proprietary or non-open-source software to detect these regions because the graphical interface tends to be easier to use. In this study, we combined two different open-source packages into an innovative strategy to identify novel copy number variations and pathways associated with cancer. We used published mesothelioma and ependymoma datasets to assess our tool. We detected previously described and novel copy number variations that are associated with cancer chemotherapy resistance. We also identified altered pathways associated with these diseases, such as cell adhesion in patients with mesothelioma and negative regulation of glutamatergic synaptic transmission in ependymoma patients. In conclusion, we present a novel strategy using open-source software to identify copy number variations and altered pathways associated with cancer.
109
Hehir-Kwa JY, Pfundt R, Veltman JA. Exome sequencing and whole genome sequencing for the detection of copy number variation. Expert Rev Mol Diagn 2015; 15:1023-32. [PMID: 26088785 DOI: 10.1586/14737159.2015.1053467] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Indexed: 12/22/2022]
Abstract
Many laboratories now use genomic microarrays as their first-tier diagnostic test for copy number variation (CNV) detection. In addition, whole exome sequencing is increasingly being offered as a diagnostic test for heterogeneous disorders. Although mostly used for the detection of point mutations and small insertion-deletions, exome sequencing can also be used to call CNVs, allowing combined small and large variant analysis. Whole genome sequencing in addition to these advantages also offers the potential to characterize CNVs to unprecedented levels of accuracy, providing position and orientation information. In this review, we discuss the clinical potential of CNV identification in whole exome sequencing and whole genome sequencing data and the implications this has on diagnostic laboratories.
Affiliation(s)
- Jayne Y Hehir-Kwa
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
110
Xia H, Liu Y, Wang M, Li A. Identification of Genomic Aberrations in Cancer Subclones from Heterogeneous Tumor Samples. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:679-685. [PMID: 26357278 DOI: 10.1109/tcbb.2014.2366114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Tumor samples are usually heterogeneous, containing an admixture of more than one kind of tumor subclone. Studies of genomic aberrations from heterogeneous tumor data are hindered by the mixed signal of tumor subclone cells. Most of the existing algorithms cannot distinguish contributions of different subclones from the measured single nucleotide polymorphism (SNP) array signals, which may cause erroneous estimation of genomic aberrations. Here, we have introduced a computational method, Cancer Heterogeneity Analysis from SNP-array Experiments (CHASE), to automatically detect subclone proportions and genomic aberrations from heterogeneous tumor samples. Our method is based on an HMM and incorporates an EM algorithm to build a statistical model of the mixed signal of multiple tumor subclones. We tested the proposed approach on simulated datasets and two real datasets, and the results show that the proposed method can efficiently estimate tumor subclone proportions and recover the genomic aberrations.
111
Identifying Human Genome-Wide CNV, LOH and UPD by Targeted Sequencing of Selected Regions. PLoS One 2015; 10:e0123081. [PMID: 25919136 PMCID: PMC4412667 DOI: 10.1371/journal.pone.0123081] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 07/21/2014] [Accepted: 02/27/2015] [Indexed: 01/03/2023] Open
Abstract
Copy-number variations (CNV), loss of heterozygosity (LOH), and uniparental disomy (UPD) are large genomic aberrations leading to many common inherited diseases, cancers, and other complex diseases. An integrated tool to identify these aberrations is essential in understanding diseases and in designing clinical interventions. Previous discovery methods based on whole-genome sequencing (WGS) require very high depth of coverage on the whole genome scale, and are cost-wise inefficient. Another approach, whole exome genome sequencing (WEGS), is limited to discovering variations within exons. Thus, we are lacking efficient methods to detect genomic aberrations on the whole genome scale using next-generation sequencing technology. Here we present a method to identify genome-wide CNV, LOH and UPD for the human genome via selectively sequencing a small portion of genome termed Selected Target Regions (SeTRs). In our experiments, the SeTRs are covered by 99.73%~99.95% with sufficient depth. Our developed bioinformatics pipeline calls genome-wide CNVs with high confidence, revealing 8 credible events of LOH and 3 UPD events larger than 5M from 15 individual samples. We demonstrate that genome-wide CNV, LOH and UPD can be detected using a cost-effective SeTRs sequencing approach, and that LOH and UPD can be identified using just a sample grouping technique, without using a matched sample or familial information.
112
Pirooznia M, Goes FS, Zandi PP. Whole-genome CNV analysis: advances in computational approaches. Front Genet 2015; 6:138. [PMID: 25918519 PMCID: PMC4394692 DOI: 10.3389/fgene.2015.00138] [Citation(s) in RCA: 123] [Impact Index Per Article: 12.3] [Received: 01/26/2015] [Accepted: 03/23/2015] [Indexed: 01/04/2023] Open
Abstract
Accumulating evidence indicates that DNA copy number variation (CNV) is likely to make a significant contribution to human diversity and also play an important role in disease susceptibility. Recent advances in genome sequencing technologies have enabled the characterization of a variety of genomic features, including CNVs. This has led to the development of several bioinformatics approaches to detect CNVs from next-generation sequencing data. Here, we review recent advances in CNV detection from whole genome sequencing. We discuss the informatics approaches and current computational tools that have been developed as well as their strengths and limitations. This review will assist researchers and analysts in choosing the most suitable tools for CNV analysis as well as provide suggestions for new directions in future development.
Affiliation(s)
- Mehdi Pirooznia
- Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Fernando S Goes
- Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Peter P Zandi
- Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA; Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
113
Noureen A, Fresser F, Utermann G, Schmidt K. Sequence variation within the KIV-2 copy number polymorphism of the human LPA gene in African, Asian, and European populations. PLoS One 2015; 10:e0121582. [PMID: 25822457 PMCID: PMC4378929 DOI: 10.1371/journal.pone.0121582] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 07/21/2014] [Accepted: 02/13/2015] [Indexed: 11/18/2022] Open
Abstract
Amazingly little sequence variation is reported for the kringle IV 2 copy number variation (KIV 2 CNV) in the human LPA gene. Apart from whole genome sequencing projects, this region has only been analyzed in some detail in samples of European populations. We have performed a systematic resequencing study of the exonic and flanking intron regions within the KIV 2 CNV in 90 alleles from Asian, European, and four different African populations. Alleles have been separated according to their CNV length by pulsed field gel electrophoresis prior to unbiased specific PCR amplification of the target regions. These amplicons covered all KIV 2 copies of an individual allele simultaneously. In addition, cloned amplicons from genomic DNA of an African individual were sequenced. Our data suggest that sequence variation in this genomic region may be higher than previously appreciated. Detection probability of variants appeared to depend on the KIV 2 copy number of the analyzed DNA and on the proportion of copies carrying the variant. Asians had a high frequency of so-called KIV 2 type B and type C (together 70% of alleles), which differ by three or two synonymous substitutions respectively from the reference type A. This is most likely explained by the strong bottleneck suggested to have occurred when modern humans migrated to East Asia. A higher frequency of variable sites was detected in the Africans. In particular, two previously unreported splice site variants were found. One was associated with non-detectable Lp(a). The other was observed at high population frequencies (10% to 40%). Like the KIV 2 type B and C variants, this latter variant was also found in a high proportion of KIV 2 repeats in the affected alleles and in alleles differing in copy numbers. Our findings may have implications for the interpretation of SNP analyses in other repetitive loci of the human genome.
Affiliation(s)
- Asma Noureen
- Division of Genetic Epidemiology, Department of Medical Genetics, Molecular and Clinical Pharmacology, Innsbruck Medical University, Innsbruck, Austria
- Division of Human Genetics, Department of Medical Genetics, Molecular and Clinical Pharmacology, Innsbruck Medical University, Innsbruck, Austria
- Friedrich Fresser
- Division of Human Genetics, Department of Medical Genetics, Molecular and Clinical Pharmacology, Innsbruck Medical University, Innsbruck, Austria
- Division of Translational Cell Genetics, Department of Medical Genetics, Molecular and Clinical Pharmacology, Innsbruck Medical University, Innsbruck, Austria
- Gerd Utermann
- Division of Human Genetics, Department of Medical Genetics, Molecular and Clinical Pharmacology, Innsbruck Medical University, Innsbruck, Austria
- Konrad Schmidt
- Division of Genetic Epidemiology, Department of Medical Genetics, Molecular and Clinical Pharmacology, Innsbruck Medical University, Innsbruck, Austria
- Division of Human Genetics, Department of Medical Genetics, Molecular and Clinical Pharmacology, Innsbruck Medical University, Innsbruck, Austria
- Centre de Recherches Médicales de Lambaréné, Albert Schweitzer Hospital, Lambaréné, Gabon
- Department for Tropical Medicine, Eberhard-Karls-University Tübingen, Tübingen, Germany
114
Comparison of sequencing based CNV discovery methods using monozygotic twin quartets. PLoS One 2015; 10:e0122287. [PMID: 25812131 PMCID: PMC4374778 DOI: 10.1371/journal.pone.0122287] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 09/12/2014] [Accepted: 02/11/2015] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND The advent of high-throughput sequencing methods raises a considerable number of technical challenges. Among these is the discovery of copy-number variations (CNVs) using whole-genome sequencing data. CNVs are genomic structural variations defined as a variation in the number of copies of a large genomic fragment, usually more than one kilobase. Here, we aim to compare different CNV calling methods in order to assess their ability to consistently identify CNVs by comparison of the calls in 9 quartets of identical twin pairs. The use of monozygotic twins provides a means of estimating the error rate of each algorithm by observing CNVs that are inconsistently called when considering the rules of Mendelian inheritance and the assumption of an identical genome between twins. The similarity between the calls from the different tools and the advantage of combining call sets were also considered. RESULTS ERDS and CNVnator obtained the best performance when considering the inherited CNV rate, with means of 0.74 and 0.70, respectively. Venn diagrams were generated to show the agreement between the different algorithms, before and after filtering out familial inconsistencies. This filtering revealed a high number of false positives for CNVer and Breakdancer. A low overall agreement between the methods suggested a high complementarity of the different tools when calling CNVs. The breakpoint sensitivity analysis indicated that CNVnator and ERDS achieved better resolution of CNV borders than the other tools. The highest inherited CNV rate was achieved through the intersection of these two tools (81%). CONCLUSIONS This study showed that ERDS and CNVnator provide good performance on whole genome sequencing data with respect to CNV consistency across families, CNV breakpoint resolution and CNV call specificity. The intersection of the calls from the two tools would be valuable for CNV genotyping pipelines.
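The two quantities this abstract relies on, an inherited CNV rate and the intersection of two call sets, can be sketched in Python as follows. The 50% reciprocal-overlap criterion, the tuple representation of calls, and the function names are assumptions made for illustration; the abstract does not state how the study matched calls.

```python
def reciprocal_overlap(a, b, min_frac=0.5):
    """True if calls a and b (chrom, start, end) overlap by at least
    min_frac of the length of each call."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return (overlap >= min_frac * (a[2] - a[1])
            and overlap >= min_frac * (b[2] - b[1]))

def intersect_calls(calls_a, calls_b, min_frac=0.5):
    """Calls from calls_a that are matched by some call in calls_b."""
    return [a for a in calls_a
            if any(reciprocal_overlap(a, b, min_frac) for b in calls_b)]

def inherited_rate(child_calls, parent_calls, min_frac=0.5):
    """Fraction of the child's calls that are also seen in a parent."""
    if not child_calls:
        return 0.0
    matched = intersect_calls(child_calls, parent_calls, min_frac)
    return len(matched) / len(child_calls)

# Toy data (hypothetical coordinates).
child = [("chr1", 1000, 5000), ("chr2", 100, 900), ("chr3", 0, 1000)]
parents = [("chr1", 1200, 5200), ("chr2", 5000, 6000)]
rate = inherited_rate(child, parents)
shared = intersect_calls(child, parents)
```

The same `intersect_calls` helper can be applied to the outputs of two callers to build a consensus set, which is the kind of combination the abstract reports as giving the highest inherited rate.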
|
115
|
Yadav SS, Li J, Lavery HJ, Yadav KK, Tewari AK. Next-generation sequencing technology in prostate cancer diagnosis, prognosis, and personalized treatment. Urol Oncol 2015; 33:267.e1-13. [PMID: 25791755 DOI: 10.1016/j.urolonc.2015.02.009] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 11/12/2014] [Revised: 02/11/2015] [Accepted: 02/12/2015] [Indexed: 02/06/2023]
Abstract
Next-generation sequencing (NGS) of the genetic information of cancer cells has revolutionized the field of cancer biology, including prostate cancer (PCa). New recurrent alterations have been identified in PCa (e.g., TMPRSS2-ERG translocation, SPOP and CHD1 mutations, and chromoplexy), and many previous ones in well-established pathways have been validated (e.g., androgen receptor overexpression and mutations; PTEN, RB1, and TP53 loss/mutations). With its highly heterogeneous nature, PCa continues to pose a tremendous challenge in terms of diagnosis and prognosis. Combining the information gained through NGS studies with clinicopathological and radiological data will help diagnose the aggressiveness of the cancer with greater accuracy. Furthermore, understanding the heterogeneity of tumor through single-cell or single-molecule sequencing technology will also strengthen the prognosis and provide better, patient-specific drug identification. As this research becomes more prominent, it is important that urologic oncologists become familiar with the various NGS technologies and the results generated using them. We highlight the commonly used NGS tools and summarize recent discoveries relevant to PCa.
Affiliation(s)
- Shalini S Yadav
- Department of Urology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY
- Jinyi Li
- Department of Urology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY
- Hugh J Lavery
- Department of Urology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY
- Kamlesh K Yadav
- Department of Urology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY
- Ashutosh K Tewari
- Department of Urology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY
|
116
|
Glusman G, Severson A, Dhankani V, Robinson M, Farrah T, Mauldin DE, Stittrich AB, Ament SA, Roach JC, Brunkow ME, Bodian DL, Vockley JG, Shmulevich I, Niederhuber JE, Hood L. Identification of copy number variants in whole-genome data using Reference Coverage Profiles. Front Genet 2015; 6:45. [PMID: 25741365 PMCID: PMC4330915 DOI: 10.3389/fgene.2015.00045] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 11/30/2014] [Accepted: 01/30/2015] [Indexed: 12/20/2022] Open
Abstract
The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150–1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1–100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation.
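The two steps named in this abstract, normalization of scaled coverage to a reference profile followed by HMM segmentation, can be sketched in a few lines. This is a minimal illustration and not the authors' released tooling: the three-state model, the Gaussian emission parameters, the transition probability, and every function name are assumptions chosen for readability, and the inputs are assumed to be non-empty with no zero-coverage bins in the profile.

```python
import math
from statistics import median
from typing import List, Sequence

STATES = ("loss", "neutral", "gain")
STATE_MEANS = (0.5, 1.0, 1.5)  # assumed expected coverage ratio per state (diploid baseline)


def normalize_to_profile(sample: Sequence[float], profile: Sequence[float]) -> List[float]:
    """Scale both tracks by their medians, then take the per-bin ratio."""
    s_med, p_med = median(sample), median(profile)
    return [(s / s_med) / (p / p_med) for s, p in zip(sample, profile)]


def log_gauss(x: float, mean: float, sd: float) -> float:
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def viterbi_segment(ratios: Sequence[float], sd: float = 0.15, stay: float = 0.99) -> List[str]:
    """Most likely state path under a simple HMM with Gaussian emissions."""
    n = len(STATES)
    log_stay = math.log(stay)
    log_switch = math.log((1 - stay) / (n - 1))
    # Uniform start probabilities plus the emission of the first bin.
    score = [math.log(1 / n) + log_gauss(ratios[0], m, sd) for m in STATE_MEANS]
    back = []
    for x in ratios[1:]:
        new_score, pointers = [], []
        for j, m in enumerate(STATE_MEANS):
            cands = [score[i] + (log_stay if i == j else log_switch) for i in range(n)]
            best = max(range(n), key=cands.__getitem__)
            pointers.append(best)
            new_score.append(cands[best] + log_gauss(x, m, sd))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[i] for i in path]
```

Dividing by the profile removes the position-specific fluctuations shared across genomes, so a bin with naturally high coverage is not mistaken for a gain; the HMM then favors contiguous runs of bins over isolated outliers.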
Affiliation(s)
- Dale L Bodian
- Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA
- Joseph G Vockley
- Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA
- John E Niederhuber
- Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA
- Leroy Hood
- Institute for Systems Biology, Seattle, WA, USA
|
117
|
Yi G, Qu L, Chen S, Xu G, Yang N. Genome-wide copy number profiling using high-density SNP array in chickens. Anim Genet 2015; 46:148-57. [PMID: 25662183 DOI: 10.1111/age.12267] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Accepted: 12/03/2014] [Indexed: 01/04/2023]
Abstract
Phenotypic diversity results mainly from underlying genetic variation, and recent studies have shown that copy number variation (CNV) is an important contributor to both phenotypic variability and disease susceptibility. Herein, we performed a genome-wide CNV scan in 96 chickens from 12 diversified breeds using high-density Affymetrix 600 K SNP arrays. We identified a total of 231 autosomal CNV regions (CNVRs) encompassing 5.41 Mb of the chicken genome and corresponding to 0.59% of the autosomal sequence. The length of these CNVRs ranged from 2.6 to 586.2 kb with an average of 23.4 kb, and they comprised 130 gain events, 93 loss events and eight events showing both gain and loss. These CNVRs, especially deletions, had lower GC content and were located particularly in gene deserts. In particular, 102 CNVRs harbored 128 chicken genes, most of which were enriched in immune responses. We obtained 221 autosomal CNVRs after converting probe coordinates to Galgal3, and comparative analysis with previous studies illustrated that 153 of these CNVRs were regarded as novel events. Furthermore, qPCR assays were designed for 11 novel CNVRs, and eight (72.73%) were validated successfully. In this study, we demonstrated that the high-density 600 K SNP array can capture CNVs with higher efficiency and accuracy and highlighted the necessity of integrating multiple technologies and algorithms. Our findings provide a pioneering exploration of chicken CNVs based on a high-density SNP array, which contributes to a more comprehensive understanding of genetic variation in the chicken genome and is beneficial to unearthing potential CNVs underlying important traits of chickens.
Affiliation(s)
- G Yi
- National Engineering Laboratory for Animal Breeding and MOA Key Laboratory of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
|
118
|
Reinecke F, Satya RV, DiCarlo J. Quantitative analysis of differences in copy numbers using read depth obtained from PCR-enriched samples and controls. BMC Bioinformatics 2015; 16:17. [PMID: 25626454 PMCID: PMC4384318 DOI: 10.1186/s12859-014-0428-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 06/20/2014] [Accepted: 12/11/2014] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND Next-generation sequencing (NGS) is rapidly becoming common practice in clinical diagnostics and cancer research. In addition to the detection of single nucleotide variants (SNVs), information on copy number variants (CNVs) is of great interest. Several algorithms exist to detect CNVs by analyzing whole genome sequencing data or data from samples enriched by hybridization-capture. PCR-enriched amplicon-sequencing data have special characteristics that have been taken into account by only one publicly available algorithm so far. RESULTS We describe a new algorithm named quandico to detect copy number differences based on NGS data generated following PCR-enrichment. A weighted t-test statistic was applied to calculate probabilities (p-values) of copy number changes. We assessed the performance of the method using sequencing reads generated from reference DNA with known CNVs, and we were able to detect these variants with 98.6% sensitivity and 98.5% specificity which is significantly better than another recently described method for amplicon sequencing. The source code (R-package) of quandico is licensed under the GPLv3 and it is available at https://github.com/reineckef/quandico . CONCLUSION We demonstrated that our new algorithm is suitable to call copy number changes using data from PCR-enriched samples with high sensitivity and specificity even for single copy differences.
Affiliation(s)
- Frank Reinecke
- Bioinformatics Assay Design & Analysis, QIAGEN GmbH, Max-Volmer-Straße 4, Hilden, 40724, Germany
- Ravi Vijaya Satya
- Bioinformatics Assay Design & Analysis, QIAGEN Sciences Inc., 6951 Executive Way, Frederick MD, 21703, USA
- John DiCarlo
- Bioinformatics Assay Design & Analysis, QIAGEN Sciences Inc., 6951 Executive Way, Frederick MD, 21703, USA
|
119
|
Yang JF, Ding XF, Chen L, Mat WK, Xu MZ, Chen JF, Wang JM, Xu L, Poon WS, Kwong A, Leung GKK, Tan TC, Yu CH, Ke YB, Xu XY, Ke XY, Ma RC, Chan JC, Wan WQ, Zhang LW, Kumar Y, Tsang SY, Li S, Wang HY, Xue H. Copy number variation analysis based on AluScan sequences. J Clin Bioinforma 2014; 4:15. [PMID: 25558350 PMCID: PMC4273479 DOI: 10.1186/s13336-014-0015-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/03/2014] [Accepted: 11/12/2014] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND AluScan combines inter-Alu PCR using multiple Alu-based primers with opposite orientations and next-generation sequencing to capture a huge number of Alu-proximal genomic sequences for investigation. Its requirement of only sub-microgram quantities of DNA facilitates the examination of large numbers of samples. However, the special features of AluScan data rendered difficult the calling of copy number variation (CNV) directly using the calling algorithms designed for whole genome sequencing (WGS) or exome sequencing. RESULTS In this study, an AluScanCNV package has been assembled for efficient CNV calling from AluScan sequencing data employing a Geary-Hinkley transformation (GHT) of read-depth ratios between either paired test-control samples, or between test samples and a reference template constructed from reference samples, to call the localized CNVs, followed by use of a GISTIC-like algorithm to identify recurrent CNVs and circular binary segmentation (CBS) to reveal large extended CNVs. To evaluate the utility of CNVs called from AluScan data, the AluScans from 23 non-cancer and 38 cancer genomes were analyzed in this study. The glioma samples analyzed yielded the familiar extended copy-number losses on chromosomes 1p and 9. Also, the recurrent somatic CNVs identified from liver cancer samples were similar to those reported for liver cancer WGS with respect to a striking enrichment of copy-number gains in chromosomes 1q and 8q. When localized or recurrent CNV-features capable of distinguishing between liver and non-liver cancer samples were selected by correlation-based machine learning, a highly accurate separation of the liver and non-liver cancer classes was attained. CONCLUSIONS The results obtained from non-cancer and cancerous tissues indicated that the AluScanCNV package can be employed to call localized, recurrent and extended CNVs from AluScan sequences. 
Moreover, both the localized and recurrent CNVs identified by this method could be subjected to machine-learning selection to yield distinguishing CNV-features that were capable of separating between liver cancers and other types of cancers. Since the method is applicable to any human DNA sample with or without the availability of a paired control, it can also be employed to analyze the constitutional CNVs of individuals.
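The abstract names the Geary-Hinkley transformation (GHT) of read-depth ratios without giving a formula. The sketch below uses the standard textbook form of the transformation for a ratio of two normal variables, which may differ in detail from what the AluScanCNV package implements. The parameter names, the default correlation of zero, and the p-value helper are assumptions for illustration, and the approximation presumes the denominator is very unlikely to be near zero.

```python
import math


def geary_hinkley(ratio: float, mu_x: float, mu_y: float,
                  sd_x: float, sd_y: float, rho: float = 0.0) -> float:
    """Map a ratio X/Y of two normal variables to an approximately standard normal score.

    mu_x, sd_x describe the numerator (test depth); mu_y, sd_y the denominator
    (control depth); rho is their correlation.
    """
    numerator = mu_y * ratio - mu_x
    variance = (sd_y ** 2) * ratio ** 2 - 2 * rho * sd_x * sd_y * ratio + sd_x ** 2
    return numerator / math.sqrt(variance)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal score."""
    return math.erfc(abs(z) / math.sqrt(2))
```

With equal means and standard deviations in both samples, an observed ratio of 1.0 maps to a score of zero, while a ratio of 1.5 maps to a clearly positive score that could then be thresholded to call a localized gain.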
Affiliation(s)
- Jian-Feng Yang
- Division of Life Science and Applied Genomics Centre, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Xiao-Fan Ding
- Division of Life Science and Applied Genomics Centre, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Lei Chen
- National Center for Liver Cancer Research and Eastern Hepatobiliary Surgery Hospital, 225 Changhai Road, Shanghai, 200438 China
- Wai-Kin Mat
- Division of Life Science and Applied Genomics Centre, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Michelle Zhi Xu
- Department of Oncology, Nanjing First Hospital, No. 68 Changle Road, Nanjing, 210006 China
- Jin-Fei Chen
- Department of Oncology, Nanjing First Hospital, No. 68 Changle Road, Nanjing, 210006 China
- Jian-Min Wang
- Department of Hematology, Changhai Hospital, Second Military Medical University, 174 Changhai Road, Shanghai, 200433 China
- Lin Xu
- Department of Thoracic Surgery, Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Nanjing Medical University Affiliated Cancer Hospital, Cancer Institute of Jiangsu Province, Baiziting 42, Nanjing, 210009 China
- Wai-Sang Poon
- Division of Neurosurgery, Department of Surgery, Prince of Wales Hospital, Chinese University of Hong Kong, 30-32 Ngan Shing Street, Sha Tin, Hong Kong, China
- Ava Kwong
- Division of Neurosurgery, Department of Surgery, Li Ka Shing Faculty of Medicine, University of Hong Kong, Queen Mary Hospital, 102 Pokfulam Road, Hong Kong, China
- Gilberto Ka-Kit Leung
- Division of Neurosurgery, Department of Surgery, Li Ka Shing Faculty of Medicine, University of Hong Kong, Queen Mary Hospital, 102 Pokfulam Road, Hong Kong, China
- Tze-Ching Tan
- Department of Neurosurgery, Queen Elizabeth Hospital, 30 Gascoigne Road, Kowloon, Hong Kong, China
- Chi-Hung Yu
- Department of Neurosurgery, Queen Elizabeth Hospital, 30 Gascoigne Road, Kowloon, Hong Kong, China
- Yue-Bin Ke
- Shenzhen Center for Disease Control and Prevention, No 8 Longyuan Road, Nanshan district, Shenzhen City, 518055 China
- Xin-Yun Xu
- Shenzhen Center for Disease Control and Prevention, No 8 Longyuan Road, Nanshan district, Shenzhen City, 518055 China
- Xiao-Yan Ke
- Nanjing Brain Hospital and Nanjing Institute of Neuropsychiatry, Nanjing Medical University, Nanjing, 210029 China
- Ronald CW Ma
- Department of Medicine and Therapeutics, 9th floor, Clinical Sciences Building, The Prince of Wales Hospital, Shatin, Hong Kong
- Juliana CN Chan
- Department of Medicine and Therapeutics, 9th floor, Clinical Sciences Building, The Prince of Wales Hospital, Shatin, Hong Kong
- Wei-Qing Wan
- Department of Neurosurgery, Beijing Tiantan Hospital, 6 Tiantan Xili, Dongcheng District, Capital Medical University, Beijing, 100050 China
- Li-Wei Zhang
- Department of Neurosurgery, Beijing Tiantan Hospital, 6 Tiantan Xili, Dongcheng District, Capital Medical University, Beijing, 100050 China
- Yogesh Kumar
- Division of Life Science and Applied Genomics Centre, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Shui-Ying Tsang
- Division of Life Science and Applied Genomics Centre, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Shao Li
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Department of Automation, Tsinghua University, Beijing, 100084 China
- Hong-Yang Wang
- National Center for Liver Cancer Research and Eastern Hepatobiliary Surgery Hospital, 225 Changhai Road, Shanghai, 200438 China; International Cooperation Laboratory on Signal Transduction, Eastern Hepatobiliary Surgery Hospital, 225 Changhai Road, Shanghai, 200438 China
- Hong Xue
- Division of Life Science and Applied Genomics Centre, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
|
120
|
Zeng P, Zhao Y, Qian C, Zhang L, Zhang R, Gou J, Liu J, Liu L, Chen F. Statistical analysis for genome-wide association study. J Biomed Res 2014; 29:285-97. [PMID: 26243515 PMCID: PMC4547377 DOI: 10.7555/jbr.29.20140007] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Received: 01/15/2014] [Revised: 06/07/2014] [Accepted: 09/27/2014] [Indexed: 12/19/2022] Open
Abstract
In the past few years, genome-wide association studies (GWAS) have been highly successful in identifying genetic susceptibility loci underlying many complex diseases and traits. These findings provide important genetic insights into disease pathogenesis. In this paper, we present an overview of widely used approaches and strategies for GWAS analysis and offer general guidance for handling GWAS data. Issues regarding data quality control, population structure, association analysis, multiple comparisons and visual presentation of GWAS results are discussed; other advanced topics, including missing heritability, meta-analysis, set-based association analysis, copy number variation analysis and GWAS cohort analysis, are also briefly introduced.
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China; Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu 221004, China
- Yang Zhao
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Cheng Qian
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Liwei Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Ruyang Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Jianwei Gou
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Jin Liu
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Liya Liu
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Feng Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
|
121
|
Yi G, Qu L, Liu J, Yan Y, Xu G, Yang N. Genome-wide patterns of copy number variation in the diversified chicken genomes using next-generation sequencing. BMC Genomics 2014; 15:962. [PMID: 25378104 PMCID: PMC4239369 DOI: 10.1186/1471-2164-15-962] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Received: 03/04/2014] [Accepted: 10/13/2014] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Copy number variation (CNV) is important and widespread in the genome, and is a major cause of disease and phenotypic diversity. Herein, we performed a genome-wide CNV analysis in 12 diversified chicken genomes based on whole genome sequencing. RESULTS A total of 8,840 CNV regions (CNVRs) covering 98.2 Mb and representing 9.4% of the chicken genome were identified, ranging in size from 1.1 to 268.8 kb with an average of 11.1 kb. Sequencing-based predictions were confirmed at a high validation rate by two independent approaches, including array comparative genomic hybridization (aCGH) and quantitative PCR (qPCR). The Pearson's correlation coefficients between sequencing and aCGH results ranged from 0.435 to 0.755, and qPCR experiments revealed a positive validation rate of 91.71% and a false negative rate of 22.43%. In total, 2,214 (25.0%) predicted CNVRs span 2,216 (36.4%) RefSeq genes associated with specific biological functions. Besides two previously reported copy number variable genes EDN3 and PRLR, we also found some promising genes with potential in phenotypic variation. Two genes, FZD6 and LIMS1, related to disease susceptibility/resistance are covered by CNVRs. The highly duplicated SOCS2 may lead to higher bone mineral density. Entire or partial duplication of some genes like POPDC3 may have great economic importance in poultry breeding. CONCLUSIONS Our results based on extensive genetic diversity provide a more refined chicken CNV map and genome-wide gene copy number estimates, and warrant future CNV association studies for important traits in chickens.
Affiliation(s)
- Ning Yang
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
|
122
|
DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly. Genome Res 2014; 24:2022-32. [PMID: 25236618 PMCID: PMC4248318 DOI: 10.1101/gr.175141.114] [Citation(s) in RCA: 350] [Impact Index Per Article: 31.8] [Indexed: 12/11/2022]
Abstract
Detection of DNA copy number aberrations by shallow whole-genome sequencing (WGS) faces many challenges, including lack of completion and errors in the human reference genome, repetitive sequences, polymorphisms, variable sample quality, and biases in the sequencing procedures. Formalin-fixed paraffin-embedded (FFPE) archival material, the analysis of which is important for studies of cancer, presents particular analytical difficulties due to degradation of the DNA and frequent lack of matched reference samples. We present a robust, cost-effective WGS method for DNA copy number analysis that addresses these challenges more successfully than currently available procedures. In practice, very useful profiles can be obtained with ∼0.1× genome coverage. We improve on previous methods by first implementing a combined correction for sequence mappability and GC content, and second, by applying this procedure to sequence data from the 1000 Genomes Project in order to develop a blacklist of problematic genome regions. A small subset of these blacklisted regions was previously identified by ENCODE, but the vast majority are novel unappreciated problematic regions. Our procedures are implemented in a pipeline called QDNAseq. We have analyzed over 1000 samples, most of which were obtained from the fixed tissue archives of more than 25 institutions. We demonstrate that for most samples our sequencing and analysis procedures yield genome profiles with noise levels near the statistical limit imposed by read counting. The described procedures also provide better correction of artifacts introduced by low DNA quality than prior approaches and better copy number data than high-resolution microarrays at a substantially lower cost.
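The "statistical limit imposed by read counting" mentioned in this abstract can be made concrete with a small calculation: if reads fall into bins independently, the count per bin is approximately Poisson, so its relative standard deviation is one over the square root of the expected count. The bin size and read length used in the usage note are illustrative assumptions, not values taken from the abstract, and the helper names are hypothetical.

```python
import math


def expected_reads_per_bin(coverage: float, bin_size_bp: int, read_length_bp: int) -> float:
    """Expected number of reads in a bin for a given average genome coverage."""
    return coverage * bin_size_bp / read_length_bp


def poisson_relative_sd(expected_reads: float) -> float:
    """Relative standard deviation of a Poisson count with the given mean."""
    return 1.0 / math.sqrt(expected_reads)
```

For example, assuming 50 bp reads and 15 kb bins, `expected_reads_per_bin(0.1, 15000, 50)` gives about 30 reads per bin at 0.1× coverage, and `poisson_relative_sd(30)` gives a noise floor of roughly 18% per bin; larger bins lower the floor at the cost of resolution.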
|
123
|
Duvaux L, Geissmann Q, Gharbi K, Zhou JJ, Ferrari J, Smadja CM, Butlin RK. Dynamics of copy number variation in host races of the pea aphid. Mol Biol Evol 2014; 32:63-80. [PMID: 25234705 PMCID: PMC4271520 DOI: 10.1093/molbev/msu266] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022] Open
Abstract
Copy number variation (CNV) makes a major contribution to overall genetic variation and is suspected to play an important role in adaptation. However, aside from a few model species, the extent of CNV in natural populations has seldom been investigated. Here, we report on CNV in the pea aphid Acyrthosiphon pisum, a powerful system for studying the genetic architecture of host-plant adaptation and speciation thanks to multiple host races forming a continuum of genetic divergence. Recent studies have highlighted the potential importance of chemosensory genes, including the gustatory and olfactory receptor gene families (Gr and Or, respectively), in the process of host race formation. We used targeted resequencing to achieve a very high depth of coverage, and thereby revealed the extent of CNV of 434 genes, including 150 chemosensory genes, in 104 individuals distributed across eight host races of the pea aphid. We found that CNV was widespread in our global sample, with a significantly higher occurrence in multigene families, especially in Ors. We also observed a decrease in the gene probability of being completely duplicated or deleted (CDD) with increase in coding sequence length. Genes with CDD variants were usually more polymorphic for copy number, especially in the P450 gene family where toxin resistance may be related to gene dosage. We found that Gr were overrepresented among genes discriminating host races, as were CDD genes and pseudogenes. Our observations shed new light on CNV dynamics and are consistent with CNV playing a role in both local adaptation and speciation.
Affiliation(s)
- Ludovic Duvaux
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
- Quentin Geissmann
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
- Karim Gharbi
- Edinburgh Genomics, Ashworth Laboratories, University of Edinburgh, Edinburgh, United Kingdom
- Jing-Jiang Zhou
- Department of Biological Chemistry and Crop Protection, Rothamsted Research, Harpenden, United Kingdom
- Julia Ferrari
- Department of Biology, University of York, York, United Kingdom
- Carole M Smadja
- Institut des Sciences de l'Evolution (UMR 5554), CNRS, IRD, Université Montpellier 2, Montpellier, France
- Roger K Butlin
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom; Sven Lovén Centre for Marine Sciences-Tjärnö, University of Gothenburg, Strömstad, Sweden
|
124
|
Liu B, Morrison CD, Johnson CS, Trump DL, Qin M, Conroy JC, Wang J, Liu S. Computational methods for detecting copy number variations in cancer genome using next generation sequencing: principles and challenges. Oncotarget 2014; 4:1868-81. [PMID: 24240121 PMCID: PMC3875755 DOI: 10.18632/oncotarget.1537] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Indexed: 12/27/2022] Open
Abstract
Accurate detection of somatic copy number variations (CNVs) is an essential part of cancer genome analysis, and plays an important role in oncotarget identifications. Next generation sequencing (NGS) holds the promise to revolutionize somatic CNV detection. In this review, we provide an overview of current analytic tools used for CNV detection in NGS-based cancer studies. We summarize the NGS data types used for CNV detection, decipher the principles for data preprocessing, segmentation, and interpretation, and discuss the challenges in somatic CNV detection. This review aims to provide a guide to the analytic tools used in NGS-based cancer CNV studies, and to discuss the important factors that researchers need to consider when analyzing NGS data for somatic CNV detections.
Affiliation(s)
- Biao Liu
- Center for Personalized Medicine, Roswell Park Cancer Institute, Buffalo, NY
|
125
|
Kadalayil L, Rafiq S, Rose-Zerilli MJJ, Pengelly RJ, Parker H, Oscier D, Strefford JC, Tapper WJ, Gibson J, Ennis S, Collins A. Exome sequence read depth methods for identifying copy number changes. Brief Bioinform 2014; 16:380-92. [PMID: 25169955 DOI: 10.1093/bib/bbu027] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Received: 04/20/2014] [Accepted: 07/10/2014] [Indexed: 01/04/2023] Open
Abstract
Copy number variants (CNVs) play important roles in a number of human diseases and in pharmacogenetics. Powerful methods exist for CNV detection in whole genome sequencing (WGS) data, but such data are costly to obtain. Many disease causal CNVs span or are found in genome coding regions (exons), which makes CNV detection using whole exome sequencing (WES) data attractive. If reliably validated against WGS-based CNVs, exome-derived CNVs have potential applications in a clinical setting. Several algorithms have been developed to exploit exome data for CNV detection and comparisons made to find the most suitable methods for particular data samples. The results are not consistent across studies. Here, we review some of the exome CNV detection methods based on depth of coverage profiles and examine their performance to identify problems contributing to discrepancies in published results. We also present a streamlined strategy that uses a single metric, the likelihood ratio, to compare exome methods, and we demonstrated its utility using the VarScan 2 and eXome Hidden Markov Model (XHMM) programs using paired normal and tumour exome data from chronic lymphocytic leukaemia patients. We use array-based somatic CNV (SCNV) calls as a reference standard to compute prevalence-independent statistics, such as sensitivity, specificity and likelihood ratio, for validation of the exome-derived SCNVs. We also account for factors known to influence the performance of exome read depth methods, such as CNV size and frequency, while comparing our findings with published results.
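The prevalence-independent statistics named in this abstract follow directly from a confusion matrix of exome-derived calls against the array-based reference standard. The sketch below shows the standard definitions of sensitivity, specificity and the positive and negative likelihood ratios; the function name and the example counts are hypothetical and are not results from the paper.

```python
import math
from typing import Dict


def diagnostic_stats(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity and likelihood ratios from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ compares the true-positive rate with the false-positive rate.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    # LR- compares the false-negative rate with the true-negative rate.
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else math.inf
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }
```

Because each of these ratios is computed within the reference-positive or reference-negative group, none of them changes with how common CNVs are in the sample set, which is what makes a single likelihood ratio convenient for comparing methods across data sets.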
|
126
|
Improved molecular diagnosis by the detection of exonic deletions with target gene capture and deep sequencing. Genet Med 2014; 17:99-107. [PMID: 25032985 PMCID: PMC4338802 DOI: 10.1038/gim.2014.80] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.2] [Received: 01/24/2014] [Accepted: 05/29/2014] [Indexed: 12/25/2022] Open
Abstract
Purpose: We aimed to demonstrate the detection of exonic deletions using target capture and deep sequencing data. Methods: Sequence data from target gene capture followed by massively parallel sequencing were analyzed for the detection of exonic deletions using the normalized mean coverage of individual exons. We compared the results with those obtained from high-density exon-targeted array comparative genomic hybridization and applied similar analysis to examine samples from patients with pathogenic exonic deletions. Results: Thirty-eight samples, each containing 2,134, 2,833, or 4,688 coding exons from different panels, with a total of 103,863 exons, were analyzed by capture–massively parallel sequencing and array comparative genomic hybridization. Ten deletions detected by array comparative genomic hybridization were all detected by massively parallel sequencing, whereas only two of three duplications were detected. We were able to detect all pathogenic exonic deletions in 11 positive cases. Thirty-one exonic copy number changes from nine prospective clinical samples were also identified. Conclusion: Our results demonstrated the feasibility of using the same set of sequence data to detect both point mutations and exonic deletions, thus improving the diagnostic power of massively parallel sequencing–based assays.
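A minimal sketch of deletion screening from the normalized mean coverage of individual exons might look like the following. The abstract does not describe the normalization or the cutoff, so the choices here (dividing by the per-sample median, comparing with the median of reference samples, and flagging ratios below 0.7) are assumptions for illustration, as are the function names and the exon identifiers.

```python
from statistics import median
from typing import Dict, List


def normalize(mean_cov: Dict[str, float]) -> Dict[str, float]:
    """Divide each exon's mean coverage by the sample-wide median exon coverage."""
    centre = median(mean_cov.values())
    return {exon: cov / centre for exon, cov in mean_cov.items()}


def exon_ratios(test: Dict[str, float], references: List[Dict[str, float]]) -> Dict[str, float]:
    """Ratio of the test sample's normalized coverage to the reference median, per exon."""
    test_norm = normalize(test)
    ref_norms = [normalize(ref) for ref in references]
    return {
        exon: test_norm[exon] / median(ref[exon] for ref in ref_norms)
        for exon in test_norm
    }


def call_deletions(ratios: Dict[str, float], threshold: float = 0.7) -> List[str]:
    """Exons whose coverage ratio falls below the (assumed) deletion threshold."""
    return sorted(exon for exon, ratio in ratios.items() if ratio < threshold)
```

A heterozygous deletion is expected to give a ratio near 0.5 and a single-copy duplication a ratio near 1.5; the smaller relative change for duplications is one plausible reason they are harder to detect, consistent with the two-of-three result reported above.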
|
127
|
Tan YT, McPherson GE, Peretz I, Berkovic SF, Wilson SJ. The genetic basis of music ability. Front Psychol 2014; 5:658. [PMID: 25018744 PMCID: PMC4073543 DOI: 10.3389/fpsyg.2014.00658] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 12/24/2013] [Accepted: 06/08/2014] [Indexed: 01/18/2023]
Abstract
Music is an integral part of the cultural heritage of all known human societies, with the capacity for music perception and production present in most people. Researchers generally agree that both genetic and environmental factors contribute to the broader realization of music ability, with the degree of music aptitude varying, not only from individual to individual, but across various components of music ability within the same individual. While environmental factors influencing music development and expertise have been well investigated in the psychological and music literature, the interrogation of possible genetic influences has not progressed at the same rate. Recent advances in genetic research offer fertile ground for exploring the genetic basis of music ability. This paper begins with a brief overview of behavioral and molecular genetic approaches commonly used in human genetic analyses, and then critically reviews the key findings of genetic investigations of the components of music ability. Some promising and converging findings have emerged, with several loci on chromosome 4 implicated in singing and music perception, and certain loci on chromosome 8q implicated in absolute pitch and music perception. The gene AVPR1A on chromosome 12q has also been implicated in music perception, music memory, and music listening, whereas SLC6A4 on chromosome 17q has been associated with music memory and choir participation. Replication of these results in alternate populations and with larger samples is warranted to confirm the findings. Through increased research efforts, a clearer picture of the genetic mechanisms underpinning music ability will hopefully emerge.
Affiliation(s)
- Yi Ting Tan
- Melbourne Conservatorium of Music, University of Melbourne Parkville, VIC, Australia
| | - Gary E McPherson
- Melbourne Conservatorium of Music, University of Melbourne Parkville, VIC, Australia
| | - Isabelle Peretz
- International Laboratory for Brain, Music and Sound Research and Department of Psychology, Université de Montréal Montreal, QC, Canada
| | - Samuel F Berkovic
- Department of Medicine, Epilepsy Research Centre, University of Melbourne Heidelberg, VIC, Australia
| | - Sarah J Wilson
- Department of Medicine, Epilepsy Research Centre, University of Melbourne Heidelberg, VIC, Australia ; Melbourne School of Psychological Sciences, University of Melbourne Parkville, VIC, Australia
|
128
|
Popitsch N. CODOC: efficient access, analysis and compression of depth of coverage signals. Bioinformatics 2014; 30:2676-7. [PMID: 24872424 DOI: 10.1093/bioinformatics/btu362] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
Current data formats for the representation of depth of coverage (DOC) data, a central resource for interpreting, filtering or detecting novel features in high-throughput sequencing datasets, were primarily designed for visualization purposes. This limits their applicability in stand-alone analyses of these data, mainly owing to inaccurate representation or mediocre data compression. CODOC is a novel data format and comprehensive application programming interface for efficient representation, access and analysis of DOC data. CODOC compresses these data ~4-32× better than the best current comparable method by exploiting specific data characteristics while at the same time enabling more-exact signal recovery for lossy compression and very fast query answering times. Availability and implementation: Java source code and binaries are freely available for non-commercial use at http://purl.org/bgraph/codoc.
Affiliation(s)
- Niko Popitsch
- Center for Integrative Bioinformatics Vienna (CIBIV), Max F Perutz Laboratories, University of Vienna and Medical University of Vienna, Dr. Bohrgasse 9, 1030 Vienna, Austria
|
129
|
Castellani CA, Melka MG, Wishart AE, Locke MEO, Awamleh Z, O'Reilly RL, Singh SM. Biological relevance of CNV calling methods using familial relatedness including monozygotic twins. BMC Bioinformatics 2014; 15:114. [PMID: 24750645 PMCID: PMC4021055 DOI: 10.1186/1471-2105-15-114] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 01/14/2014] [Accepted: 04/14/2014] [Indexed: 11/10/2022]
Abstract
Background: Studies involving the analysis of structural variation including Copy Number Variation (CNV) have recently exploded in the literature. Furthermore, CNVs have been associated with a number of complex diseases and neurodevelopmental disorders. Common methods for CNV detection use SNP, CNV, or CGH arrays, where the signal intensities of consecutive probes are used to define the number of copies associated with a given genomic region. These practices pose a number of challenges that interfere with the ability of available methods to accurately call CNVs. It has, therefore, become necessary to develop experimental protocols to test the reliability of CNV calling methods from microarray data so that researchers can properly discriminate biologically relevant data from noise. Results: We have developed a workflow for the integration of data from multiple CNV calling algorithms using the same array results. It uses four CNV calling programs: PennCNV (PC), Affymetrix® Genotyping Console™ (AGC), Partek® Genomics Suite™ (PGS) and Golden Helix SVS™ (GH) to analyze CEL files from the Affymetrix® Human SNP 6.0 Array™. To assess the relative suitability of each program, we used individuals of known genetic relationships. We found significant differences in CNV calls obtained by different CNV calling programs. Conclusions: Although the programs showed variable patterns of CNVs in the same individuals, their distribution in individuals of different degrees of genetic relatedness has allowed us to offer two suggestions. The first involves the use of multiple algorithms for the detection of the largest possible number of CNVs, and the second suggests the use of PennCNV over all other methods when the use of only one software program is desirable.
Affiliation(s)
- Shiva M Singh
- Department of Biology, The University of Western Ontario, London N6A 5B7, ON, Canada.
|
130
|
Vitte C, Fustier MA, Alix K, Tenaillon MI. The bright side of transposons in crop evolution. Brief Funct Genomics 2014; 13:276-95. [PMID: 24681749 DOI: 10.1093/bfgp/elu002] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.7] [Indexed: 11/13/2022]
Abstract
The past decades have revealed an unexpected yet prominent role of so-called 'junk DNA' in the regulation of gene expression, thereby challenging our view of the mechanisms underlying phenotypic evolution. In particular, several mechanisms through which transposable elements (TEs) participate in functional genome diversity have been described, bringing to light the 'bright side' of TEs. However, the relative contribution of those mechanisms and, more generally, the importance of TE-based polymorphisms for past and present phenotypic variation in crop species remain poorly understood. Here, we review current knowledge on both issues, and discuss how analyses of massively parallel sequencing data combined with statistical methodologies and functional validations will help unravel the impact of TEs on crop evolution in the near future.
|
131
|
Zhou X, Rokas A. Prevention, diagnosis and treatment of high-throughput sequencing data pathologies. Mol Ecol 2014; 23:1679-700. [DOI: 10.1111/mec.12680] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 11/04/2013] [Revised: 01/17/2014] [Accepted: 01/22/2014] [Indexed: 12/17/2022]
Affiliation(s)
- Xiaofan Zhou
- Department of Biological Sciences; Vanderbilt University; Nashville TN 37235 USA
| | - Antonis Rokas
- Department of Biological Sciences; Vanderbilt University; Nashville TN 37235 USA
|
132
|
Mosen-Ansorena D, Telleria N, Veganzones S, De la Orden V, Maestro ML, Aransay AM. seqCNA: an R package for DNA copy number analysis in cancer using high-throughput sequencing. BMC Genomics 2014; 15:178. [PMID: 24597965 PMCID: PMC4022175 DOI: 10.1186/1471-2164-15-178] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 10/31/2013] [Accepted: 02/26/2014] [Indexed: 11/25/2022]
Abstract
Background: Deviations in the amount of genomic content that arise during tumorigenesis, called copy number alterations, are structural rearrangements that can critically affect gene expression patterns. Additionally, copy number alteration profiles allow insight into cancer discrimination, progression and complexity. On data obtained from high-throughput sequencing, improving quality through GC bias correction and keeping false positives to a minimum help build reliable copy number alteration profiles. Results: We introduce seqCNA, a parallelized R package for an integral copy number analysis of high-throughput sequencing cancer data. The package includes novel methodology on (i) filtering, reducing false positives, and (ii) GC content correction, improving copy number profile quality, especially under great read coverage and high correlation between GC content and copy number. Adequate analysis steps are automatically chosen based on availability of paired-end mapping, matched normal samples and genome annotation. Conclusions: seqCNA, available through Bioconductor, provides accurate copy number predictions in tumoural data, thanks to the extensive filtering and better GC bias correction, while providing an integrated and parallelized workflow.
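To make the notion of GC bias correction concrete, here is a generic sketch of one common approach: group genomic windows by GC fraction and divide each window's read count by the median count of its GC group. This is not seqCNA's method, which the abstract describes as novel and does not detail; the function name, bin width, and data are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_correct(windows, bin_width=0.05):
    """Return read counts rescaled by the median count of their GC bin.

    windows: list of (gc_fraction, read_count) tuples, one per genomic window.
    bin_width: hypothetical width of the GC-fraction bins.
    """
    bins = defaultdict(list)
    for gc, count in windows:
        bins[int(gc / bin_width)].append(count)
    bin_median = {b: median(counts) for b, counts in bins.items()}

    corrected = []
    for gc, count in windows:
        m = bin_median[int(gc / bin_width)]
        # A ratio near 1 means typical depth for windows of similar GC content.
        corrected.append(count / m if m > 0 else float("nan"))
    return corrected


# Hypothetical windows: low-GC windows have systematically lower counts.
windows = [
    (0.31, 80), (0.33, 84), (0.32, 120),   # low-GC bin, one apparent gain
    (0.52, 100), (0.53, 104), (0.54, 50),  # mid-GC bin, one apparent loss
]
corrected = gc_correct(windows)
print([round(value, 2) for value in corrected])
```

After rescaling, windows from different GC bins become comparable, so the remaining deviations from 1 are candidates for copy number change rather than sequencing bias.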
Affiliation(s)
- David Mosen-Ansorena
- CIC bioGUNE & CIBERehd, Technologic Park of Bizkaia, Building 502, 48160 Derio, Spain.
|
133
|
Abstract
Common copy number variations (CNVs) are small regions of genomic variation found at the same loci across multiple samples, which can be detected at high resolution using next-generation sequencing (NGS) techniques. Multiple sequencing data samples are often available from genomic studies; examples include sequences from multiple platforms and sequences from multiple individuals. By integrating complementary information from multiple data samples, detection power can potentially be improved. However, most current CNV detection methods process an individual sequence sample, or two samples in an abnormal versus matched normal study; research on detecting common CNVs across multiple samples has been very limited but is much needed. In this paper, we propose a novel method to detect common CNVs from multiple sequencing samples by exploiting the concurrency of genomic variations in read depth signals derived from multiple NGS data. We use a penalized sparse regression model to fit multiple read depth profiles, based on which common CNV identification is formulated as a change-point detection problem. Finally, we validate the proposed method on both simulated and real data, showing that it can give both higher detection power and better breakpoint estimation than several published CNV detection methods.
Affiliation(s)
- Junbo Duan
- Department of Biomedical Engineering, Xi’an Jiaotong University, Xi’an 710049, China
| | - Hong-Wen Deng
- Department of Biomedical Engineering and Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA 70118 USA
| | - Yu-Ping Wang
- Department of Biomedical Engineering and Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA 70118 USA
|
134
|
Johnson AK, Gaudio DD. Clinical utility of next-generation sequencing for the molecular diagnosis of monogenic diabetes. Per Med 2014; 11:155-165. [PMID: 29751380 DOI: 10.2217/pme.13.111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Monogenic diabetes resulting from mutations that primarily reduce insulin-secreting pancreatic β-cell function accounts for 1-2% of all cases of diabetes, and is genetically and clinically heterogeneous. Currently, genetic testing for monogenic diabetes relies on selection of the appropriate gene for analysis based on the availability of comprehensive phenotypic information, which can be time consuming, costly and can limit the differential diagnosis to a few selected genes. In recent years, the exponential growth in the field of high-throughput capture and sequencing technology has made it possible and cost effective to sequence many genes simultaneously, making it an efficient diagnostic tool for clinically and genetically heterogeneous disorders such as monogenic diabetes. Making a diagnosis of monogenic diabetes is important as it enables more appropriate treatment, better prediction of disease prognosis and progression, and counseling and screening of family members. We provide a concise overview of the genetic etiology of some forms of monogenic diabetes, as well as a discussion of the clinical utility of genetic testing by comprehensive multigene panel using next-generation sequencing methodologies.
Affiliation(s)
- Amy Knight Johnson
- Department of Human Genetics, University of Chicago, 5841 S Maryland MC0077, Chicago, IL 60637, USA
| | - Daniela Del Gaudio
- Department of Human Genetics, University of Chicago, 5841 S Maryland MC0077, Chicago, IL 60637, USA
|
135
|
Daley M. The complexity of genomic structural variation in neurodevelopmental disorders. Biol Psychiatry 2014; 75:344-5. [PMID: 24507567 DOI: 10.1016/j.biopsych.2013.12.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2013] [Accepted: 12/12/2013] [Indexed: 11/23/2022]
Affiliation(s)
- Mark Daley
- Department of Computer Science and Department of Biology, The University of Western Ontario, London, Ontario, Canada.
|
136
|
Hocking TD, Boeva V, Rigaill G, Schleiermacher G, Janoueix-Lerosey I, Delattre O, Richer W, Bourdeaut F, Suguro M, Seto M, Bach F, Vert JP. SegAnnDB: interactive Web-based genomic segmentation. Bioinformatics 2014; 30:1539-46. [PMID: 24493034 PMCID: PMC4029035 DOI: 10.1093/bioinformatics/btu072] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
Abstract
Motivation: DNA copy number profiles characterize regions of chromosome gains, losses and breakpoints in tumor genomes. Although many models have been proposed to detect these alterations, it is not clear which model is appropriate before visual inspection of the signal, noise and models for a particular profile. Results: We propose SegAnnDB, a Web-based computer vision system for genomic segmentation: the user first visually inspects the profiles and manually annotates altered regions, then SegAnnDB determines the precise alteration locations using a mathematical model of the data and annotations. SegAnnDB facilitates collaboration between biologists and bioinformaticians, and uses the University of California, Santa Cruz genome browser to visualize copy number alterations alongside known genes. Availability and implementation: The breakpoints project on INRIA GForge hosts the source code, an Amazon Machine Image can be launched, and a demonstration Web site is available at http://bioviz.rocq.inria.fr.
Affiliation(s)
- Toby D Hocking
- Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Unité de Recherche en Génomique Végétale INRA-CNRS-Université d'Evry Val d'Essonne, Évry 91057, France, INSERM U830, Paris F-75248, France, Aichi Cancer Center Research Institute, 1-1 Kanokoden, Chikusa-ku, Nagoya-city 464-8681, Japan and INRIA-Sierra Project-Team, Département d'Informatique de l'École Normale Supérieure, Paris F-75013, France
| | - Valentina Boeva
- Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Unité de Recherche en Génomique Végétale INRA-CNRS-Université d'Evry Val d'Essonne, Évry 91057, France, INSERM U830, Paris F-75248, France, Aichi Cancer Center Research Institute, 1-1 Kanokoden, Chikusa-ku, Nagoya-city 464-8681, Japan and INRIA-Sierra Project-Team, Département d'Informatique de l'École Normale Supérieure, Paris F-75013, France
| | - Guillem Rigaill
- Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Unité de Recherche en Génomique Végétale INRA-CNRS-Université d'Evry Val d'Essonne, Évry 91057, France, INSERM U830, Paris F-75248, France, Aichi Cancer Center Research Institute, 1-1 Kanokoden, Chikusa-ku, Nagoya-city 464-8681, Japan and INRIA-Sierra Project-Team, Département d'Informatique de l'École Normale Supérieure, Paris F-75013, France
| | - Gudrun Schleiermacher
- Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Unité de Recherche en Génomique Végétale INRA-CNRS-Université d'Evry Val d'Essonne, Évry 91057, France, INSERM U830, Paris F-75248, France, Aichi Cancer Center Research Institute, 1-1 Kanokoden, Chikusa-ku, Nagoya-city 464-8681, Japan and INRIA-Sierra Project-Team, Département d'Informatique de l'École Normale Supérieure, Paris F-75013, France
| | - Isabelle Janoueix-Lerosey
- Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Unité de Recherche en Génomique Végétale INRA-CNRS-Université d'Evry Val d'Essonne, Évry 91057, France, INSERM U830, Paris F-75248, France, Aichi Cancer Center Research Institute, 1-1 Kanokoden, Chikusa-ku, Nagoya-city 464-8681, Japan and INRIA-Sierra Project-Team, Département d'Informatique de l'École Normale Supérieure, Paris F-75013, France
| | - Olivier Delattre
- Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Unité de Recherche en Génomique Végétale INRA-CNRS-Université d'Evry Val d'Essonne, Évry 91057, France, INSERM U830, Paris F-75248, France, Aichi Cancer Center Research Institute, 1-1 Kanokoden, Chikusa-ku, Nagoya-city 464-8681, Japan and INRIA-Sierra Project-Team, Département d'Informatique de l'École Normale Supérieure, Paris F-75013, France
| | - Wilfrid Richer
- Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Unité de Recherche en Génomique Végétale INRA-CNRS-Université d'Evry Val d'Essonne, Évry 91057, France, INSERM U830, Paris F-75248, France, Aichi Cancer Center Research Institute, 1-1 Kanokoden, Chikusa-ku, Nagoya-city 464-8681, Japan and INRIA-Sierra Project-Team, Département d'Informatique de l'École Normale Supérieure, Paris F-75013, France
| | - Franck Bourdeaut
- Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Unité de Recherche en Génomique Végétale INRA-CNRS-Université d'Evry Val d'Essonne, Évry 91057, France, INSERM U830, Paris F-75248, France, Aichi Cancer Center Research Institute, 1-1 Kanokoden, Chikusa-ku, Nagoya-city 464-8681, Japan and INRIA-Sierra Project-Team, Département d'Informatique de l'École Normale Supérieure, Paris F-75013, France
| | - Miyuki Suguro
- Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Unité de Recherche en Génomique Végétale INRA-CNRS-Université d'Evry Val d'Essonne, Évry 91057, France, INSERM U830, Paris F-75248, France, Aichi Cancer Center Research Institute, 1-1 Kanokoden, Chikusa-ku, Nagoya-city 464-8681, Japan and INRIA-Sierra Project-Team, Département d'Informatique de l'École Normale Supérieure, Paris F-75013, France
| | - Masao Seto
- Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Unité de Recherche en Génomique Végétale INRA-CNRS-Université d'Evry Val d'Essonne, Évry 91057, France, INSERM U830, Paris F-75248, France, Aichi Cancer Center Research Institute, 1-1 Kanokoden, Chikusa-ku, Nagoya-city 464-8681, Japan and INRIA-Sierra Project-Team, Département d'Informatique de l'École Normale Supérieure, Paris F-75013, France
| | - Francis Bach
- Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Unité de Recherche en Génomique Végétale INRA-CNRS-Université d'Evry Val d'Essonne, Évry 91057, France, INSERM U830, Paris F-75248, France, Aichi Cancer Center Research Institute, 1-1 Kanokoden, Chikusa-ku, Nagoya-city 464-8681, Japan and INRIA-Sierra Project-Team, Département d'Informatique de l'École Normale Supérieure, Paris F-75013, France
| | - Jean-Philippe Vert
- Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Unité de Recherche en Génomique Végétale INRA-CNRS-Université d'Evry Val d'Essonne, Évry 91057, France, INSERM U830, Paris F-75248, France, Aichi Cancer Center Research Institute, 1-1 Kanokoden, Chikusa-ku, Nagoya-city 464-8681, Japan and INRIA-Sierra Project-Team, Département d'Informatique de l'École Normale Supérieure, Paris F-75013, France
|
137
|
Ping Z, Siegal GP, Almeida JS, Schnitt SJ, Shen D. Mining genome sequencing data to identify the genomic features linked to breast cancer histopathology. J Pathol Inform 2014; 5:3. [PMID: 24672738 PMCID: PMC3952399 DOI: 10.4103/2153-3539.126147] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 09/25/2013] [Accepted: 12/09/2013] [Indexed: 12/16/2022]
Abstract
Background: Genetics and genomics have radically altered our understanding of breast cancer progression. However, the genomic basis of various histopathologic features of breast cancer is not yet well-defined. Materials and methods: The Cancer Genome Atlas (TCGA) is an international database containing a large collection of human cancer genome sequencing data. cBioPortal is a web tool developed for mining these sequencing data. We performed mining of TCGA sequencing data in an attempt to characterize the genomic features correlated with breast cancer histopathology. We first assessed the quality of the TCGA data using a group of genes with known alterations in various cancers. Both genome-wide gene mutation and copy number changes as well as a group of genes with a high frequency of genetic changes were then correlated with various histopathologic features of invasive breast cancer. Results: Validation of TCGA data using a group of genes with known alterations in breast cancer suggests that the TCGA has accurately documented the genomic abnormalities of multiple malignancies. Further analysis of TCGA breast cancer sequencing data shows that accumulation of specific genomic defects is associated with higher tumor grade, larger tumor size and receptor negativity. Distinct groups of genomic changes were found to be associated with the different grades of invasive ductal carcinoma. The mutator role of the TP53 gene was validated by genomic sequencing data of invasive breast cancer and TP53 mutation was found to play a critical role in defining high tumor grade. Conclusions: Data mining of the TCGA genome sequencing data is an innovative and reliable method to help characterize the genomic abnormalities associated with histopathologic features of invasive breast cancer.
Affiliation(s)
- Zheng Ping
- Department of Pathology, Division of Anatomic Pathology, Comprehensive Cancer Center, University of Alabama at Birmingham, Birmingham, Alabama, USA
| | - Gene P Siegal
- Department of Pathology, Division of Anatomic Pathology, Comprehensive Cancer Center, University of Alabama at Birmingham, Birmingham, Alabama, USA
| | - Jonas S Almeida
- Department of Pathology, Division of Informatics, Comprehensive Cancer Center, University of Alabama at Birmingham, Birmingham, Alabama, USA
| | - Stuart J Schnitt
- Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
| | - Dejun Shen
- Department of Pathology, Division of Anatomic Pathology, Comprehensive Cancer Center, University of Alabama at Birmingham, Birmingham, Alabama, USA
|
138
|
Vardhanabhuti S, Jeng XJ, Wu Y, Li H. Parametric modeling of whole-genome sequencing data for CNV identification. Biostatistics 2014; 15:427-41. [PMID: 24478395 DOI: 10.1093/biostatistics/kxt060] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
Copy number variants (CNVs) constitute an important class of genetic variants in the human genome and have been shown to be associated with complex diseases. Whole-genome sequencing provides an unbiased way of identifying all the CNVs that an individual carries. In this paper, we consider parametric modeling of the read depth (RD) data from whole-genome sequencing with the aim of identifying the CNVs, including both Poisson and negative-binomial modeling of such count data. We propose a unified approach of using a mean-matching variance stabilizing transformation to turn the relatively complicated problem of sparse segment identification for count data into a sparse segment identification problem for a sequence of Gaussian data. We apply the optimal sparse segment identification procedure to the transformed data in order to identify the CNV segments. This provides a computationally efficient approach for RD-based CNV identification. Simulation results show that this approach often results in a small number of false identifications of the CNVs and performs as well as or better than other RD-based approaches in identifying the true CNVs. We demonstrate the methods using the trio data from the 1000 Genomes Project.
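The variance-stabilizing step described in this abstract can be illustrated with a minimal sketch. It uses the classical Anscombe transform as a stand-in, which is an assumption: the paper's own mean-matching transformation and its segment identification procedure are not reproduced here. The function names and the simulated Poisson depths are hypothetical.

```python
import math
import random
import statistics


def anscombe(count):
    """Anscombe transform: maps a Poisson count to a value with variance close to 1."""
    return 2.0 * math.sqrt(count + 3.0 / 8.0)


def poisson_sample(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method (fine for moderate means)."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def stabilized_variance(mean, n=5000, seed=0):
    """Empirical variance of transformed read depths simulated at a given Poisson mean."""
    rng = random.Random(seed)
    values = [anscombe(poisson_sample(rng, mean)) for _ in range(n)]
    return statistics.pvariance(values)


if __name__ == "__main__":
    # Raw Poisson variance equals the mean; after the transform it should sit near 1
    # regardless of the mean, which is what makes a Gaussian segment model applicable.
    for mean in (20, 60):
        print(mean, round(stabilized_variance(mean), 2))
```

Once the variance no longer depends on the local read depth, a CNV shows up as a shift in the mean of an approximately Gaussian sequence, which is the reduction the abstract refers to.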
Affiliation(s)
- Saran Vardhanabhuti
- Harvard School of Public Health, 651 Huntington Avenue, Boston, MA 02115, USA
| | - X Jessie Jeng
- Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
| | - Yinghua Wu
- Division of Biostatistics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Hongzhe Li
- Division of Biostatistics, University of Pennsylvania, Philadelphia, PA 19104, USA
|
139
|
Duitama J, Quintero JC, Cruz DF, Quintero C, Hubmann G, Foulquié-Moreno MR, Verstrepen KJ, Thevelein JM, Tohme J. An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments. Nucleic Acids Res 2014; 42:e44. [PMID: 24413664 PMCID: PMC3973327 DOI: 10.1093/nar/gkt1381] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.0] [Indexed: 12/30/2022] Open
Abstract
Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still require important investments in highly skilled personnel. Developing accurate, efficient and user-friendly software packages for HTS data analysis will lead to a more rapid discovery of genomic elements relevant to medical, agricultural and industrial applications. We therefore developed Next-Generation Sequencing Eclipse Plug-in (NGSEP), a new software tool for integrated, efficient and user-friendly detection of single nucleotide variants (SNVs), indels and copy number variants (CNVs). NGSEP includes modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics. Analysis of sequencing experiments in yeast, rice and human samples shows that NGSEP has superior accuracy and efficiency, compared with currently available packages for variants detection. We also show that only a comprehensive and accurate identification of repeat regions and CNVs allows researchers to properly separate SNVs from differences between copies of repeat elements. We expect that NGSEP will become a strong support tool to empower the analysis of sequencing data in a wide range of research projects on different species.
Affiliation(s)
- Jorge Duitama
- Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali-Palmira, A.A. 6713 Cali, Colombia; Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium; and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
- *To whom correspondence should be addressed. Tel: +57 2 4450000; Fax: +57 2 4450073;
| | - Juan Camilo Quintero
| | - Daniel Felipe Cruz
| | - Constanza Quintero
| | - Georg Hubmann
| | - Maria R. Foulquié-Moreno
| | - Kevin J. Verstrepen
| | - Johan M. Thevelein
| | - Joe Tohme
- Same affiliations as listed above for Jorge Duitama
|
140
|
A high-density gene map of loblolly pine (Pinus taeda L.) based on exome sequence capture genotyping. G3-GENES GENOMES GENETICS 2014; 4:29-37. [PMID: 24192835 PMCID: PMC3887537 DOI: 10.1534/g3.113.008714] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Indexed: 01/19/2023]
Abstract
Loblolly pine (Pinus taeda L.) is an economically and ecologically important conifer for which a suite of genomic resources is being generated. Despite recent attempts to sequence the large genome of conifers, their assembly and the positioning of genes remain largely incomplete. The interspecific synteny in pines suggests that a gene-based map would be useful to support genome assemblies and analysis of conifers. To establish a reference gene-based genetic map, we performed exome sequencing of 14729 genes on a mapping population of 72 haploid samples, generating a resource of 7434 sequence variants segregating for 3787 genes. Most markers are single-nucleotide polymorphisms, although short insertions/deletions and multiple nucleotide polymorphisms also were used. Marker segregation in the population was used to generate a high-density, gene-based genetic map. A total of 2841 genes were mapped to pine’s 12 linkage groups with an average of one marker every 0.58 cM. Capture data were used to detect gene presence/absence variations and position 65 genes on the map. We compared the marker order of genes previously mapped in loblolly pine and found high agreement. We estimated that 4123 genes had enough sequencing depth for reliable detection of markers, suggesting a high marker conversion rate of 92% (3787/4123). This is possible because a significant portion of the gene is captured and sequenced, increasing the chances of identifying a polymorphic site for characterization and mapping. This sub-centiMorgan genetic map provides a valuable resource for gene positioning on chromosomes and a guide for the assembly of a reference pine genome.
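The rate quoted at the end of this abstract can be checked directly from the counts it reports. The short sketch below only restates that arithmetic; the variable names are illustrative.

```python
# Counts quoted in the abstract above.
genes_with_markers = 3787   # genes with at least one segregating variant
genes_with_depth = 4123     # genes with enough sequencing depth for marker detection
variants = 7434             # segregating sequence variants

# Share of adequately covered genes that yielded a marker.
conversion_rate = genes_with_markers / genes_with_depth
# Average number of segregating variants per gene that has a marker.
variants_per_gene = variants / genes_with_markers

print(f"marker rate among covered genes: {conversion_rate:.1%}")
print(f"variants per gene with markers: {variants_per_gene:.2f}")
```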
|
141
|
Abstract
Cancer is a complex disease driven by multiple mutations acquired over the lifetime of the cancer cells. These alterations, termed somatic mutations to distinguish them from inherited germline mutations, can include single-nucleotide substitutions, insertions, deletions, copy number alterations, and structural rearrangements. A patient's cancer can contain a combination of these aberrations, and the ability to generate a comprehensive genetic profile should greatly improve patient diagnosis and treatment. Next-generation sequencing has become the tool of choice to uncover multiple cancer mutations from a single tumor source, and the falling costs of this rapid high-throughput technology are encouraging its transition from basic research into a clinical setting. However, the detection of mutations in sequencing data is still an evolving area and cancer genomic data requires some special considerations. This chapter discusses these aspects and gives an overview of current bioinformatics methods for the detection of somatic mutations in cancer sequencing data.
|
142
|
Proliferation and copy number variation of BEL-like long terminal repeat retrotransposons within the Diabrotica virgifera virgifera genome. Gene 2014. [DOI: 10.1016/j.gene.2013.09.100] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/20/2022]
|
143
|
Moelans CB, Holst F, Hellwinkel O, Simon R, van Diest PJ. ESR1 amplification in breast cancer by optimized RNase FISH: frequent but low-level and heterogeneous. PLoS One 2013; 8:e84189. [PMID: 24367641 PMCID: PMC3867473 DOI: 10.1371/journal.pone.0084189] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 08/26/2013] [Accepted: 11/13/2013] [Indexed: 01/09/2023] Open
Abstract
Prevalence of ESR1 amplification in breast cancer is highly disputed and discrepancies have been related to different technical protocols and different scoring approaches. In addition, pre-mRNA artifacts have been proposed to influence outcome of ESR1 FISH analysis. We analyzed ESR1 gene copy number status combining an improved RNase FISH protocol with multiplex ligation-dependent probe amplification (MLPA) after laser microdissection. FISH showed a high prevalence of ESR1 gains and amplifications despite RNase treatment but MLPA did not confirm ESR1 copy number increases detected by FISH in more than half of cases. We suggest that the combination of the ESR1-specific intra-tumor heterogeneity and low-level copy number increase accounts for these discrepancies.
Affiliation(s)
- Cathy B. Moelans
- Department of Pathology, University Medical Center Utrecht, Utrecht, The Netherlands
| | - Frederik Holst
- Section of Gynecology and Obstetrics, Department of Clinical Science, Haukeland University Hospital, Bergen, Norway
- Department of Pathology, University Medical Center Hamburg Eppendorf, Hamburg, Germany
| | - Olaf Hellwinkel
- Department of Legal Medicine, University Medical Center Hamburg Eppendorf, Hamburg, Germany
| | - Ronald Simon
- Department of Pathology, University Medical Center Hamburg Eppendorf, Hamburg, Germany
| | - Paul J. van Diest
- Department of Pathology, University Medical Center Utrecht, Utrecht, The Netherlands
|
144
|
Valdés A, Ibáñez C, Simó C, García-Cañas V. Recent transcriptomics advances and emerging applications in food science. Trends Analyt Chem 2013. [DOI: 10.1016/j.trac.2013.06.014] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 01/14/2023]
|
145
|
SomatiCA: identifying, characterizing and quantifying somatic copy number aberrations from cancer genome sequencing data. PLoS One 2013; 8:e78143. [PMID: 24265680 PMCID: PMC3827077 DOI: 10.1371/journal.pone.0078143] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 07/31/2013] [Accepted: 09/07/2013] [Indexed: 11/19/2022] Open
Abstract
Whole genome sequencing of matched tumor-normal sample pairs is becoming routine in cancer research. However, analysis of somatic copy-number changes from sequencing data is still challenging because of insufficient sequencing coverage, unknown tumor sample purity and subclonal heterogeneity. Here we describe a computational framework, named SomatiCA, which explicitly accounts for tumor purity and subclonality in the analysis of somatic copy-number profiles. Taking read depths (RD) and lesser allele frequencies (LAF) as input, SomatiCA will output 1) admixture rate for each tumor sample, 2) somatic allelic copy-number for each genomic segment, 3) fraction of tumor cells with subclonal change in each somatic copy number aberration (SCNA), and 4) a list of substantial genomic aberration events including gain, loss and LOH. SomatiCA is available as a Bioconductor R package at http://www.bioconductor.org/packages/2.13/bioc/html/SomatiCA.html.
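Why tumor purity matters for read-depth analysis can be shown with a small mixture model. This is a generic sketch, not SomatiCA's algorithm or its R interface: it assumes a diploid normal genome, a depth ratio normalized so that two copies give a ratio of 1, and a purity defined as the fraction of tumor cells in the sample. All function names are hypothetical.

```python
def expected_depth_ratio(tumor_copies, purity, normal_copies=2):
    """Tumor/normal read-depth ratio expected for a segment in an impure sample."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    # The sample is a mix of tumor cells and contaminating normal cells.
    mixed = purity * tumor_copies + (1.0 - purity) * normal_copies
    return mixed / normal_copies


def tumor_copy_number(depth_ratio, purity, normal_copies=2):
    """Invert the mixture model to recover the copy number carried by the tumor cells."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    return (depth_ratio * normal_copies - (1.0 - purity) * normal_copies) / purity


if __name__ == "__main__":
    # A single-copy loss in a sample that is 60% tumor cells.
    ratio = expected_depth_ratio(tumor_copies=1, purity=0.6)
    print(ratio, tumor_copy_number(ratio, purity=0.6))
```

Under these assumptions a one-copy loss in a 60% pure sample lowers the depth ratio only part of the way toward 0.5, so ignoring purity would understate the change.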
|
146
|
Huse JT, Aldape KD. The molecular landscape of diffuse glioma and prospects for biomarker development. Expert Opin Med Diagn 2013; 7:573-87. [PMID: 24161073 DOI: 10.1517/17530059.2013.846321] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/08/2023]
Abstract
INTRODUCTION High-throughput molecular profiling is transforming long-standing conceptions of diffuse gliomas, the most common primary brain tumors. Indeed, comprehensive genomic, transcriptomic and epigenomic analyses have not only provided striking mechanistic insights into the pathogenesis of diffuse gliomas but also greatly enriched the pool of potential biomarkers for prognostic and predictive patient stratification. AREAS COVERED This article summarizes significant recent developments in the molecular characterization of diffuse gliomas, focusing on implications for biomarker development and application. In doing so, we will also address relevant high-throughput molecular profiling technologies and both the opportunities and challenges implicit in their widespread incorporation into disease management workflows. EXPERT OPINION Although the number of validated biomarkers guiding diffuse glioma management is currently quite small, rapidly progressing molecular annotation continues to provide a steady stream of clinically relevant candidates, many of which show promise for predictive capabilities in the context of specific targeted therapeutics. Such potential now requires rigorous validation in well-designed clinical trials supported by robust molecular profiling assays operative from standard clinical material.
Affiliation(s)
- Jason T Huse
- Memorial Sloan-Kettering Cancer Center, Department of Pathology and Human Oncology and Pathogenesis Program, 1275 York Avenue, NY 10065, USA
|
147
|
Dorn C, Grunert M, Sperling SR. Application of high-throughput sequencing for studying genomic variations in congenital heart disease. Brief Funct Genomics 2013; 13:51-65. [PMID: 24095982 DOI: 10.1093/bfgp/elt040] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 12/30/2022] Open
Abstract
Congenital heart diseases (CHD) represent the most common birth defect in humans. The majority of cases are caused by a combination of complex genetic alterations and environmental influences. In the past, many disease-causing mutations have been identified; however, a large proportion of cardiac malformations still have no precisely known origin. High-throughput sequencing technologies established in recent years offer novel opportunities to further study the genetic background underlying the disease. In this review, we provide a roadmap for designing and analyzing high-throughput sequencing studies focused on CHD, but also with general applicability to other complex diseases. The three main next-generation sequencing (NGS) platforms, including their particular advantages and disadvantages, are presented. To identify potentially disease-related genomic variations and genes, different filtering steps and gene prioritization strategies are discussed. In addition, available control datasets based on NGS are summarized. Finally, we provide an overview of current studies already using NGS technologies and showing that these techniques will help to further unravel the complex genetics underlying CHD.
Affiliation(s)
- Cornelia Dorn
- Department of Cardiovascular Genetics, Experimental and Clinical Research Center (ECRC), Charité-University Medicine Berlin and Max Delbrück Center (MDC) for Molecular Medicine, Lindenberger Weg 80, 13125 Berlin, Germany. Department of Biochemistry, Free University Berlin, Berlin, Germany. Tel.: +49-(0)30-450540123; Fax: +49-(0)30-84131699;
|
148
|
Abstract
MOTIVATION Data quality is a critical issue in the analyses of DNA copy number alterations obtained from microarrays. It is commonly assumed that copy number alteration data can be modeled as piecewise constant and the measurement errors of different probes are independent. However, these assumptions do not always hold in practice. In some published datasets, we find that measurement errors are highly correlated between probes that interrogate nearby genomic loci, and the piecewise-constant model does not fit the data well. The correlated errors cause problems in downstream analysis, leading to a large number of DNA segments falsely identified as having copy number gains and losses. METHOD We developed a simple tool, called autocorrelation scanning profile, to assess the dependence of measurement error between neighboring probes. RESULTS Autocorrelation scanning profile can be used to check data quality and refine the analysis of DNA copy number data, which we demonstrate in some typical datasets. CONTACT lzhangli@mdanderson.org. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
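The dependence between neighboring probes that this abstract describes can be measured with an ordinary lag autocorrelation computed in sliding windows. The sketch below is a generic illustration under that assumption, not the authors' autocorrelation scanning profile tool; the function names, window sizes and simulated noise are hypothetical. In real data the segment means would be subtracted first so that true copy number changes are not mistaken for correlated error.

```python
import random


def lag_autocorrelation(values, lag=1):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(values)
    if n <= lag:
        raise ValueError("need more values than the lag")
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        return 0.0
    numer = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return numer / denom


def scan_autocorrelation(values, window=200, step=100, lag=1):
    """Lag autocorrelation in sliding windows along the probe order."""
    return [
        (start, lag_autocorrelation(values[start:start + window], lag))
        for start in range(0, len(values) - window + 1, step)
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    independent = [rng.gauss(0, 1) for _ in range(1000)]
    # AR(1)-style noise: each probe's error carries over part of its neighbor's error.
    correlated = [independent[0]]
    for e in independent[1:]:
        correlated.append(0.8 * correlated[-1] + e)
    print(round(lag_autocorrelation(independent), 2), round(lag_autocorrelation(correlated), 2))
```

Windows whose autocorrelation stays near zero are consistent with independent errors; persistently high values flag regions where a piecewise-constant model with independent noise is likely to produce false segments.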
Affiliation(s)
- Liangcai Zhang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77230, USA and Department of Biophysics, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
|
149
|
Zhao M, Wang Q, Wang Q, Jia P, Zhao Z. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives. BMC Bioinformatics 2013; 14 Suppl 11:S1. [PMID: 24564169 PMCID: PMC3846878 DOI: 10.1186/1471-2105-14-s11-s1] [Citation(s) in RCA: 350] [Impact Index Per Article: 29.2] [Indexed: 01/11/2023] Open
Abstract
Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) and genotyping arrays were the standard technologies for detecting large regions subject to copy number changes in genomes until recently, when next-generation sequencing (NGS) made high-resolution sequence data available for analysis. During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled development of numerous computational methods and tools for CNV detection. In this article, we review the recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development.
|
150
|
de Ligt J, Boone PM, Pfundt R, Vissers LELM, Richmond T, Geoghegan J, O'Moore K, de Leeuw N, Shaw C, Brunner HG, Lupski JR, Veltman JA, Hehir-Kwa JY. Detection of clinically relevant copy number variants with whole-exome sequencing. Hum Mutat 2013; 34:1439-48. [PMID: 23893877 DOI: 10.1002/humu.22387] [Citation(s) in RCA: 100] [Impact Index Per Article: 8.3] [Received: 04/05/2013] [Accepted: 07/17/2013] [Indexed: 12/22/2022]
Abstract
Copy number variation (CNV) is a common source of genetic variation that has been implicated in many genomic disorders. This has resulted in the widespread application of genomic microarrays as a first-tier diagnostic tool for CNV detection. More recently, whole-exome sequencing (WES) has been proven successful for the detection of clinically relevant point mutations and small insertion-deletions exome wide. We evaluate the utility of short-read WES (SOLiD 5500xl) to detect clinically relevant CNVs in DNA from 10 patients with intellectual disability and compare these results to data from two independent high-resolution microarrays. Eleven of the 12 clinically relevant CNVs were detected via read-depth analysis of WES data; a heterozygous single-exon deletion remained undetected by all algorithms evaluated. Although the detection power of WES for small CNVs currently does not match that of high-resolution microarray platforms, we show that the majority (88%) of rare coding CNVs containing three or more exons are successfully identified by WES. These results show that the CNV detection resolution of WES is comparable to that of medium-resolution genomic microarrays commonly used as clinical assays. The combined detection of point mutations, indels, and CNVs makes WES a very attractive first-tier diagnostic test for genetically heterogeneous disorders.
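Read-depth analysis of exome data, as mentioned in this abstract, typically compares per-exon coverage in a sample against a reference. The sketch below shows the idea with a log2 depth ratio and fixed cut-offs. It is not one of the algorithms evaluated in the paper; the function names, the pseudocount and the thresholds are illustrative assumptions.

```python
import math


def log2_ratios(sample_depth, reference_depth, pseudocount=0.5):
    """Per-exon log2 ratio of library-size-normalized depth, sample versus reference."""
    if len(sample_depth) != len(reference_depth):
        raise ValueError("depth vectors must cover the same exons")
    sample_total = sum(sample_depth)
    reference_total = sum(reference_depth)
    ratios = []
    for s, r in zip(sample_depth, reference_depth):
        # The pseudocount avoids taking the log of zero for uncovered exons.
        s_norm = (s + pseudocount) / sample_total
        r_norm = (r + pseudocount) / reference_total
        ratios.append(math.log2(s_norm / r_norm))
    return ratios


def call_exons(ratios, loss=-0.6, gain=0.4):
    """Label each exon 'loss', 'gain' or 'neutral' using fixed, illustrative thresholds."""
    return ["loss" if x <= loss else "gain" if x >= gain else "neutral" for x in ratios]


if __name__ == "__main__":
    sample = [100, 100, 50, 100, 150, 100]
    reference = [100, 100, 100, 100, 100, 100]
    print(call_exons(log2_ratios(sample, reference)))
```

A heterozygous deletion is expected near a log2 ratio of -1 and a single-copy gain near 0.58, but depth at a single exon is noisy, which is consistent with the abstract's observation that events spanning three or more exons are detected far more reliably.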
Affiliation(s)
- Joep de Ligt
- Department of Human Genetics, Nijmegen Centre for Molecular Life Sciences, Institute for Genetic and Metabolic Disease, Radboud University Medical Centre, Nijmegen, 6500 HB, The Netherlands
|