1
Silaiyiman S, Liu J, Wu J, Ouyang L, Cao Z, Shen C. A Systematic Review of the Advances and New Insights into Copy Number Variations in Plant Genomes. Plants (Basel, Switzerland) 2025; 14:1399. [PMID: 40364428 PMCID: PMC12073271 DOI: 10.3390/plants14091399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2025] [Revised: 04/27/2025] [Accepted: 05/05/2025] [Indexed: 05/15/2025]
Abstract
Copy number variations (CNVs), an important class of structural variants, are widely present in plant genomes and affect phenotype and adaptability. In recent years, CNV research has not only focused on changes in gene copy numbers but has also been linked to complex mechanisms such as genome rearrangements, transposon activity, and environmental adaptation. Advances in sequencing technologies have made the detection and analysis of CNVs more efficient, not only revealing their crucial roles in plant disease resistance, adaptability, and growth and development, but also demonstrating broad application potential in crop improvement, particularly in selective breeding and genomic selection. By studying CNV changes during the domestication process, researchers have gradually recognized the important role of CNVs in plant domestication and evolution. This article reviews the formation mechanisms of CNVs in plants, methods for their detection, their relationship with plant traits, and their applications in crop improvement. It emphasizes future research directions involving the integration of multi-omics to provide new perspectives on the structure and function of plant genomes.
Affiliation(s)
- Saimire Silaiyiman
- Guangdong Provincial Key Laboratory for Green Agricultural Production and Intelligent Equipment, College of Biological and Food Engineering, Guangdong University of Petrochemical Technology, Maoming 525000, China; (S.S.); (J.L.); (J.W.); (L.O.)
- College of Life and Geographic Sciences, Kashi University, Kashi 844000, China
- Key Laboratory of Biological Resources and Ecology of Pamirs Plateau in Xinjiang Uygur Autonomous Region, Kashi 844000, China
- Jiaxuan Liu
- Guangdong Provincial Key Laboratory for Green Agricultural Production and Intelligent Equipment, College of Biological and Food Engineering, Guangdong University of Petrochemical Technology, Maoming 525000, China; (S.S.); (J.L.); (J.W.); (L.O.)
- College of Life and Geographic Sciences, Kashi University, Kashi 844000, China
- Key Laboratory of Biological Resources and Ecology of Pamirs Plateau in Xinjiang Uygur Autonomous Region, Kashi 844000, China
- Jiaxin Wu
- Guangdong Provincial Key Laboratory for Green Agricultural Production and Intelligent Equipment, College of Biological and Food Engineering, Guangdong University of Petrochemical Technology, Maoming 525000, China; (S.S.); (J.L.); (J.W.); (L.O.)
- College of Life and Geographic Sciences, Kashi University, Kashi 844000, China
- Key Laboratory of Biological Resources and Ecology of Pamirs Plateau in Xinjiang Uygur Autonomous Region, Kashi 844000, China
- Lejun Ouyang
- Guangdong Provincial Key Laboratory for Green Agricultural Production and Intelligent Equipment, College of Biological and Food Engineering, Guangdong University of Petrochemical Technology, Maoming 525000, China; (S.S.); (J.L.); (J.W.); (L.O.)
- College of Life and Geographic Sciences, Kashi University, Kashi 844000, China
- Zheng Cao
- Maoming Agricultural Science and Technology Extension Center, Maoming 525000, China
- Chao Shen
- Guangdong Provincial Key Laboratory for Green Agricultural Production and Intelligent Equipment, College of Biological and Food Engineering, Guangdong University of Petrochemical Technology, Maoming 525000, China; (S.S.); (J.L.); (J.W.); (L.O.)
2
Lesack K, Mariene GM, Andersen EC, Wasmuth JD. Different structural variant prediction tools yield considerably different results in Caenorhabditis elegans. PLoS One 2022; 17:e0278424. [PMID: 36584177 PMCID: PMC9803319 DOI: 10.1371/journal.pone.0278424] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/15/2022] [Accepted: 11/15/2022] [Indexed: 01/01/2023]
Abstract
The accurate characterization of structural variation is crucial for our understanding of how large chromosomal alterations affect phenotypic differences and contribute to genome evolution. Whole-genome sequencing is a popular approach for identifying structural variants, but the accuracy of popular tools remains unclear due to the limitations of existing benchmarks. Moreover, the performance of these tools for predicting variants in non-human genomes is less certain, as most tools were developed and benchmarked using data from the human genome. To evaluate the use of long-read data for the validation of short-read structural variant calls, the agreement between predictions from a short-read ensemble learning method and long-read tools was compared using real and simulated data from Caenorhabditis elegans. The results obtained from simulated data indicate that the best performing tool is contingent on the type and size of the variant, as well as the sequencing depth of coverage. These results also highlight the need for reference datasets generated from real data that can be used as 'ground truth' in benchmarks.
Affiliation(s)
- Kyle Lesack
- Faculty of Veterinary Medicine, University of Calgary, Alberta, Canada
- Host-Parasite Interactions Research Training Network, University of Calgary, Alberta, Canada
- Grace M. Mariene
- Faculty of Veterinary Medicine, University of Calgary, Alberta, Canada
- Host-Parasite Interactions Research Training Network, University of Calgary, Alberta, Canada
- Erik C. Andersen
- Department of Molecular Biosciences, Northwestern University, Evanston, IL, United States of America
- James D. Wasmuth
- Faculty of Veterinary Medicine, University of Calgary, Alberta, Canada
- Host-Parasite Interactions Research Training Network, University of Calgary, Alberta, Canada
3
Wendt FR, Pathak GA, Polimanti R. Phenome-wide association study of loci harboring de novo tandem repeat mutations in UK Biobank exomes. Nat Commun 2022; 13:7682. [PMID: 36509785 PMCID: PMC9744822 DOI: 10.1038/s41467-022-35423-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/04/2022] [Accepted: 12/02/2022] [Indexed: 12/15/2022]
Abstract
When present in coding regions, tandem repeats (TRs) may have large effects on protein structure and function contributing to health and disease. We use a family-based design to identify de novo TRs and assess their impact at the population level in 148,607 European ancestry participants from the UK Biobank. The 427 loci with de novo TR mutations are enriched for targets of microRNA-184 (21.1-fold, P = 4.30 × 10⁻⁵, FDR = 9.50 × 10⁻³). There are 123 TR-phenotype associations with posterior probabilities > 0.95. These relate to body structure, cognition, and cardiovascular, metabolic, psychiatric, and respiratory outcomes. We report several loci with large likely causal effects on tissue microstructure, including the FAN1-[TG]N and carotid intima-media thickness (mean thickness: beta = 5.22, P = 1.22 × 10⁻⁶, FDR = 0.004; maximum thickness: beta = 6.44, P = 1.12 × 10⁻⁶, FDR = 0.004). Two exonic repeats FNBP4-[GGT]N and BTN2A1-[CCT]N alter protein structure. In this work, we contribute clear and testable hypotheses of dose-dependent TR implications linking genetic variation and protein structure with health and disease outcomes.
Affiliation(s)
- Frank R Wendt
- Department of Anthropology, University of Toronto, Mississauga, ON, Canada
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Forensic Science Program, University of Toronto, Mississauga, ON, Canada
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
- VA CT Healthcare System, West Haven, CT, USA
- Gita A Pathak
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
- VA CT Healthcare System, West Haven, CT, USA
- Renato Polimanti
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
- VA CT Healthcare System, West Haven, CT, USA
4
Yuan Y, Bayer PE, Batley J, Edwards D. Current status of structural variation studies in plants. Plant Biotechnology Journal 2021; 19:2153-2163. [PMID: 34101329 PMCID: PMC8541774 DOI: 10.1111/pbi.13646] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 02/18/2021] [Revised: 05/31/2021] [Accepted: 06/03/2021] [Indexed: 05/23/2023]
Abstract
Structural variations (SVs) including gene presence/absence variations and copy number variations are a common feature of genomes in plants and, together with single nucleotide polymorphisms and epigenetic differences, are responsible for the heritable phenotypic diversity observed within and between species. Understanding the contribution of SVs to plant phenotypic variation is important for plant breeders to assist in producing improved varieties. The low resolution of early genetic technologies and inefficient methods have previously limited our understanding of SVs in plants. However, with the rapid expansion in genomic technologies, it is possible to assess SVs with an ever-greater resolution and accuracy. Here, we review the current status of SV studies in plants, examine the roles that SVs play in phenotypic traits, compare current technologies and assess future challenges for SV studies.
Affiliation(s)
- Yuxuan Yuan
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
- School of Life Sciences and State Key Laboratory for Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong SAR, China
- Philipp E. Bayer
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
- Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
- David Edwards
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
5
Statistical Considerations on NGS Data for Inferring Copy Number Variations. Methods Mol Biol 2021; 2243:27-58. [PMID: 33606251 DOI: 10.1007/978-1-0716-1103-6_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2022]
Abstract
Next-generation sequencing (NGS) technology has revolutionized research in genetics and genomics, resulting in massive NGS data and opening more fronts to answer unresolved issues in genetics. NGS data are usually stored at three levels: image files, sequence tags, and aligned reads. The sizes of these types of data usually range from several hundred gigabytes to several terabytes. Biostatisticians and bioinformaticians typically work with the aligned NGS read count data (the last level of NGS data) for data modeling and interpretation. To home in on one use of NGS technology, researchers utilize it to profile the whole genome to study DNA copy number variations (CNVs) for an individual subject (or patient) as well as groups of subjects (or patients). The resulting aligned NGS read count data are then modeled by proper mathematical and statistical approaches so that the loci of CNVs can be accurately detected. In this book chapter, a summary of the most popularly used statistical methods for detecting CNVs using NGS data is given. The goal is to provide readers with a comprehensive resource of available statistical approaches for inferring DNA copy number variations using NGS data.
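As a rough illustration of the read-count modeling this abstract refers to, the sketch below assumes read counts that have already been binned into fixed genomic windows. It normalizes them against the median, converts them to log2 ratios, and reports runs of windows beyond a threshold. The function names, thresholds, pseudocount, and toy counts are all assumptions made for illustration; none of them come from the chapter, and the methods it surveys are considerably more elaborate.

```python
import math
from statistics import median

def log2_ratios(counts):
    """Convert per-window read counts to log2 ratios against the median count."""
    base = median(counts)
    # A pseudocount of 0.5 avoids log(0) in windows with no reads (assumption).
    return [math.log2((c + 0.5) / (base + 0.5)) for c in counts]

def call_cnv_segments(counts, gain=0.4, loss=-0.6, min_windows=2):
    """Return (first_window, last_window, state) for runs beyond the thresholds."""
    states = []
    for r in log2_ratios(counts):
        if r >= gain:
            states.append("gain")
        elif r <= loss:
            states.append("loss")
        else:
            states.append("neutral")
    segments = []
    start = 0
    for i in range(1, len(states) + 1):
        # Close the current run when the state changes or the data ends.
        if i == len(states) or states[i] != states[start]:
            if states[start] != "neutral" and i - start >= min_windows:
                segments.append((start, i - 1, states[start]))
            start = i
    return segments

# Toy counts (hypothetical): a gain over windows 3-5 and a loss over windows 8-10.
toy_counts = [100, 98, 103, 150, 155, 149, 101, 99, 48, 52, 50, 100]
print(call_cnv_segments(toy_counts))
```

Real read-depth callers also correct for GC content and mappability and use a statistical segmentation model rather than fixed thresholds; this sketch omits those steps.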
6
Luo J, Wei C, Liu H, Cheng S, Xiao Y, Wang X, Yan J, Liu J. MaizeCUBIC: a comprehensive variation database for a maize synthetic population. Database: The Journal of Biological Databases and Curation 2020; 2020:5857845. [PMID: 32548639 PMCID: PMC7297647 DOI: 10.1093/database/baaa044] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/27/2020] [Revised: 04/03/2020] [Accepted: 05/18/2020] [Indexed: 11/13/2022]
Abstract
MaizeCUBIC is a free database that describes genomic variations, gene expression, phenotypes and quantitative trait loci (QTLs) for a maize CUBIC population (24 founders and 1404 inbred offspring). The database not only includes information for over 14M single nucleotide polymorphisms (SNPs) and 43K indels previously identified but also contains 660K structural variations (SVs) and 600M novel sequences newly identified in the present study, which together represent a comprehensive high-density variant map for a diverse population. Based on these genomic variations, the database demonstrates the mosaic structure of each progeny, reflecting a high-resolution reshuffle across parental genomes. A total of 23 agronomic traits measured on parents and progeny in five locations, which are representative of the main maize-growing regions in China, were also included in the database. To further explore the genotype–phenotype relationships, two different methods of genome-wide association studies (GWAS) were employed for dissecting the genetic architecture of the 23 agronomic traits. Additionally, the Basic Local Alignment Search Tool and primer design tools are provided to promote follow-up analysis and experimental verification. All the original data and corresponding analytical results can be accessed through user-friendly online queries and web interface dynamic visualization, as well as downloadable files. These data and tools provide valuable resources for genetic and genomic studies of maize and other crops.
Affiliation(s)
- Jingyun Luo
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Chengcheng Wei
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Haijun Liu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna 1030, Austria
- Shikun Cheng
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Yingjie Xiao
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Xiaqing Wang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Jianbing Yan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Jianxiao Liu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
7
Jin Y, Chen G, Xiao W, Hong H, Xu J, Guo Y, Xiao W, Shi T, Shi L, Tong W, Ning B. Sequencing XMET genes to promote genotype-guided risk assessment and precision medicine. Science China Life Sciences 2019; 62:895-904. [PMID: 31114935 DOI: 10.1007/s11427-018-9479-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/05/2018] [Accepted: 12/06/2018] [Indexed: 12/26/2022]
Abstract
High-throughput next generation sequencing (NGS) is a shotgun approach applied in a parallel fashion, by which the genome is fragmented and sequenced in small pieces and then analyzed either by aligning to a known reference genome or by de novo assembly without a reference genome. This technology has led researchers to conduct an explosion of sequencing-related projects in multidisciplinary fields of science. However, due to the limitations of sequencing chemistry, the length of sequencing reads and the complexity of genes, it is difficult to determine the sequences of some portions of the human genome, leaving gaps in genomic data that frustrate further analysis. In particular, some complex genes are difficult to sequence or map accurately because they contain high GC-content and/or low-complexity regions, and complicated pseudogenes, such as the genes encoding xenobiotic metabolizing enzymes and transporters (XMETs). The genetic variants in XMET genes are critical for predicting inter-individual variability in drug efficacy, drug safety and susceptibility to environmental toxicity. We summarize and discuss challenges, wet-lab methods, and bioinformatics algorithms in sequencing "complex" XMET genes, which may provide insightful information for the application of NGS technology in toxicogenomics and pharmacogenomics.
Affiliation(s)
- Yaqiong Jin
- Beijing Key Laboratory for Pediatric Diseases of Otolaryngology, Head and Neck Surgery, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
- Geng Chen
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Wenming Xiao
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Huixiao Hong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Joshua Xu
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Yongli Guo
- Beijing Key Laboratory for Pediatric Diseases of Otolaryngology, Head and Neck Surgery, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
- Wenzhong Xiao
- Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, 02114, USA
- Tieliu Shi
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Cancer Center; Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, 200433, China
- Weida Tong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Baitang Ning
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA.
8
Mohammed Ismail W, Pagel KA, Pejaver V, Zhang SV, Casasa S, Mort M, Cooper DN, Hahn MW, Radivojac P. The sequencing and interpretation of the genome obtained from a Serbian individual. PLoS One 2018; 13:e0208901. [PMID: 30566479 PMCID: PMC6300249 DOI: 10.1371/journal.pone.0208901] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/24/2018] [Accepted: 11/26/2018] [Indexed: 02/07/2023]
Abstract
Recent genetic studies and whole-genome sequencing projects have greatly improved our understanding of human variation and clinically actionable genetic information. Smaller ethnic populations, however, remain underrepresented in both individual and large-scale sequencing efforts and hence present an opportunity to discover new variants of biomedical and demographic significance. This report describes the sequencing and analysis of a genome obtained from an individual of Serbian origin, introducing tens of thousands of previously unknown variants to the currently available pool. Ancestry analysis places this individual in close proximity to Central and Eastern European populations; i.e., closest to Croatian, Bulgarian and Hungarian individuals and, in terms of other Europeans, furthest from Ashkenazi Jewish, Spanish, Sicilian and Baltic individuals. Our analysis confirmed gene flow between Neanderthal and ancestral pan-European populations, with similar contributions to the Serbian genome as those observed in other European groups. Finally, to assess the burden of potentially disease-causing/clinically relevant variation in the sequenced genome, we utilized manually curated genotype-phenotype association databases and variant-effect predictors. We identified several variants that have previously been associated with severe early-onset disease that is not evident in the proband, as well as putatively impactful variants that could yet prove to be clinically relevant to the proband over the next decades. The presence of numerous private and low-frequency variants, along with the observed and predicted disease-causing mutations in this genome, exemplifies some of the global challenges of genome interpretation, especially in the context of under-studied ethnic groups.
Affiliation(s)
- Wazim Mohammed Ismail
- Department of Computer Science, Indiana University, Bloomington, Indiana, United States of America
- Kymberleigh A. Pagel
- Department of Computer Science, Indiana University, Bloomington, Indiana, United States of America
- Vikas Pejaver
- Department of Computer Science, Indiana University, Bloomington, Indiana, United States of America
- Simo V. Zhang
- Department of Computer Science, Indiana University, Bloomington, Indiana, United States of America
- Sofia Casasa
- Department of Biology, Indiana University, Bloomington, Indiana, United States of America
- Matthew Mort
- Institute of Medical Genetics, Cardiff University, Cardiff, United Kingdom
- David N. Cooper
- Institute of Medical Genetics, Cardiff University, Cardiff, United Kingdom
- Matthew W. Hahn
- Department of Computer Science, Indiana University, Bloomington, Indiana, United States of America
- Department of Biology, Indiana University, Bloomington, Indiana, United States of America
- Predrag Radivojac
- College of Computer and Information Science, Northeastern University, Boston, Massachusetts, United States of America
9
Monlong J, Cossette P, Meloche C, Rouleau G, Girard SL, Bourque G. Human copy number variants are enriched in regions of low mappability. Nucleic Acids Res 2018; 46:7236-7249. [PMID: 30137632 PMCID: PMC6101599 DOI: 10.1093/nar/gky538] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 02/20/2018] [Revised: 05/04/2018] [Accepted: 06/12/2018] [Indexed: 12/18/2022]
Abstract
Copy number variants (CNVs) are known to affect a large portion of the human genome and have been implicated in many diseases. Although whole-genome sequencing (WGS) can help identify CNVs, most analytical methods suffer from limited sensitivity and specificity, especially in regions of low mappability. To address this, we use PopSV, a CNV caller that relies on multiple samples to control for technical variation. We demonstrate that our calls are stable across different types of repeat-rich regions and validate the accuracy of our predictions using orthogonal approaches. Applying PopSV to 640 human genomes, we find that low-mappability regions are approximately 5 times more likely to harbor germline CNVs, in stark contrast to the nearly uniform distribution observed for somatic CNVs in 95 cancer genomes. In addition to known enrichments in segmental duplication and near centromeres and telomeres, we also report that CNVs are enriched in specific types of satellite and in some of the most recent families of transposable elements. Finally, using this comprehensive approach, we identify 3455 regions with recurrent CNVs that were missing from existing catalogs. In particular, we identify 347 genes with a novel exonic CNV in low-mappability regions, including 29 genes previously associated with disease.
Affiliation(s)
- Jean Monlong
- Department of Human Genetics, McGill University, Montréal H3A 1B1, Canada
- Canadian Center for Computational Genomics, Montréal H3A 1A4, Canada
- Patrick Cossette
- Centre de Recherche du Centre Hospitalier de l’Universite de Montréal, Montréal H2X 0A9, Canada
- Caroline Meloche
- Centre de Recherche du Centre Hospitalier de l’Universite de Montréal, Montréal H2X 0A9, Canada
- Guy Rouleau
- Montreal Neurological Institute, McGill University, Montréal H3A 2B4, Canada
- Simon L Girard
- Department of Human Genetics, McGill University, Montréal H3A 1B1, Canada
- Centre de Recherche du Centre Hospitalier de l’Universite de Montréal, Montréal H2X 0A9, Canada
- Département des sciences fondamentales, Université du Québec à Chicoutimi, Chicoutimi G7H 2B1, Canada
- Guillaume Bourque
- Department of Human Genetics, McGill University, Montréal H3A 1B1, Canada
- Canadian Center for Computational Genomics, Montréal H3A 1A4, Canada
- McGill University and Génome Québec Innovation Center, Montréal H3A 1A4, Canada
10
Menghi F, Barthel FP, Yadav V, Tang M, Ji B, Tang Z, Carter GW, Ruan Y, Scully R, Verhaak RGW, Jonkers J, Liu ET. The Tandem Duplicator Phenotype Is a Prevalent Genome-Wide Cancer Configuration Driven by Distinct Gene Mutations. Cancer Cell 2018; 34:197-210.e5. [PMID: 30017478 PMCID: PMC6481635 DOI: 10.1016/j.ccell.2018.06.008] [Citation(s) in RCA: 117] [Impact Index Per Article: 16.7] [Received: 01/15/2018] [Revised: 05/04/2018] [Accepted: 06/14/2018] [Indexed: 12/14/2022]
Abstract
The tandem duplicator phenotype (TDP) is a genome-wide instability configuration primarily observed in breast, ovarian, and endometrial carcinomas. Here, we stratify TDP tumors by classifying their tandem duplications (TDs) into three span intervals, with modal values of 11 kb, 231 kb, and 1.7 Mb, respectively. TDPs with ∼11 kb TDs feature loss of TP53 and BRCA1. TDPs with ∼231 kb and ∼1.7 Mb TDs associate with CCNE1 pathway activation and CDK12 disruptions, respectively. We demonstrate that p53 and BRCA1 conjoint abrogation drives TDP induction by generating short-span TDP mammary tumors in genetically modified mice lacking them. Lastly, we show how TDs in TDP tumors disrupt heterogeneous combinations of tumor suppressors and chromatin topologically associating domains while duplicating oncogenes and super-enhancers.
Affiliation(s)
- Francesca Menghi
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06030, USA
- Floris P Barthel
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06030, USA
- Vinod Yadav
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06030, USA
- Ming Tang
- MD Anderson Cancer Center, Houston, TX 77030, USA
- Bo Ji
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Zhonghui Tang
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06030, USA
- Yijun Ruan
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06030, USA
- Ralph Scully
- Division of Hematology Oncology, Department of Medicine, and Cancer Research Institute, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, MA 02215, USA
- Roel G W Verhaak
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06030, USA
- Jos Jonkers
- Oncode Institute and Division of Molecular Pathology, The Netherlands Cancer Institute, Amsterdam 1066CX, the Netherlands
- Edison T Liu
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
11
Becker T, Lee WP, Leone J, Zhu Q, Zhang C, Liu S, Sargent J, Shanker K, Mil-Homens A, Cerveira E, Ryan M, Cha J, Navarro FCP, Galeev T, Gerstein M, Mills RE, Shin DG, Lee C, Malhotra A. FusorSV: an algorithm for optimally combining data from multiple structural variation detection methods. Genome Biol 2018; 19:38. [PMID: 29559002 PMCID: PMC5859555 DOI: 10.1186/s13059-018-1404-6] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 09/28/2017] [Accepted: 02/07/2018] [Indexed: 11/16/2022]
Abstract
Comprehensive and accurate identification of structural variations (SVs) from next generation sequencing data remains a major challenge. We develop FusorSV, which uses a data mining approach to assess performance and merge callsets from an ensemble of SV-calling algorithms. It includes a fusion model built using analysis of 27 deep-coverage human genomes from the 1000 Genomes Project. We identify 843 novel SV calls that were not reported by the 1000 Genomes Project for these 27 samples. Experimental validation of a subset of these calls yields a validation rate of 86.7%. FusorSV is available at https://github.com/TheJacksonLaboratory/SVE.
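FusorSV's fusion model is learned from benchmark genomes and is not reproduced here. As a much simpler stand-in for the general idea of merging callsets from several SV callers, the sketch below clusters calls that share at least 50% reciprocal overlap and keeps clusters supported by a minimum number of distinct callers. The `Call` type, the caller labels, the thresholds, and the toy coordinates are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple

class Call(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    caller: str  # hypothetical caller label

def reciprocal_overlap(a: Call, b: Call) -> float:
    """Shared length divided by the longer call, i.e. the smaller overlap fraction."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return shared / max(a.end - a.start, b.end - b.start)

def merge_calls(calls, min_ro=0.5, min_callers=2):
    """Greedily cluster calls by reciprocal overlap and keep multi-caller clusters."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        for cluster in clusters:
            # Join the first cluster whose seed call overlaps this call enough.
            if reciprocal_overlap(cluster[0], call) >= min_ro:
                cluster.append(call)
                break
        else:
            clusters.append([call])
    merged = []
    for cluster in clusters:
        callers = sorted({c.caller for c in cluster})
        if len(callers) >= min_callers:
            merged.append((cluster[0].chrom,
                           min(c.start for c in cluster),
                           max(c.end for c in cluster),
                           callers))
    return merged

# Toy deletion calls from three hypothetical callers A, B, and C.
toy_calls = [
    Call("chr1", 1000, 2000, "A"),
    Call("chr1", 1050, 2100, "B"),
    Call("chr1", 1900, 3000, "C"),
    Call("chr1", 5000, 5600, "A"),
]
print(merge_calls(toy_calls))
```

A fixed vote threshold treats all callers as equally reliable. The abstract describes assessing the performance of each algorithm before merging, which a simple vote like this does not capture.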
Affiliation(s)
- Timothy Becker
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA; Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Wan-Ping Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Joseph Leone
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Chengsheng Zhang
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Silvia Liu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Jack Sargent
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Kritika Shanker
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Adam Mil-Homens
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Eliza Cerveira
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Mallory Ryan
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Jane Cha
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Fabio C P Navarro
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Timur Galeev
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Department of Computer Science, Yale University, New Haven, CT, USA
- Ryan E Mills
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA; Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Dong-Guk Shin
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA; The Department of Life Sciences, Ewha Womans University, Seoul, Korea
- Ankit Malhotra
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
12
GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly. Genome Res 2017; 27:2050-2060. [PMID: 29097403 PMCID: PMC5741059 DOI: 10.1101/gr.222109.117] [Citation(s) in RCA: 242] [Impact Index Per Article: 30.3] [Received: 02/24/2017] [Accepted: 09/14/2017] [Indexed: 01/08/2023]
Abstract
The identification of genomic rearrangements with high sensitivity and specificity using massively parallel sequencing remains a major challenge, particularly in precision medicine and cancer research. Here, we describe a new method for detecting rearrangements, GRIDSS (Genome Rearrangement IDentification Software Suite). GRIDSS is a multithreaded structural variant (SV) caller that performs efficient genome-wide break-end assembly prior to variant calling using a novel positional de Bruijn graph-based assembler. By combining assembly, split read, and read pair evidence using a probabilistic scoring, GRIDSS achieves high sensitivity and specificity on simulated, cell line, and patient tumor data, recently winning SV subchallenge #5 of the ICGC-TCGA DREAM8.5 Somatic Mutation Calling Challenge. On human cell line data, GRIDSS halves the false discovery rate compared to other recent methods while matching or exceeding their sensitivity. GRIDSS identifies nontemplate sequence insertions, microhomologies, and large imperfect homologies, estimates a quality score for each breakpoint, stratifies calls into high or low confidence, and supports multisample analysis.
13
Ji T, Chen J. Statistical models for DNA copy number variation detection using read-depth data from next generation sequencing experiments. AUST NZ J STAT 2016. [DOI: 10.1111/anzs.12175] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/24/2023]
Affiliation(s)
- Tieming Ji
- Department of Statistics, University of Missouri at Columbia, Columbia, MO 65211, USA
- Jie Chen
- Department of Biostatistics and Epidemiology, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA
14
The tandem duplicator phenotype as a distinct genomic configuration in cancer. Proc Natl Acad Sci U S A 2016; 113:E2373-82. [PMID: 27071093 DOI: 10.1073/pnas.1520010113] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Indexed: 12/14/2022]
Abstract
Next-generation sequencing studies have revealed genome-wide structural variation patterns in cancer, such as chromothripsis and chromoplexy, that do not engage a single discernable driver mutation, and whose clinical relevance is unclear. We devised a robust genomic metric able to identify cancers with a chromotype called tandem duplicator phenotype (TDP) characterized by frequent and distributed tandem duplications (TDs). Enriched only in triple-negative breast cancer (TNBC) and in ovarian, endometrial, and liver cancers, TDP tumors conjointly exhibit tumor protein p53 (TP53) mutations, disruption of breast cancer 1 (BRCA1), and increased expression of DNA replication genes pointing at rereplication in a defective checkpoint environment as a plausible causal mechanism. The resultant TDs in TDP augment global oncogene expression and disrupt tumor suppressor genes. Importantly, the TDP strongly correlates with cisplatin sensitivity in both TNBC cell lines and primary patient-derived xenografts. We conclude that the TDP is a common cancer chromotype that coordinately alters oncogene/tumor suppressor expression with potential as a marker for chemotherapeutic response.
15
Guan P, Sung WK. Structural variation detection using next-generation sequencing data: A comparative technical review. Methods 2016; 102:36-49. [PMID: 26845461 DOI: 10.1016/j.ymeth.2016.01.020] [Citation(s) in RCA: 108] [Impact Index Per Article: 12.0] [Received: 10/18/2015] [Revised: 01/09/2016] [Accepted: 01/31/2016] [Indexed: 12/11/2022]
Abstract
Structural variations (SVs) are mutations in the genome of size at least fifty nucleotides. They contribute to the phenotypic differences among healthy individuals, cause severe diseases and even cancers by breaking or linking genes. Thus, it is crucial to systematically profile SVs in the genome. In the past decade, many next-generation sequencing (NGS)-based SV detection methods have been proposed due to the significant cost reduction of NGS experiments and their ability to unbiasedly detect SVs to the base-pair resolution. These SV detection methods vary in both sensitivity and specificity, since they use different SV-property-dependent and library-property-dependent features. As a result, predictions from different SV callers are often inconsistent. Besides, the noises in the data (both platform-specific sequencing error and artificial chimeric reads) impede the specificity of SV detection. Poorly characterized regions in the human genome (e.g., repeat regions) greatly impact the reads mapping and in turn affect the SV calling accuracy. Calling of complex SVs requires specialized SV callers. Apart from accuracy, processing speed of SV caller is another factor deciding its usability. Knowing the pros and cons of different SV calling techniques and the objectives of the biological study are essential for biologists and bioinformaticians to make informed decisions. This paper describes different components in the SV calling pipeline and reviews the techniques used by existing SV callers. Through simulation study, we also demonstrate that library properties, especially insert size, greatly impact the sensitivity of different SV callers. We hope the community can benefit from this work both in designing new SV calling methods and in selecting the appropriate SV caller for specific biological studies.
Affiliation(s)
- Peiyong Guan
- School of Computing, National University of Singapore, 117543, Singapore
- Wing-Kin Sung
- School of Computing, National University of Singapore, 117543, Singapore; Computational & Mathematical Biology Group, Genome Institute of Singapore, 138672, Singapore
16
Pirooznia M, Goes FS, Zandi PP. Whole-genome CNV analysis: advances in computational approaches. Front Genet 2015; 6:138. [PMID: 25918519 PMCID: PMC4394692 DOI: 10.3389/fgene.2015.00138] [Citation(s) in RCA: 123] [Impact Index Per Article: 12.3] [Received: 01/26/2015] [Accepted: 03/23/2015] [Indexed: 01/04/2023]
Abstract
Accumulating evidence indicates that DNA copy number variation (CNV) is likely to make a significant contribution to human diversity and also play an important role in disease susceptibility. Recent advances in genome sequencing technologies have enabled the characterization of a variety of genomic features, including CNVs. This has led to the development of several bioinformatics approaches to detect CNVs from next-generation sequencing data. Here, we review recent advances in CNV detection from whole genome sequencing. We discuss the informatics approaches and current computational tools that have been developed as well as their strengths and limitations. This review will assist researchers and analysts in choosing the most suitable tools for CNV analysis as well as provide suggestions for new directions in future development.
Affiliation(s)
- Mehdi Pirooznia
- Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Fernando S Goes
- Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Peter P Zandi
- Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA; Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA