1
Qu HQ, Glessner JT, Qu J, Liu Y, Watson D, Chang X, Saeidian AH, Qiu H, Mentch FD, Connolly JJ, Hakonarson H. High Comorbidity of Pediatric Cancers in Patients with Birth Defects: Insights from Whole Genome Sequencing Analysis of Copy Number Variations. Transl Res 2024; 266:49-56. [PMID: 37989391 DOI: 10.1016/j.trsl.2023.11.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 10/01/2023] [Accepted: 11/17/2023] [Indexed: 11/23/2023]
Abstract
BACKGROUND Patients with birth defects (BD) exhibit an elevated risk of cancer. We aimed to investigate the potential link between pediatric cancers and BDs, exploring the hypothesis that shared genetic defects contribute to the coexistence of these conditions. METHODS This study included 1454 probands with BDs (704 females and 750 males), including 619 (42.3%) with and 845 (57.7%) without co-occurring pediatric-onset cancers. Whole genome sequencing (WGS) was performed at 30X coverage through the Kids First/Gabriella Miller X01 Program. RESULTS A total of 8211 CNV loci were called from the 1454 unrelated individuals. In total, 191 CNV loci classified as pathogenic/likely pathogenic (P/LP) were identified in 309 (21.3%) patients, 124 (40.1%) of whom had pediatric-onset cancers. The most common group of CNVs comprised pathogenic deletions covering the region ChrX:52,863,011-55,652,521, seen in 162 patients including 17 males. Large recurrent P/LP duplications >5 Mb were detected in 33 patients. CONCLUSIONS This study revealed that P/LP CNVs were common in a large cohort of BD patients with a high rate of pediatric cancers. We present a comprehensive spectrum of P/LP CNVs in patients with BDs and various cancers. Notably, deletions involving E2F target genes and genes implicated in mitotic spindle assembly and the G2/M checkpoint were identified; these may disrupt cell-cycle progression and provide mechanistic insights into the concurrent occurrence of BDs and cancers.
Affiliation(s)
- Hui-Qi Qu
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Joseph T Glessner
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA; Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, 19104, USA; Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Jingchun Qu
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Yichuan Liu
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Deborah Watson
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Xiao Chang
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Amir Hossein Saeidian
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Haijun Qiu
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Frank D Mentch
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- John J Connolly
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA
- Hakon Hakonarson
- Center for Applied Genomics (CAG), Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA; Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, 19104, USA; Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA; Division of Pulmonary Medicine, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, 19104, USA; Faculty of Medicine, University of Iceland, Reykjavik, Iceland.
2
Zhu H, Lu X, Jiang H, Yang Z, Xu T. Descriptive Statistics and Genome-Wide Copy Number Analysis of Milk Production Traits of Jiangsu Chinese Holstein Cows. Animals (Basel) 2023; 14:17. [PMID: 38200748 PMCID: PMC10778490 DOI: 10.3390/ani14010017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 12/05/2023] [Accepted: 12/18/2023] [Indexed: 01/12/2024]
Abstract
Milk production traits are the most important quantitative economic traits in dairy cow production, and improving milk yield and quality is an important way to ensure the production efficiency of the dairy industry. This study carried out statistical genetic and molecular analyses, including descriptive statistics and copy number variation analysis, of Chinese Holstein cows in Jiangsu Province. Genetic correlations, phenotypic correlations, and descriptive statistics for five milk production traits (milk yield, milk fat percentage, milk fat yield, milk protein percentage, and milk protein yield) were analyzed using SPSS and DMU software. After quality control, 4173 cows and their genomes were used for the genomic study. SNPs were detected using DNA chips, and a copy number variation (CNV) analysis was carried out to locate quantitative trait loci (QTL) for the milk production traits using PennCNV, a Perl program based on a hidden Markov model (HMM). The phenotypic means of milk yield, milk fat percentage, milk fat yield, milk protein percentage, and milk protein yield at the first trimester were lower than those at the other trimesters by 8.821%, 1.031%, 0.930%, 0.003%, and 0.826%, respectively. The five milk production traits showed significant positive phenotypic correlations (p < 0.01) and high positive genetic correlations among the three parities. Based on GGPBovine 100K SNP data, QTL detection for first-parity milk performance was carried out via CNV analysis. We identified 1731 CNVs and 236 CNV regions (CNVRs) on the 29 autosomes of 984 Holstein dairy cows, and 19 CNVRs were significantly associated with the milk production traits (p < 0.05).
These CNVRs were examined by bioinformatics analysis: 13 Gene Ontology (GO) terms and 20 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were significantly enriched (p < 0.05), mainly related to lipid metabolism, amino acid metabolism, and cellular catabolic processes. The results describe the basic status of the milk production traits of Chinese Holstein cows in Jiangsu and locate QTL and functional genes affecting the milk production traits of first-parity cows, providing a theoretical basis for molecular-marker-assisted selection of dairy cows.
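The abstract above calls CNVs from SNP-array data with PennCNV, which is built on a hidden Markov model (HMM). As a rough illustration of how an HMM turns a noisy per-marker signal into contiguous copy-number segments, here is a minimal three-state Viterbi decoder over log R ratio (LRR) values. This is not PennCNV's model, which uses more states and also models B allele frequency. The state set, emission means and standard deviations, and transition probabilities below are illustrative assumptions.

```python
import math
from typing import List, Sequence

# Hypothetical 3-state model: deletion, normal (two copies), duplication.
STATES = ("del", "normal", "dup")
# Assumed Gaussian emission parameters (mean, sd) for log R ratio; illustrative only.
EMISSION = ((-0.5, 0.2), (0.0, 0.15), (0.3, 0.2))
START = (0.01, 0.98, 0.01)  # assumed prior over the first marker's state
STAY = 0.99  # assumed probability of staying in the same state between adjacent markers


def log_gauss(x: float, mean: float, sd: float) -> float:
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_trans(i: int, j: int) -> float:
    """Log transition probability: stay with prob STAY, otherwise switch uniformly."""
    if i == j:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(lrr: Sequence[float]) -> List[str]:
    """Return the most likely state label for each marker's log R ratio."""
    n = len(STATES)
    # score[s] = best log-probability of any state path ending in state s
    score = [math.log(START[s]) + log_gauss(lrr[0], *EMISSION[s]) for s in range(n)]
    back: List[List[int]] = []  # back-pointers for each marker after the first
    for x in lrr[1:]:
        prev = score
        ptr, score = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + log_trans(i, j))
            ptr.append(best_i)
            score.append(prev[best_i] + log_trans(best_i, j) + log_gauss(x, *EMISSION[j]))
        back.append(ptr)
    # Trace back from the best final state to recover the full path
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


if __name__ == "__main__":
    # Made-up LRR values with a run of low values in the middle
    lrr = [0.02, -0.05, 0.01, -0.55, -0.6, -0.48, -0.52, 0.03, 0.0, -0.02]
    print(viterbi(lrr))
```

The high STAY probability is what makes the decoder favour long segments over marker-by-marker flips; lowering it makes calls noisier.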
Affiliation(s)
- Hao Zhu
- Joint International Research Laboratory of Agriculture and Agri-Product Safety, Yangzhou University, Yangzhou 225009, China
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, China
- Xubin Lu
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, China
- Hui Jiang
- Center for Quantitative Genetics and Genomics, Aarhus University, 8000 Aarhus C, Denmark
- Zhangping Yang
- Joint International Research Laboratory of Agriculture and Agri-Product Safety, Yangzhou University, Yangzhou 225009, China
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, China
- Tianle Xu
- Joint International Research Laboratory of Agriculture and Agri-Product Safety, Yangzhou University, Yangzhou 225009, China
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, China
- International Joint Research Laboratory, Universities of Jiangsu Province of China for Domestic Animal Germplasm Resources and Genetic Improvement, Yangzhou 225009, China
3
Kosugi S, Kamatani Y, Harada K, Tomizuka K, Momozawa Y, Morisaki T, Terao C. Detection of trait-associated structural variations using short-read sequencing. Cell Genomics 2023; 3:100328. [PMID: 37388916 PMCID: PMC10300613 DOI: 10.1016/j.xgen.2023.100328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Revised: 02/17/2023] [Accepted: 04/25/2023] [Indexed: 07/01/2023]
Abstract
Genomic structural variation (SV) affects genetic and phenotypic characteristics in diverse organisms, but the lack of reliable methods to detect SV has hindered genetic analysis. We developed a computational algorithm (MOPline) that combines missing call recovery with high-confidence SV call selection and genotyping using short-read whole-genome sequencing (WGS) data. Using 3,672 high-coverage WGS datasets, MOPline stably detected ∼16,000 SVs per individual, approximately 1.7- to 3.3-fold more than previous large-scale projects, while exhibiting comparable statistical quality metrics. We imputed SVs in 181,622 Japanese individuals for 42 diseases and 60 quantitative traits. A genome-wide association study with the imputed SVs revealed 41 top-ranked or nearly top-ranked genome-wide significant SVs, including 8 exonic SVs with 5 novel associations and enriched mobile element insertions. This study demonstrates that short-read WGS data can be used to identify rare and common SVs associated with a variety of traits.
Affiliation(s)
- Shunichi Kosugi
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Yoichiro Kamatani
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba 277-8562, Japan
- Katsutoshi Harada
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Kohei Tomizuka
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Yukihide Momozawa
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa 230-0045, Japan
- Takayuki Morisaki
- Division of Molecular Pathology, Institute of Medical Science, The University of Tokyo, 4-6-1, Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan
- Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
4
Ye B, Tang X, Liao S, Ding K. A comparison of algorithms for identifying copy number variants in family-based whole-exome sequencing data and its implications in inheritance pattern analysis. Gene 2023; 861:147237. [PMID: 36731620 DOI: 10.1016/j.gene.2023.147237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 12/27/2022] [Accepted: 01/26/2023] [Indexed: 01/31/2023]
Abstract
There remain challenges in accurately identifying constitutional or germline copy number variants (gCNVs) from whole-exome sequencing data, with implications for the genetic diagnosis of 'rare undiagnosed disease' in the clinical setting. Although multiple algorithms have been proposed, a systematic comparison of these algorithms for calling gCNVs and analyzing inheritance patterns has yet to be fully conducted. We therefore empirically compared seven exome-based algorithms (XHMM, CLAMMS, CODEX2, ExomeDepth, DECoN, CN.MOPS, and GATK gCNV) for calling gCNVs in 151 individuals from 44 pedigrees, using genotyping-derived gCNVs in the same cohort as the gold standard for performance assessment. The algorithms differed in their power to identify gCNVs, although the size distributions of the called gCNVs were similar. The number of gCNVs shared across algorithms was limited (e.g., only four gCNVs were shared among all seven algorithms); however, some algorithms showed considerable consistency with one another (e.g., 1,843 gCNVs shared between DECoN and ExomeDepth). CLAMMS and CODEX2 outperformed the remaining algorithms, with relatively higher F-scores (0.145 and 0.152, respectively). In addition, the algorithms exhibited different rates of Mendelian inconsistency in gCNVs, and significant challenges remain in inheritance pattern analysis. In conclusion, algorithm selection may have important implications for gCNV-based inheritance pattern analysis in family-based studies.
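The F-scores above come from comparing each algorithm's calls against a genotyping-derived gold standard. The abstract does not say how a call was matched to a reference gCNV. The sketch below assumes a common convention: 50% reciprocal overlap with the same CNV type. Under that rule it computes precision, recall, and their harmonic mean (F1). The `Cnv` tuple layout and the threshold are assumptions for illustration.

```python
from typing import List, Tuple

# Assumed CNV representation: (chromosome, start, end, type), 1-based inclusive coordinates.
Cnv = Tuple[str, int, int, str]


def reciprocal_overlap(a: Cnv, b: Cnv, min_frac: float = 0.5) -> bool:
    """True if a and b share chromosome and type, and the overlap covers at
    least min_frac of each interval's own length."""
    if a[0] != b[0] or a[3] != b[3]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1]) + 1
    if overlap <= 0:
        return False
    len_a = a[2] - a[1] + 1
    len_b = b[2] - b[1] + 1
    return overlap / len_a >= min_frac and overlap / len_b >= min_frac


def f_score(calls: List[Cnv], truth: List[Cnv], min_frac: float = 0.5) -> float:
    """F1 of a call set against a reference set under reciprocal-overlap matching."""
    if not calls or not truth:
        return 0.0
    # Precision: fraction of calls that match at least one reference CNV
    tp_calls = sum(any(reciprocal_overlap(c, t, min_frac) for t in truth) for c in calls)
    # Recall: fraction of reference CNVs recovered by at least one call
    tp_truth = sum(any(reciprocal_overlap(t, c, min_frac) for c in calls) for t in truth)
    precision = tp_calls / len(calls)
    recall = tp_truth / len(truth)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    truth = [("1", 1000, 5000, "DEL"), ("2", 10000, 20000, "DUP")]
    calls = [("1", 1500, 5200, "DEL"), ("3", 100, 900, "DEL")]
    print(f_score(calls, truth))  # one of two calls matches, one of two truths found
```

Changing `min_frac` can shift F-scores substantially. Comparisons like those in the abstract are only meaningful when every algorithm is scored under the same matching rule.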
Affiliation(s)
- Bo Ye
- Department of Bioinformatics, School of Basic Medicine, Chongqing Medical University, Chongqing 400016, PR China
- Xia Tang
- State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai 200438, PR China
- Shixiu Liao
- Medical Genetic Institute of Henan Province, Henan Provincial People's Hospital, Henan Key Laboratory of Genetic Diseases and Functional Genomics, Henan Provincial People's Hospital of Henan University, People's Hospital of Zhengzhou University, Zhengzhou, Henan Province 450003, PR China.
- Keyue Ding
- Medical Genetic Institute of Henan Province, Henan Provincial People's Hospital, Henan Key Laboratory of Genetic Diseases and Functional Genomics, Henan Provincial People's Hospital of Henan University, People's Hospital of Zhengzhou University, Zhengzhou, Henan Province 450003, PR China; Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN 55905, United States.
5
Ghorbani F, de Boer EN, Benjamins-Stok M, Verschuuren-Bemelmans CC, Knapper J, de Boer-Bergsma J, de Vries JJ, Sikkema-Raddatz B, Verbeek DS, Westers H, van Diemen CC. Copy Number Variant Analysis of Spinocerebellar Ataxia Genes in a Cohort of Dutch Patients With Cerebellar Ataxia. Neurol Genet 2023; 9:e200050. [PMID: 38058854 PMCID: PMC10696507 DOI: 10.1212/nxg.0000000000200050] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/06/2022] [Accepted: 10/27/2022] [Indexed: 12/08/2023]
Abstract
Background and Objectives The spinocerebellar ataxias (SCAs) are a genetically heterogeneous group of neurodegenerative disorders generally caused by single nucleotide variants (SNVs) or indels in coding regions or by repeat expansions in coding and noncoding regions of SCA genes. Copy number variants (CNVs) have now also been reported for 3 genes (ITPR1, FGF14, and SPTBN2), but not all SCA genes have been screened for CNVs as an underlying cause of disease. In this study, we aimed to assess the prevalence of CNVs encompassing 36 known SCA genes. Methods A cohort of patients with cerebellar ataxia who were referred to the University Medical Center Groningen for SCA genetic diagnostics was selected for this study. Genome-wide single nucleotide polymorphism (SNP) genotyping was performed using the Infinium Global Screening Array. After data processing, genotyping data were uploaded into NxClinical software to perform CNV analysis per patient and to visualize identified CNVs in 36 genes with allocated SCA symbols. The clinical relevance of detected CNVs was assessed using evidence from PubMed literature searches for similar CNVs and phenotypic features. Results Among the 338 patients with cerebellar ataxia, we identified putative clinically relevant CNV deletions in 3 patients: an identical deletion encompassing ITPR1 in 2 patients, who turned out to be related, and a deletion involving PPP2R2B in another patient. Although the ITPR1 deletion was clearly the underlying cause of SCA15 in the 2 related patients, the clinical significance of the PPP2R2B deletion remained unknown. Discussion We showed that CNVs detectable with the limited resolution of SNP arrays are a very rare cause of SCA. Nevertheless, we suggest adding CNV analysis alongside SNV analysis to SCA gene diagnostics using next-generation sequencing approaches, at least for ITPR1, to improve genetic diagnostics for patients.
Affiliation(s)
- Fatemeh Ghorbani
- From the Department of Genetics (F.G., E.N.d.B., M.B.-S., C.C.V.-B., J.K., J.d.B.-B., B.S.-R., D.S.V., H.W., C.C.v.D.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands; and Department of Neurology (J.J.d.V.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Eddy N de Boer
- From the Department of Genetics (F.G., E.N.d.B., M.B.-S., C.C.V.-B., J.K., J.d.B.-B., B.S.-R., D.S.V., H.W., C.C.v.D.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands; and Department of Neurology (J.J.d.V.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Marloes Benjamins-Stok
- From the Department of Genetics (F.G., E.N.d.B., M.B.-S., C.C.V.-B., J.K., J.d.B.-B., B.S.-R., D.S.V., H.W., C.C.v.D.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands; and Department of Neurology (J.J.d.V.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Corien C Verschuuren-Bemelmans
- From the Department of Genetics (F.G., E.N.d.B., M.B.-S., C.C.V.-B., J.K., J.d.B.-B., B.S.-R., D.S.V., H.W., C.C.v.D.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands; and Department of Neurology (J.J.d.V.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Jurjen Knapper
- From the Department of Genetics (F.G., E.N.d.B., M.B.-S., C.C.V.-B., J.K., J.d.B.-B., B.S.-R., D.S.V., H.W., C.C.v.D.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands; and Department of Neurology (J.J.d.V.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Jelkje de Boer-Bergsma
- From the Department of Genetics (F.G., E.N.d.B., M.B.-S., C.C.V.-B., J.K., J.d.B.-B., B.S.-R., D.S.V., H.W., C.C.v.D.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands; and Department of Neurology (J.J.d.V.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Jeroen J de Vries
- From the Department of Genetics (F.G., E.N.d.B., M.B.-S., C.C.V.-B., J.K., J.d.B.-B., B.S.-R., D.S.V., H.W., C.C.v.D.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands; and Department of Neurology (J.J.d.V.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Birgit Sikkema-Raddatz
- From the Department of Genetics (F.G., E.N.d.B., M.B.-S., C.C.V.-B., J.K., J.d.B.-B., B.S.-R., D.S.V., H.W., C.C.v.D.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands; and Department of Neurology (J.J.d.V.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Dineke S Verbeek
- From the Department of Genetics (F.G., E.N.d.B., M.B.-S., C.C.V.-B., J.K., J.d.B.-B., B.S.-R., D.S.V., H.W., C.C.v.D.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands; and Department of Neurology (J.J.d.V.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Helga Westers
- From the Department of Genetics (F.G., E.N.d.B., M.B.-S., C.C.V.-B., J.K., J.d.B.-B., B.S.-R., D.S.V., H.W., C.C.v.D.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands; and Department of Neurology (J.J.d.V.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Cleo C van Diemen
- From the Department of Genetics (F.G., E.N.d.B., M.B.-S., C.C.V.-B., J.K., J.d.B.-B., B.S.-R., D.S.V., H.W., C.C.v.D.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands; and Department of Neurology (J.J.d.V.), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
6
Glessner JT, Hou X, Zhong C, Zhang J, Khan M, Brand F, Krawitz P, Sleiman PMA, Hakonarson H, Wei Z. DeepCNV: a deep learning approach for authenticating copy number variations. Brief Bioinform 2021; 22:bbaa381. [PMID: 33429424 PMCID: PMC8681111 DOI: 10.1093/bib/bbaa381] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/14/2020] [Revised: 11/24/2020] [Accepted: 11/26/2020] [Indexed: 12/14/2022]
Abstract
Copy number variations (CNVs) are an important class of variation contributing to the pathogenesis of many disease phenotypes. Detecting CNVs from genomic data remains difficult, and most currently applied methods suffer from an unacceptably high false positive rate. A common practice is to have human experts manually review the original CNV calls to filter out false positives before downstream analysis or experimental validation. Here, we propose DeepCNV, a deep learning-based tool intended to replace human experts in validating CNV calls, focusing on calls made by PennCNV, one of the most accurate CNV callers. The deep neural network was developed using over 10,000 expert-scored samples split into training and testing sets. Variant confidence, especially for CNVs, is a major roadblock to linking CNVs with disease. We show that DeepCNV increases confidence in CNV calls, with an optimal area under the receiver operating characteristic curve of 0.909, exceeding other machine learning methods. The superiority of DeepCNV was also benchmarked and confirmed using an experimental wet-lab validation dataset. We conclude that the improvement obtained by DeepCNV results in significantly fewer false positive results and fewer failures to replicate CNV association results.
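DeepCNV is summarized by its area under the ROC curve (0.909). AUC has a simple meaning: it is the probability that a randomly chosen true CNV gets a higher classifier score than a randomly chosen false one, with ties counting one half. The sketch below computes it directly from that definition, which is the normalized Mann-Whitney U statistic. The scores and labels in the example are made-up values, not DeepCNV output.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as one half (normalized Mann-Whitney U)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    # Compare every positive/negative pair; O(n_pos * n_neg), fine for a sketch
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical classifier scores for four CNV calls; 1 = expert-confirmed, 0 = false call
    print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop keeps the definition visible. For large call sets, a rank-based O(n log n) computation gives the same value.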
Affiliation(s)
- Joseph T Glessner
- Center for Applied Genomics, Department of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Perelman School of Medicine, Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19102, USA
- Xiurui Hou
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
- Cheng Zhong
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
- Munir Khan
- Center for Applied Genomics, Department of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Perelman School of Medicine, Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19102, USA
- Patrick M A Sleiman
- Perelman School of Medicine, Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19102, USA
- Hakon Hakonarson
- Perelman School of Medicine, Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19102, USA
- Zhi Wei
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
7
Qin F, Luo X, Cai G, Xiao F. Shall genomic correlation structure be considered in copy number variants detection? Brief Bioinform 2021; 22:6295811. [PMID: 34114005 DOI: 10.1093/bib/bbab215] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/16/2021] [Revised: 04/16/2021] [Accepted: 05/17/2021] [Indexed: 11/14/2022]
Abstract
Copy number variation has been identified as a major source of genomic variation associated with disease susceptibility. With the advent of whole-exome sequencing (WES), massive amounts of WES data have been generated, allowing the identification of copy number variants (CNVs) in protein-coding regions with direct functional interpretation. We have previously shown evidence of genomic correlation structure in array data and developed a novel chromosomal breakpoint detection algorithm, LDcnv, which showed significantly improved detection power by systematically modeling this correlation structure. However, it remains unexplored whether genomic correlation exists in WES data and how integrating such correlation structure can improve CNV detection accuracy. In this study, we first explored the correlation structure of WES data using 1000 Genomes Project data. Both raw read depth and median-normalized data showed strong evidence of correlation structure. Motivated by this finding, we proposed a correlation-based method, CORRseq, as a new release of the LDcnv algorithm for profiling WES data. The performance of CORRseq was evaluated in extensive simulation studies and in real data analysis of the 1000 Genomes Project. CORRseq outperformed existing methods in detecting medium and large CNVs. In conclusion, it would be more advantageous to model genomic correlation structure when detecting relatively long CNVs. This study provides useful insights for the development of CNV detection methods for NGS data.
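The abstract reports strong correlation structure in both raw and median-normalized WES read depth. One simple way to look for such structure is sketched below. It first median-normalizes each sample's exon depths. It then computes the Pearson correlation, across samples, between each pair of neighbouring exons. This is an illustrative assumption, not the CORRseq or LDcnv procedure. The samples × exons list-of-rows layout is also assumed.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (raises on zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def median_normalize(depth: List[List[float]]) -> List[List[float]]:
    """Divide each sample's (row's) exon depths by that sample's median depth."""
    out = []
    for row in depth:
        s = sorted(row)
        m = len(s) // 2
        med = s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2
        out.append([d / med for d in row])
    return out


def adjacent_correlations(depth: List[List[float]]) -> List[float]:
    """Correlation across samples between exon j and exon j+1, for every j."""
    n_exons = len(depth[0])
    cols = [[row[j] for row in depth] for j in range(n_exons)]
    return [pearson(cols[j], cols[j + 1]) for j in range(n_exons - 1)]


if __name__ == "__main__":
    # Hypothetical read depths: 4 samples x 5 exons
    depth = [
        [30.0, 32.0, 15.0, 40.0, 41.0],
        [25.0, 27.0, 18.0, 35.0, 36.0],
        [40.0, 43.0, 12.0, 52.0, 50.0],
        [35.0, 36.0, 20.0, 45.0, 47.0],
    ]
    print(adjacent_correlations(depth))
    print(adjacent_correlations(median_normalize(depth)))
```

Consistently high values between neighbours, compared with distant exon pairs, would suggest the kind of local correlation that the abstract argues should be modeled.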
Affiliation(s)
- Fei Qin
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina (USC), Discovery 449, 915 Greene St, Columbia, SC 29208, USA
- Xizhi Luo
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, USC, Discovery 449, 915 Greene St, Columbia, SC 29208, USA
- Guoshuai Cai
- Department of Environmental Health Science, Arnold School of Public Health, USC, Discovery 449, 915 Greene St, Columbia, SC 29208, USA
- Feifei Xiao
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, USC, Discovery 449, 915 Greene St, Columbia, SC 29208, USA
8
Bai J, Shi J, Li C, Wang S, Zhang T, Hua X, Zhu B, Koka H, Wu HH, Song L, Wang D, Wang M, Zhou W, Ballew BJ, Zhu B, Hicks B, Mirabello L, Parry DM, Zhai Y, Li M, Du J, Wang J, Zhang S, Liu Q, Zhao P, Gui S, Goldstein AM, Zhang Y, Yang XR. Whole genome sequencing of skull-base chordoma reveals genomic alterations associated with recurrence and chordoma-specific survival. Nat Commun 2021; 12:757. [PMID: 33536423 PMCID: PMC7859411 DOI: 10.1038/s41467-021-21026-5] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 04/06/2020] [Accepted: 01/06/2021] [Indexed: 02/06/2023]
Abstract
Chordoma is a rare bone tumor with an unknown etiology and a high recurrence rate. Here we conduct whole genome sequencing of 80 skull-base chordomas and identify PBRM1, a SWI/SNF (SWItch/Sucrose Non-Fermentable) complex subunit gene, as a significantly mutated driver gene. Genomic alterations in PBRM1 (12.5%) and homozygous deletions of the CDKN2A/2B locus are the most prevalent events. The combination of PBRM1 alterations and the chromosome 22q deletion, which involves another SWI/SNF gene (SMARCB1), shows strong associations with poor chordoma-specific survival (hazard ratio [HR] = 10.55, 95% confidence interval [CI] = 2.81-39.64, p = 0.001) and recurrence-free survival (HR = 4.30, 95% CI = 2.34-7.91, p = 2.77 × 10⁻⁶). Despite the low mutation rate, extensive somatic copy number alterations frequently occur; most are clonal and show highly concordant profiles between paired primary and recurrence/metastasis samples, indicating their importance in chordoma initiation. In this work, our findings provide important biological and clinical insights into skull-base chordoma.
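The hazard ratios above are reported together with 95% confidence intervals and p-values. A reader can check these for consistency with the sketch below. It back-calculates an approximate two-sided Wald p-value from a hazard ratio and its 95% CI. This assumes the interval was constructed symmetrically on the log scale. The abstract does not state which test the authors used, and rounding of the CI limits also shifts the result, so treat the output as approximate.

```python
import math


def p_from_hr_ci(hr: float, lower: float, upper: float, z_crit: float = 1.959964) -> float:
    """Approximate two-sided Wald p-value for a hazard ratio given its 95% CI.

    Assumes the CI was computed as exp(log(HR) +/- z_crit * SE), so the
    standard error of log(HR) can be recovered from the CI width.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    print(p_from_hr_ci(10.55, 2.81, 39.64))  # chordoma-specific survival
    print(p_from_hr_ci(4.30, 2.34, 7.91))    # recurrence-free survival
```

For recurrence-free survival, this approximation gives roughly 2.7 × 10⁻⁶, close to the reported 2.77 × 10⁻⁶. For chordoma-specific survival it gives about 5 × 10⁻⁴, somewhat smaller than the reported p = 0.001. That gap is plausible if the paper used a different test or if the very wide CI was not built on the log scale.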
Affiliation(s)
- Jiwei Bai
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing, China
- Jianxin Shi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Chuzhong Li
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing, China
- Brain Tumor Center, Beijing Institute for Brain Disorders, Beijing, China
- Shuai Wang
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
- Tongwu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Xing Hua
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Bin Zhu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Hela Koka
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Ho-Hsiang Wu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Lei Song
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Cancer Genomics Research Laboratory, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Difei Wang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Cancer Genomics Research Laboratory, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Mingyi Wang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Cancer Genomics Research Laboratory, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Weiyin Zhou
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Cancer Genomics Research Laboratory, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bari J Ballew
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Cancer Genomics Research Laboratory, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Bin Zhu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Cancer Genomics Research Laboratory, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Belynda Hicks
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Cancer Genomics Research Laboratory, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Lisa Mirabello
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
| | - Dilys M Parry
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
| | - Yixuan Zhai
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
- Department of Neurosurgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Mingxuan Li
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
| | - Jiang Du
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing, China
- Brain Tumor Center, Beijing Institute for Brain Disorders, Beijing, China
| | - Junmei Wang
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing, China
- Brain Tumor Center, Beijing Institute for Brain Disorders, Beijing, China
| | - Shuheng Zhang
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
- Department of Neurosurgery, Anshan Central Hospital, Anshan, China
| | - Qian Liu
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
| | - Peng Zhao
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing, China
| | - Songbai Gui
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing, China
| | - Alisa M Goldstein
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
| | - Yazhuo Zhang
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing, China
- Brain Tumor Center, Beijing Institute for Brain Disorders, Beijing, China
| | - Xiaohong R Yang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
9
Lavrichenko K, Helgeland Ø, Njølstad PR, Jonassen I, Johansson S. SeeCiTe: a method to assess CNV calls from SNP arrays using trio data. Bioinformatics 2021; 37:1876-1883. [PMID: 33459766 PMCID: PMC8317106 DOI: 10.1093/bioinformatics/btab028] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/28/2020] [Revised: 12/17/2020] [Accepted: 01/11/2021] [Indexed: 11/15/2022]
Abstract
Motivation: Single nucleotide polymorphism (SNP) genotyping arrays remain an attractive platform for assaying copy number variants (CNVs) in large population-wide cohorts. However, current CNV-calling tools are still prone to extensive false-positive calls when applied to biobank-scale arrays. Moreover, there is a lack of methods that exploit cohorts with available trios (e.g. nuclear families) to assist in quality control and downstream analyses after calling. Results: We developed SeeCiTe (Seeing CNVs in Trios), a novel CNV quality-control tool that post-processes the output of current CNV-calling tools, exploiting child-parent trio data to classify calls into quality categories and providing a set of visualizations for each putative CNV call in the offspring. We apply it to the Norwegian Mother, Father and Child Cohort Study (MoBa) and show that SeeCiTe improves specificity and sensitivity compared with common empirical filtering strategies. To our knowledge, it is the first tool that uses probe-level CNV data in trios (and singletons) to systematically highlight potential artifacts and visualize signal intensities in a streamlined fashion suitable for biobank-scale studies. Availability and implementation: The software is implemented in R, with source code freely available at https://github.com/aksenia/SeeCiTe. Supplementary information: Supplementary data are available at Bioinformatics online.
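SeeCiTe itself is an R package that works on probe-level signal intensities, and the sketch below does not reproduce it. It only illustrates, in Python, the basic trio logic the abstract builds on: an offspring call matched by a same-type call in a parent is supported as inherited, while an unmatched call is either a de novo event or a possible artifact worth inspecting. The `CNVCall` type, the reciprocal-overlap matching rule, and the 50% threshold are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CNVCall:
    """A CNV call on one chromosome; coordinates are 1-based and inclusive."""
    chrom: str
    start: int
    end: int
    kind: str  # "DEL" or "DUP"

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def reciprocal_overlap(a: CNVCall, b: CNVCall) -> float:
    """Smaller of overlap/len(a) and overlap/len(b); 0.0 if the calls do not overlap."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    return min(overlap / a.length, overlap / b.length)


def classify_offspring_call(
    child: CNVCall,
    mother_calls: Iterable[CNVCall],
    father_calls: Iterable[CNVCall],
    min_overlap: float = 0.5,  # hypothetical threshold, not taken from SeeCiTe
) -> str:
    """Label an offspring call by whether a same-type parental call overlaps it."""

    def supported_by(parent_calls: Iterable[CNVCall]) -> bool:
        return any(
            p.kind == child.kind and reciprocal_overlap(child, p) >= min_overlap
            for p in parent_calls
        )

    maternal = supported_by(mother_calls)
    paternal = supported_by(father_calls)
    if maternal and paternal:
        return "inherited_both_parents"
    if maternal:
        return "inherited_maternal"
    if paternal:
        return "inherited_paternal"
    # No parental support: either a true de novo event or a calling artifact.
    return "unsupported_candidate"


if __name__ == "__main__":
    child = CNVCall("chr1", 1_000, 2_000, "DEL")
    mother = [CNVCall("chr1", 1_100, 2_100, "DEL")]
    father: list[CNVCall] = []
    print(classify_offspring_call(child, mother, father))
```

Requiring the same CNV type and a reciprocal (rather than one-sided) overlap keeps a small parental call from "supporting" a much larger offspring call; a real QC tool would add signal-level evidence on top of this.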
Affiliation(s)
- Ksenia Lavrichenko
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Øyvind Helgeland
- Department of Clinical Science, University of Bergen, Bergen, Norway
- Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway
| | - Pål R Njølstad
- Department of Clinical Science, University of Bergen, Bergen, Norway
- Department of Pediatrics and Adolescents, Haukeland University Hospital, Bergen, Norway
| | - Inge Jonassen
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
| | - Stefan Johansson
- Department of Clinical Science, University of Bergen, Bergen, Norway
- Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway
10
Serin Harmanci A, Harmanci AO, Zhou X. CaSpER identifies and visualizes CNV events by integrative analysis of single-cell or bulk RNA-sequencing data. Nat Commun 2020; 11:89. [PMID: 31900397 PMCID: PMC6941987 DOI: 10.1038/s41467-019-13779-x] [Citation(s) in RCA: 86] [Impact Index Per Article: 21.5] [Received: 10/10/2018] [Accepted: 11/25/2019] [Indexed: 12/15/2022]
Abstract
RNA sequencing experiments generate large amounts of information about gene expression levels. Although they are mainly used to quantify expression, they also contain other biologically important information, such as copy number variants (CNVs). Here, we present CaSpER, a signal processing approach for the identification, visualization, and integrative analysis of focal and large-scale CNV events at multiscale resolution using either bulk or single-cell RNA sequencing data. CaSpER integrates multiscale smoothing of the expression signal with allelic shift signals for CNV calling. The allelic shift signal measures loss of heterozygosity (LOH), which is valuable for CNV identification. CaSpER employs an efficient method to generate a genome-wide B-allele frequency (BAF) signal profile from the reads and uses it to correct CNV calls. CaSpER increases the utility of RNA sequencing datasets and complements other tools for the complete characterization and visualization of the genomic and transcriptomic landscape of single-cell and bulk RNA sequencing data.
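CaSpER's own smoothing and BAF-correction procedures are implemented in its package and are not reproduced here. The Python sketch below only illustrates the two signals the abstract names, under simple assumptions: BAF computed as alt/(ref + alt) from allele read counts at heterozygous SNPs, an allelic-shift score |BAF - 0.5| that grows toward 0.5 under LOH, and multiscale smoothing approximated by centered moving averages at several window sizes (a deliberately simple smoother swapped in for illustration). Function names and window sizes are hypothetical.

```python
import math


def b_allele_frequency(ref_counts: list[int], alt_counts: list[int]) -> list[float]:
    """BAF per heterozygous SNP: alternate reads / total reads (NaN if no reads)."""
    return [
        alt / (ref + alt) if ref + alt > 0 else math.nan
        for ref, alt in zip(ref_counts, alt_counts)
    ]


def allelic_shift(bafs: list[float]) -> list[float]:
    """Distance of each BAF from 0.5, the value expected at a balanced het site.

    Scores near 0.5 (BAF near 0 or 1) are consistent with loss of heterozygosity.
    """
    return [abs(b - 0.5) for b in bafs]


def moving_average(values: list[float], window: int) -> list[float]:
    """Centered moving average; the window is truncated at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def multiscale_smooth(values: list[float], windows=(3, 11, 51)) -> dict[int, list[float]]:
    """Smooth the same signal at several scales: small windows keep focal
    events, large windows emphasize arm- or chromosome-scale changes."""
    return {w: moving_average(values, w) for w in windows}


if __name__ == "__main__":
    # Hypothetical per-gene expression log-ratios ordered along a chromosome.
    expr = [0.0, 0.1, -0.1, 0.9, 1.1, 1.0, 0.0, -0.1]
    for w, s in multiscale_smooth(expr, windows=(1, 3)).items():
        print(w, [round(x, 2) for x in s])
    print(allelic_shift(b_allele_frequency([10, 0, 12], [10, 9, 3])))
```

Looking at a region across several window sizes lets a focal event stand out at small scales while a broad gain or loss remains visible at large scales, which is the intuition behind combining scales.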
Affiliation(s)
- Akdes Serin Harmanci
- Center for Computational Systems Medicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Arif O Harmanci
- Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Xiaobo Zhou
- Center for Computational Systems Medicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Department of Integrative Biology and Pharmacology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- School of Dentistry, University of Texas Health Science Center at Houston, Houston, TX, 77054, USA
11
Zhang M, Liu D, Tang J, Feng Y, Wang T, Dobbin KK, Schliekelman P, Zhao S. SEG - A Software Program for Finding Somatic Copy Number Alterations in Whole Genome Sequencing Data of Cancer. Comput Struct Biotechnol J 2018; 16:335-341. [PMID: 30258547 PMCID: PMC6154469 DOI: 10.1016/j.csbj.2018.09.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/21/2018] [Revised: 08/31/2018] [Accepted: 09/01/2018] [Indexed: 01/15/2023]
Abstract
As next-generation sequencing technology advances and its cost decreases, whole genome sequencing (WGS) has become the preferred platform for identifying somatic copy number alteration (CNA) events in cancer genomes. To decipher these massive sequencing data more effectively, we developed a software program named SEG, shortened from the word “segment”. SEG uses mapped read or fragment density for CNA discovery. To reduce CNA artifacts arising from sequencing and mapping biases, SEG first normalizes the data by taking the log2 ratio of each tumor density against its matching normal density. SEG then uses dynamic programming to find change-points in the contiguous series of log2 ratios along a chromosome, dividing the chromosome into segments. Finally, SEG identifies the segments that carry CNAs. Our analyses with both simulated and real sequencing data indicate that SEG finds more small CNAs than other published software tools.
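The abstract describes SEG's pipeline only at a high level, so the Python sketch below fills in standard choices as assumptions rather than reproducing SEG's code: per-bin log2(tumor/normal) normalization, a penalized least-squares objective solved exactly by O(n²) dynamic programming (optimal partitioning), and a fixed threshold on segment means to call gains and losses. The `penalty` and the 0.2 `threshold` are hypothetical tuning parameters.

```python
import math


def log2_ratios(tumor_density: list[float], normal_density: list[float]) -> list[float]:
    """Per-bin log2(tumor / normal); bins with zero density are assumed pre-filtered."""
    return [math.log2(t / n) for t, n in zip(tumor_density, normal_density)]


def segment(values: list[float], penalty: float) -> list[tuple[int, int, float]]:
    """Optimal partitioning by dynamic programming.

    Minimizes total within-segment squared error plus `penalty` per segment.
    Returns (start, end, mean) with half-open bin indices [start, end).
    """
    n = len(values)
    # Prefix sums give an O(1) squared-error cost for any segment.
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i: int, j: int) -> float:
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [math.inf] * n  # best[j]: optimal cost of values[:j]
    last_start = [0] * (n + 1)     # start of the final segment in that optimum
    for j in range(1, n + 1):
        for i in range(j):
            candidate = best[i] + cost(i, j) + penalty
            if candidate < best[j]:
                best[j], last_start[j] = candidate, i

    # Walk back through the stored change-points.
    segments = []
    j = n
    while j > 0:
        i = last_start[j]
        segments.append((i, j, (s[j] - s[i]) / (j - i)))
        j = i
    return segments[::-1]


def call_cna(segments, threshold: float = 0.2):
    """Keep segments whose mean log2 ratio departs from 0 by more than `threshold`."""
    return [
        (start, end, mean, "gain" if mean > 0 else "loss")
        for start, end, mean in segments
        if abs(mean) > threshold
    ]


if __name__ == "__main__":
    tumor = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
    normal = [1.0] * 8
    segs = segment(log2_ratios(tumor, normal), penalty=0.5)
    print(segs)
    print(call_cna(segs))
```

A larger `penalty` yields fewer, longer segments; a smaller one admits more change-points, which is the trade-off that governs how small a CNA such a segmenter can resolve.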
Affiliation(s)
- Mucheng Zhang
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA 30602-7229, USA
| | - Deli Liu
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA 30602-7229, USA
| | - Jie Tang
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA 30602-7229, USA
| | - Yuan Feng
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA 30602-7229, USA
| | - Tianfang Wang
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA 30602-7229, USA
| | - Kevin K Dobbin
- Department of Biostatistics, University of Georgia, Athens, GA 30602-7229, USA
| | - Paul Schliekelman
- Department of Statistics, University of Georgia, Athens, GA 30602-7229, USA
| | - Shaying Zhao
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA 30602-7229, USA
12
Liu Z, Zheng WJ, Allen GI, Liu Y, Ruan J, Zhao Z. The International Conference on Intelligent Biology and Medicine (ICIBM) 2016: from big data to big analytical tools. BMC Bioinformatics 2017; 18:405. [PMID: 28984189 PMCID: PMC5629550 DOI: 10.1186/s12859-017-1797-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
The 2016 International Conference on Intelligent Biology and Medicine (ICIBM 2016) was held on December 8-10, 2016 in Houston, Texas, USA. ICIBM included eight scientific sessions, four tutorials, one poster session, four highlighted talks and four keynotes, covering 3D genome structural analysis, next-generation sequencing (NGS) analysis, computational drug discovery, medical informatics, cancer genomics, and systems biology. Here, we present a summary of the nine research articles selected from the ICIBM 2016 program for publication in BMC Bioinformatics.
Affiliation(s)
- Zhandong Liu
- Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA
- Department of Pediatrics-Neurology, Baylor College of Medicine, Houston, TX, 77030, USA
| | - W Jim Zheng
- School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, 77030, USA
| | - Genevera I Allen
- Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA
- Department of Statistics, Rice University, Houston, TX, 77030, USA
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, 77030, USA
| | - Yin Liu
- Department of Neurobiology and Anatomy, The University of Texas Medical School at Houston, Houston, TX, 77030, USA
| | - Jianhua Ruan
- Department of Computer Science, The University of Texas at San Antonio, San Antonio, TX, 78249, USA
| | - Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA