1
Tang F, Gao Y, Li K, Tang D, Hao Y, Lv M, Wu H, Cheng H, Fei J, Jin Z, Wang C, Xu Y, Wei Z, Zhou P, Zhang Z, He X, Cao Y. Novel deleterious splicing variant in HFM1 causes gametogenesis defect and recurrent implantation failure: concerning the risk of chromosomal abnormalities in embryos. J Assist Reprod Genet 2023; 40:1689-1702. [PMID: 36864181 PMCID: PMC10352197 DOI: 10.1007/s10815-023-02761-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/24/2022] [Accepted: 02/21/2023] [Indexed: 03/04/2023]
Abstract
PURPOSE Poor ovarian response (POR) affects approximately 9% to 24% of women undergoing in vitro fertilization (IVF) cycles, resulting in fewer eggs obtained and higher clinical cycle cancellation rates. The pathogenesis of POR is related to gene variations. Our study included a Chinese family comprising two siblings with infertility born to consanguineous parents. POR was identified in the female patient, who had multiple embryo implantation failures in subsequent assisted reproductive technology cycles. Meanwhile, the male patient was diagnosed with non-obstructive azoospermia (NOA). METHODS Whole-exome sequencing and rigorous bioinformatics analyses were conducted to identify the underlying genetic causes. Moreover, the pathogenicity of the identified splicing variant was assessed using a minigene assay in vitro. The remaining poor-quality blastocyst and abortion tissues from the female patient were tested for copy number variations. RESULTS We identified a novel homozygous splicing variant in HFM1 (NM_001017975.6: c.1730-1G>T) in the two siblings. Apart from NOA and POI, biallelic variants in HFM1 were also associated with recurrent implantation failure (RIF). Additionally, we demonstrated that the splicing variant caused abnormal alternative splicing of HFM1. Using copy number variation sequencing, we found that the embryos of the female patient were either euploid or aneuploid; however, both harbored chromosomal microduplications of maternal origin. CONCLUSION Our results reveal the different effects of HFM1 on reproductive injury in males and females, extend the phenotypic and mutational spectrum of HFM1, and show the potential risk of chromosomal abnormalities under the RIF phenotype. Moreover, our study provides new diagnostic markers for the genetic counseling of POR patients.
Affiliation(s)
- Fei Tang
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, The First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, Anhui, China
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China
- Key Laboratory of Population Health Across Life Cycle, Ministry of Education of the People's Republic of China, Anhui Medical University, Hefei, Anhui, China
- Yang Gao
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, The First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, Anhui, China
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China
- Key Laboratory of Population Health Across Life Cycle, Ministry of Education of the People's Republic of China, Anhui Medical University, Hefei, Anhui, China
- KuoKuo Li
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, The First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, Anhui, China
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China
- Anhui Province Key Laboratory of Reproductive Health and Genetics, Hefei, Anhui, China
- DongDong Tang
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, The First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, Anhui, China
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China
- Anhui Province Key Laboratory of Reproductive Health and Genetics, Hefei, Anhui, China
- Yan Hao
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, The First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, Anhui, China
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China
- Anhui Provincial Engineering Research Center of Biopreservation and Artificial Organs, Hefei, Anhui, China
- Mingrong Lv
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, The First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, Anhui, China
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China
- Key Laboratory of Population Health Across Life Cycle, Ministry of Education of the People's Republic of China, Anhui Medical University, Hefei, Anhui, China
- Huan Wu
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, The First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, Anhui, China
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China
- Anhui Province Key Laboratory of Reproductive Health and Genetics, Hefei, Anhui, China
- Huiru Cheng
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China
- Anhui Province Key Laboratory of Reproductive Health and Genetics, Hefei, Anhui, China
- Anhui Provincial Engineering Research Center of Biopreservation and Artificial Organs, Hefei, Anhui, China
- Jia Fei
- Peking Jabrehoo Med Tech Co., Ltd., Beijing, China
- Zhiping Jin
- Peking Jabrehoo Med Tech Co., Ltd., Beijing, China
- Chao Wang
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, The First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, Anhui, China
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China
- Key Laboratory of Population Health Across Life Cycle, Ministry of Education of the People's Republic of China, Anhui Medical University, Hefei, Anhui, China
- Yuping Xu
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, The First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, Anhui, China
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China
- Anhui Province Key Laboratory of Reproductive Health and Genetics, Hefei, Anhui, China
- Zhaolian Wei
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, The First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, Anhui, China
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China
- Key Laboratory of Population Health Across Life Cycle, Ministry of Education of the People's Republic of China, Anhui Medical University, Hefei, Anhui, China
- Ping Zhou
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, The First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, Anhui, China
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China
- Key Laboratory of Population Health Across Life Cycle, Ministry of Education of the People's Republic of China, Anhui Medical University, Hefei, Anhui, China
- Zhiguo Zhang
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, The First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, Anhui, China.
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China.
- Key Laboratory of Population Health Across Life Cycle, Ministry of Education of the People's Republic of China, Anhui Medical University, Hefei, Anhui, China.
- Xiaojin He
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, The First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, Anhui, China.
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China.
- Key Laboratory of Population Health Across Life Cycle, Ministry of Education of the People's Republic of China, Anhui Medical University, Hefei, Anhui, China.
- Yunxia Cao
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, The First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, Anhui, China.
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China.
- Key Laboratory of Population Health Across Life Cycle, Ministry of Education of the People's Republic of China, Anhui Medical University, Hefei, Anhui, China.
2
Luo W, Zheng YM, Hao Y, Zhang Y, Zhou P, Wei Z, Cao Y, Chen D. Mitochondrial DNA quantification correlates with the developmental potential of human euploid blastocysts but not with that of mosaic blastocysts. BMC Pregnancy Childbirth 2023; 23:447. [PMID: 37322435 DOI: 10.1186/s12884-023-05760-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2023] [Accepted: 06/05/2023] [Indexed: 06/17/2023]
Abstract
PURPOSE We aimed to study the association between adjusted mtDNA levels in human trophectoderm biopsy samples and the developmental potential of euploid and mosaic blastocysts. METHODS We analyzed relative mtDNA levels in 2,814 blastocysts obtained from 576 couples undergoing preimplantation genetic testing for aneuploidy from June 2018 to June 2021. All patients underwent in vitro fertilization in a single clinic; the study was blinded, in that mtDNA content was unknown at the time of single embryo transfer. The fate of the euploid or mosaic embryos transferred was compared with mtDNA levels. RESULTS Euploid embryos had lower mtDNA than aneuploid and mosaic embryos. Embryos biopsied on Day 5 had higher mtDNA than those biopsied on Day 6. No difference was detected in mtDNA scores between embryos derived from oocytes of different maternal ages. A linear mixed model suggested that blastulation rate was associated with mtDNA score. Moreover, the specific next-generation sequencing platform used had a significant effect on the observed mtDNA content. Euploid embryos with higher mtDNA content presented significantly higher miscarriage rates and lower live birth rates, while no significant difference was observed in the mosaic cohort. CONCLUSION Our results will aid in improving methods for analyzing the association between mtDNA level and blastocyst viability.
Affiliation(s)
- Wen Luo
- Department of Obstetrics and Gynecology, the First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, 230022, Anhui, China
- Anhui Provincial Engineering Technology Research Center for Biopreservation and Artificial Organs, Anhui Medical University, No 81 Meishan Road, Hefei, 230032, Anhui, China
- Yi-Min Zheng
- Department of Obstetrics and Gynecology, the First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, 230022, Anhui, China
- NHC Key Laboratory of Study On Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei, 230032, Anhui, China
- Yan Hao
- Department of Obstetrics and Gynecology, the First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, 230022, Anhui, China
- Anhui Provincial Engineering Technology Research Center for Biopreservation and Artificial Organs, Anhui Medical University, No 81 Meishan Road, Hefei, 230032, Anhui, China
- Ying Zhang
- Department of Obstetrics and Gynecology, the First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, 230022, Anhui, China
- Ping Zhou
- Department of Obstetrics and Gynecology, the First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, 230022, Anhui, China
- Anhui Provincial Engineering Technology Research Center for Biopreservation and Artificial Organs, Anhui Medical University, No 81 Meishan Road, Hefei, 230032, Anhui, China
- NHC Key Laboratory of Study On Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei, 230032, Anhui, China
- Zhaolian Wei
- Department of Obstetrics and Gynecology, the First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, 230022, Anhui, China
- Anhui Provincial Engineering Technology Research Center for Biopreservation and Artificial Organs, Anhui Medical University, No 81 Meishan Road, Hefei, 230032, Anhui, China
- NHC Key Laboratory of Study On Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei, 230032, Anhui, China
- Yunxia Cao
- Department of Obstetrics and Gynecology, the First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, 230022, Anhui, China.
- Anhui Provincial Engineering Technology Research Center for Biopreservation and Artificial Organs, Anhui Medical University, No 81 Meishan Road, Hefei, 230032, Anhui, China.
- NHC Key Laboratory of Study On Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei, 230032, Anhui, China.
- Dawei Chen
- Department of Obstetrics and Gynecology, the First Affiliated Hospital of Anhui Medical University, No 218 Jixi Road, Hefei, 230022, Anhui, China.
- NHC Key Laboratory of Study On Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei, 230032, Anhui, China.
3
Points to consider in the detection of germline structural variants using next-generation sequencing: A statement of the American College of Medical Genetics and Genomics (ACMG). Genet Med 2023; 25:100316. [PMID: 36507974 DOI: 10.1016/j.gim.2022.09.017] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 09/26/2022] [Revised: 09/29/2022] [Accepted: 09/30/2022] [Indexed: 12/14/2022]
4
Ou, Ni MengZhangDingZouZhengZhang, Li H, Huang Y. Improved pregnancy outcomes from mosaic embryos with lower mtDNA content: a single-center retrospective study. Eur J Obstet Gynecol Reprod Biol 2022; 275:110-114. [DOI: 10.1016/j.ejogrb.2022.06.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Revised: 06/08/2022] [Accepted: 06/23/2022] [Indexed: 11/04/2022]
5
Warmuth VM, Weissensteiner MH, Wolf J. Ineffective silencing of transposable elements on an avian W Chromosome. Genome Res 2022; 32:671-681. [PMID: 35149543 PMCID: PMC8997356 DOI: 10.1101/gr.275465.121] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/04/2021] [Accepted: 02/08/2022] [Indexed: 11/24/2022]
Abstract
One of the defining features of transposable elements (TEs) is their ability to move to new locations in the host genome. To minimise the potentially deleterious effects of de novo TE insertions, hosts have evolved several mechanisms to control TE activity, including recombination-mediated removal and epigenetic silencing; however, increasing evidence suggests that silencing of TEs is often incomplete. The crow family experienced a recent radiation of LTR retrotransposons (LTRs), offering an opportunity to gain insight into the regulatory control of young, potentially still active TEs. We quantified the abundance of TE-derived transcripts across several tissues in 15 Eurasian crows (Corvus (corone) spp.) raised under common garden conditions and found evidence for ineffective TE suppression on the female-specific W Chromosome. Using RNA-seq data, we show that ~9.5% of all transcribed TEs had considerably greater (average: 16-fold) transcript abundance in female crows, and that more than 85% of these female-biased TEs originated on the W Chromosome. After accounting for differences in TE density among chromosomal classes, W-linked TEs were significantly more highly expressed than TEs residing on other chromosomes, consistent with ineffective silencing on the former. Together, our results suggest that the crow W Chromosome acts as a source of transcriptionally active TEs, with possible negative fitness consequences for female birds analogous to Drosophila (an X/Y system), where overexpression of Y-linked TEs is associated with male-specific aging and fitness loss ('toxic Y').
6
Sinha R, Pal RK, De RK. GenSeg and MR-GenSeg: A Novel Segmentation Algorithm and its Parallel MapReduce Based Approach for Identifying Genomic Regions With Copy Number Variations. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:443-454. [PMID: 32750860 DOI: 10.1109/tcbb.2020.3000661] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/11/2023]
Abstract
Identifying intragenic as well as intergenic DNA sequences that carry structural alterations is an important research area, since such alterations may be the root cause of many neurological and autoimmune diseases, including cancer. Working with whole-genome NGS data has provided new insight in this regard, but has led to a huge explosion of data that is growing exponentially. Hence, the challenges lie in efficient means of storing and processing this big data. In this study, we have developed a novel segmentation algorithm, called GenSeg, and its parallel MapReduce-based algorithm, called MR-GenSeg, for detecting copy number variations (CNVs). In order to annotate CNVs, segments formed by GenSeg/MR-GenSeg have been represented in a novel way using a binary tree, where each node is a CNV event. GenSeg considers the position-specific data of the whole-genome DNA sequence, so that precise identification of breakpoints is possible. GenSeg/MR-GenSeg has been compared with twelve popular CNV detection algorithms, where it has outperformed the others in terms of sensitivity and has achieved a good F-score value. MR-GenSeg has excelled in terms of speedup when compared with these algorithms. The effect of CNVs on immunoglobulin (IG) genes has also been analysed in this study. Availability: The source codes are available at https://github.com/rituparna-sinha/MapReduce-GENSEG.
7
Rinaldi AO, Korsfeldt A, Ward S, Burla D, Dreher A, Gautschi M, Stolpe B, Tan G, Bersuch E, Melin D, Askary Lord N, Grant S, Svedenhag P, Tsekova K, Schmid‐Grendelmeier P, Möhrenschlager M, Renner ED, Akdis CA. Electrical impedance spectroscopy for the characterization of skin barrier in atopic dermatitis. Allergy 2021; 76:3066-3079. [PMID: 33830511 DOI: 10.1111/all.14842] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 11/20/2020] [Revised: 03/05/2021] [Accepted: 03/23/2021] [Indexed: 12/22/2022]
Abstract
BACKGROUND Allergic disorders such as atopic dermatitis (AD) are strongly associated with an impairment of the epithelial barrier, in which tight junctions and/or filaggrin expression can be defective. Skin barrier assessment shows potential to be clinically useful for prediction of disease development, improved and earlier diagnosis, lesion follow-up, and therapy evaluation. This study aimed to establish a method to directly assess the in vivo status of the epithelial barrier using electrical impedance spectroscopy (EIS). METHODS Thirty-six patients with AD were followed during their 3-week hospitalization and compared with 28 controls. EIS and transepidermal water loss (TEWL) were measured in lesional and non-lesional skin. Targeted proteomics by proximity extension assay in serum and whole-genome sequencing were performed. RESULTS Electrical impedance spectroscopy was able to assess epithelial barrier integrity, differentiate between patients and controls without AD, and characterize lesional and non-lesional skin of patients. It showed a significant negative correlation with TEWL, but a higher sensitivity to discriminate non-lesional atopic skin from controls. During hospitalization, lesions showed a significant increase in EIS that correlated with healing and with decreased SCORAD and itch scores. Additionally, EIS showed a significant inverse correlation with serum biomarkers associated with inflammatory pathways that may affect the epithelial barrier, particularly chemokines such as CCL13, CCL3, CCL7, and CXCL8 and other cytokines, such as IRAK1, IRAK4, and FG2, which were significantly high at admission. Furthermore, filaggrin copy numbers significantly correlated with EIS on non-lesional skin of patients. CONCLUSIONS Electrical impedance spectroscopy can be a useful tool to detect skin barrier dysfunction in vivo, valuable for the assessment of AD severity, progression, and therapy efficacy.
Affiliation(s)
- Arturo O. Rinaldi
- Swiss Institute of Allergy and Asthma Research (SIAF), Davos, Switzerland
- Siobhan Ward
- Swiss Institute of Allergy and Asthma Research (SIAF), Davos, Switzerland
- Christine Kühne – Center for Allergy Research and Education (CK‐CARE), Davos, Switzerland
- Daniel Burla
- Swiss Institute of Allergy and Asthma Research (SIAF), Davos, Switzerland
- Anita Dreher
- Christine Kühne – Center for Allergy Research and Education (CK‐CARE), Davos, Switzerland
- Marja Gautschi
- Hochgebirgsklinik – High Altitude Clinic (HGK), Davos, Switzerland
- Britta Stolpe
- Hochgebirgsklinik – High Altitude Clinic (HGK), Davos, Switzerland
- Ge Tan
- Swiss Institute of Allergy and Asthma Research (SIAF), Davos, Switzerland
- Functional Genomics Center Zurich, ETH Zurich/University of Zurich, Zurich, Switzerland
- Eugen Bersuch
- Christine Kühne – Center for Allergy Research and Education (CK‐CARE), Davos, Switzerland
- Kristina Tsekova
- Hochgebirgsklinik – High Altitude Clinic (HGK), Davos, Switzerland
- Peter Schmid‐Grendelmeier
- Christine Kühne – Center for Allergy Research and Education (CK‐CARE), Davos, Switzerland
- Department of Dermatology, University Hospital Zurich, Switzerland
- Ellen D. Renner
- Hochgebirgsklinik – High Altitude Clinic (HGK), Davos, Switzerland
- Translational Immunology in Environmental Medicine, Technical University of Munich, Munich, Germany
- Cezmi A. Akdis
- Swiss Institute of Allergy and Asthma Research (SIAF), Davos, Switzerland
- Christine Kühne – Center for Allergy Research and Education (CK‐CARE), Davos, Switzerland
8
Jugas R, Sedlar K, Vitek M, Nykrynova M, Barton V, Bezdicek M, Lengerova M, Skutkova H. CNproScan: Hybrid CNV detection for bacterial genomes. Genomics 2021; 113:3103-3111. [PMID: 34224809 DOI: 10.1016/j.ygeno.2021.06.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2020] [Revised: 06/13/2021] [Accepted: 06/30/2021] [Indexed: 10/20/2022]
Abstract
Copy number variation (CNV) discovery in bacteria has received far less attention than CNV detection in eukaryotes. However, challenges arising from bacterial drug resistance bring further interest to the topic of CNV and its role in drug resistance. General CNV detection methods do not consider the features of bacterial genomes, and there is room to improve detection accuracy. Here, we present a CNV detection method called CNproScan focused on bacterial genomes. CNproScan implements a hybrid approach and other bacteria-focused features and depends only on NGS data. We benchmarked our method against previously published methods and found that it achieves a higher detection rate while providing other beneficial features, such as CNV classification. Compared with other methods, CNproScan can detect much shorter CNV events.
Affiliation(s)
- Robin Jugas
- Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
- Karel Sedlar
- Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
- Martin Vitek
- Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
- Marketa Nykrynova
- Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
- Vojtech Barton
- Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
- Matej Bezdicek
- Department of Internal Medicine-Hematology and Oncology, University Hospital Brno, Brno, Czech Republic
- Martina Lengerova
- Department of Internal Medicine-Hematology and Oncology, University Hospital Brno, Brno, Czech Republic
- Helena Skutkova
- Department of Biomedical Engineering, Brno University of Technology, Brno, Czech Republic
9
Boujemaa M, Hamdi Y, Mejri N, Romdhane L, Ghedira K, Bouaziz H, El Benna H, Labidi S, Dallali H, Jaidane O, Ben Nasr S, Haddaoui A, Rahal K, Abdelhak S, Boussen H, Boubaker MS. Germline copy number variations in BRCA1/2 negative families: Role in the molecular etiology of hereditary breast cancer in Tunisia. PLoS One 2021; 16:e0245362. [PMID: 33503040 PMCID: PMC7840007 DOI: 10.1371/journal.pone.0245362] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/27/2020] [Accepted: 12/28/2020] [Indexed: 12/24/2022]
Abstract
Hereditary breast cancer accounts for 5-10% of all breast cancer cases. So far, known genetic risk factors account for only 50% of the breast cancer genetic component, and almost a quarter of hereditary cases are carriers of pathogenic mutations in BRCA1/2 genes. Hence, the genetic basis for a significant fraction of familial cases remains unsolved. This missing heritability may be explained in part by copy number variations (CNVs). We herein aimed to evaluate the contribution of CNVs to hereditary breast cancer in Tunisia. Whole exome sequencing was performed for 9 BRCA-negative cases with a strong family history of breast cancer and 10 matched controls. CNVs were called using the ExomeDepth R package and investigated by pathway analysis and web-based bioinformatic tools. Overall, 483 CNVs were identified in breast cancer patients. Rare CNVs affecting cancer genes were detected; of special interest were those disrupting APC2, POU5F1, DOCK8, KANSL1, TMTC3 and the mismatch repair gene PMS2. In addition, common CNVs known to be associated with breast cancer risk were also identified, including CNVs on APOBECA/B, UGT2B17 and GSTT1 genes, whereas those disrupting SULT1A1 and UGT2B15 seem to correlate with good clinical response to tamoxifen. Our study revealed new insights regarding CNVs and breast cancer risk in the Tunisian population. These findings suggest that rare and common CNVs may contribute to disease susceptibility. Those affecting mismatch repair genes are of interest and require additional attention, since they may help to select candidates for immunotherapy, leading to better outcomes.
Affiliation(s)
- Maroua Boujemaa
- Laboratory of Biomedical Genomics and Oncogenetics, LR16IPT05, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- Yosr Hamdi
- Laboratory of Biomedical Genomics and Oncogenetics, LR16IPT05, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- Laboratory of Human and Experimental Pathology, Institut Pasteur de Tunis, Tunis, Tunisia
- Nesrine Mejri
- Laboratory of Biomedical Genomics and Oncogenetics, LR16IPT05, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- Medical Oncology Department, Abderrahman Mami Hospital, Faculty of Medicine Tunis, University Tunis El Manar, Tunis, Tunisia
- Lilia Romdhane
- Laboratory of Biomedical Genomics and Oncogenetics, LR16IPT05, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- Department of Biology, Faculty of Science of Bizerte, University of Carthage, Jarzouna, Tunisia
- Kais Ghedira
- Laboratory of Bioinformatics, Biomathematics and Biostatistics, LR16IPT09, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- Hanen Bouaziz
- Laboratory of Biomedical Genomics and Oncogenetics, LR16IPT05, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- Surgical Oncology Department, Salah Azaiez Institute of Cancer, Tunis, Tunisia
- Houda El Benna
- Laboratory of Biomedical Genomics and Oncogenetics, LR16IPT05, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- Medical Oncology Department, Abderrahman Mami Hospital, Faculty of Medicine Tunis, University Tunis El Manar, Tunis, Tunisia
- Soumaya Labidi
- Laboratory of Biomedical Genomics and Oncogenetics, LR16IPT05, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- Medical Oncology Department, Abderrahman Mami Hospital, Faculty of Medicine Tunis, University Tunis El Manar, Tunis, Tunisia
- Hamza Dallali
- Laboratory of Biomedical Genomics and Oncogenetics, LR16IPT05, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- Olfa Jaidane
- Surgical Oncology Department, Salah Azaiez Institute of Cancer, Tunis, Tunisia
- Sonia Ben Nasr
- Department of Medical Oncology, Military Hospital of Tunis, Tunis, Tunisia
- Khaled Rahal
- Surgical Oncology Department, Salah Azaiez Institute of Cancer, Tunis, Tunisia
- Sonia Abdelhak
- Laboratory of Biomedical Genomics and Oncogenetics, LR16IPT05, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- Hamouda Boussen
- Laboratory of Biomedical Genomics and Oncogenetics, LR16IPT05, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- Medical Oncology Department, Abderrahman Mami Hospital, Faculty of Medicine Tunis, University Tunis El Manar, Tunis, Tunisia
- Mohamed Samir Boubaker
- Laboratory of Biomedical Genomics and Oncogenetics, LR16IPT05, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- Laboratory of Human and Experimental Pathology, Institut Pasteur de Tunis, Tunis, Tunisia
10
Zhou T, Sengupta S, Müller P, Ji Y. RNDClone: Tumor subclone reconstruction based on integrating DNA and RNA sequence data. Ann Appl Stat 2020. [DOI: 10.1214/20-aoas1368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
11
Jang H, Lee H. Multiresolution correction of GC bias and application to identification of copy number alterations. Bioinformatics 2020; 35:3890-3897. [PMID: 30865265 DOI: 10.1093/bioinformatics/btz174] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/24/2018] [Revised: 03/03/2019] [Accepted: 03/12/2019] [Indexed: 01/03/2023]
Abstract
MOTIVATION: Whole-genome sequencing (WGS) data are affected by various sequencing biases such as GC bias and mappability bias. These biases degrade performance on detection of genetic variations such as copy number alterations. The existing methods use a relation between the GC proportion and depth of coverage (DOC) of markers by means of regression models. Nonetheless, severity of the GC bias varies from sample to sample. We developed a new method for correction of GC bias on the basis of multiresolution analysis. We used a translation-invariant wavelet transform to decompose biased raw signals into high- and low-frequency coefficients. Then, we modeled the relation between GC proportion and DOC of the genomic regions and constructed new control DOC signals that reflect the GC bias. The control DOC signals are used for normalizing genomic sequences by correcting the GC bias. RESULTS: When we applied our method to simulated sequencing data with various degrees of GC bias, our method showed more robust performance on correcting the GC bias than the other methods did. We also applied our method to real-world cancer sequencing datasets and successfully identified cancer-related focal alterations even when cancer genomes were not normalized to normal control samples. In conclusion, our method can be employed for WGS data with different degrees of GC bias. AVAILABILITY AND IMPLEMENTATION: The code is available at http://gcancer.org/wabico. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
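For orientation only, the kind of baseline this abstract contrasts itself with (relating GC proportion to depth of coverage) can be sketched as a simple GC-bin median normalization. This is not the wavelet-based method the authors propose; the function name `gc_normalize`, the number of bins, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def gc_normalize(depths, gc_fractions, n_bins=10):
    """Rescale each window's depth by (global median / median of its GC bin).

    depths       -- depth of coverage per genomic window
    gc_fractions -- GC proportion (0..1) of the same windows
    n_bins       -- hypothetical number of equal-width GC bins
    """
    if len(depths) != len(gc_fractions):
        raise ValueError("depths and gc_fractions must have the same length")

    def bin_of(gc):
        # Clamp so that gc == 1.0 falls into the last bin.
        return min(int(gc * n_bins), n_bins - 1)

    # Group window depths by GC bin.
    by_bin = defaultdict(list)
    for depth, gc in zip(depths, gc_fractions):
        by_bin[bin_of(gc)].append(depth)

    global_median = median(depths)
    bin_median = {b: median(values) for b, values in by_bin.items()}

    corrected = []
    for depth, gc in zip(depths, gc_fractions):
        m = bin_median[bin_of(gc)]
        # Windows in a bin with zero median depth cannot be rescaled.
        corrected.append(depth * global_median / m if m > 0 else 0.0)
    return corrected


if __name__ == "__main__":
    # Toy data: high-GC windows are over-covered purely because of bias.
    print(gc_normalize([10, 10, 20, 20], [0.35, 0.35, 0.65, 0.65]))
```

A per-bin median is the crudest form of such a correction; as the abstract notes, the severity of GC bias varies from sample to sample, which is the limitation the multiresolution approach is meant to address.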
Affiliation(s)
- Ho Jang
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, South Korea
- Hyunju Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, South Korea
12
Magi A, Bolognini D, Bartalucci N, Mingrino A, Semeraro R, Giovannini L, Bonifacio S, Parrini D, Pelo E, Mannelli F, Guglielmelli P, Maria Vannucchi A. Nano-GLADIATOR: real-time detection of copy number alterations from nanopore sequencing data. Bioinformatics 2020; 35:4213-4221. [PMID: 30949684 DOI: 10.1093/bioinformatics/btz241] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/21/2018] [Revised: 03/05/2019] [Accepted: 04/03/2019] [Indexed: 12/15/2022]
Abstract
MOTIVATION: The past few years have seen the emergence of nanopore-based sequencing technologies, which interrogate single molecules of DNA and generate reads sequentially. RESULTS: In this paper, we demonstrate that, thanks to the sequentiality of the nanopore process, the data generated in the first tens of minutes of a typical MinION/GridION run can be exploited to resolve the alterations of a human genome at a karyotype level with a resolution in the order of tens of Mb, while the data produced in the first 6-12 h yield a resolution comparable to currently available array-based technologies and, thanks to a novel probabilistic approach, can be used to predict the allelic fraction of genomic alterations with high accuracy. To exploit the unique characteristics of nanopore sequencing data we developed a novel software tool, Nano-GLADIATOR, that can perform copy number variant/alteration detection and allelic fraction prediction during the sequencing run ('On-line' mode) and after experiment completion ('Off-line' mode). We tested Nano-GLADIATOR on publicly available data ('Off-line' mode) and on a novel whole-genome sequencing dataset generated with a MinION device ('On-line' mode), showing that our tool can perform real-time copy number alteration detection with good results relative to other state-of-the-art tools. AVAILABILITY AND IMPLEMENTATION: Nano-GLADIATOR is freely available at https://sourceforge.net/projects/nanogladiator/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alberto Magi
- Department of Information Engineering, University of Florence, Florence, Italy
- Davide Bolognini
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Niccoló Bartalucci
- Department of Experimental and Clinical Medicine, CRIMM, Center Research and Innovation of Myeloproliferative Neoplasms, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy
- Alessandra Mingrino
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Roberto Semeraro
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Luna Giovannini
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Stefania Bonifacio
- Department of Laboratory Diagnosis, Genetic Diagnosis Service, Careggi Teaching Hospital, Florence, Italy
- Daniela Parrini
- Department of Laboratory Diagnosis, Genetic Diagnosis Service, Careggi Teaching Hospital, Florence, Italy
- Elisabetta Pelo
- Department of Laboratory Diagnosis, Genetic Diagnosis Service, Careggi Teaching Hospital, Florence, Italy
- Francesco Mannelli
- Department of Experimental and Clinical Medicine, CRIMM, Center Research and Innovation of Myeloproliferative Neoplasms, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy
- Paola Guglielmelli
- Department of Experimental and Clinical Medicine, CRIMM, Center Research and Innovation of Myeloproliferative Neoplasms, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy
- Alessandro Maria Vannucchi
- Department of Experimental and Clinical Medicine, CRIMM, Center Research and Innovation of Myeloproliferative Neoplasms, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy
13
Yang H, Zhu D. Combinatorial Detection Algorithm for Copy Number Variations Using High-throughput Sequencing Reads. INT J PATTERN RECOGN 2019. [DOI: 10.1142/s0218001419500228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Copy number variation (CNV) is a prevalent kind of genetic structural variation which leads to an abnormal number of copies of large genomic regions, such as gain or loss of DNA segments larger than 1 kb. CNV exists not only in the human genome but also in plant genomes. Current research has shown that CNV is associated with many complex diseases. In this paper, guanine-cytosine (GC) bias, mappability and their effect on read depth signals in sequencing data are discussed first. Subsequently, a new correction method for GC bias and an improved combinatorial detection algorithm for CNV using high-throughput sequencing reads based on a hidden Markov model (CNV-HMM) are proposed. The corrected read depth signals have lower correlation with GC content, mappability of reads and the width of the analysis window. Then we create a hidden Markov model which maps the reads onto the reference genome and records the unmapped reads. The unmapped reads are counted and normalized. The CNV-HMM detects the abnormal signal of read count and obtains the candidate CNVs using the expectation maximization (EM) algorithm. Finally, we filter the candidate CNVs using split reads to improve the performance of our algorithm. The experimental results indicate that the CNV-HMM algorithm has higher accuracy and sensitivity for CNV detection than most current detection algorithms.
Affiliation(s)
- Hai Yang
- School of Computer Science and Technology, Shandong University, Qingdao 266237, P. R. China
- Daming Zhu
- School of Computer Science and Technology, Shandong University, Qingdao 266237, P. R. China
14
Luo F. A systematic evaluation of copy number alterations detection methods on real SNP array and deep sequencing data. BMC Bioinformatics 2019; 20:692. [PMID: 31874603 PMCID: PMC6929333 DOI: 10.1186/s12859-019-3266-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/15/2023]
Abstract
BACKGROUND: Copy Number Alterations (CNAs) are tightly associated with cancers, so accurately detecting them is one of the most important tasks in cancer genomics. A series of CNA detection methods have been proposed and new ones are still being developed. Due to the complexity of CNAs in cancers, no CNA detection method has been accepted as the gold standard caller. Several evaluation works have attempted to reveal the performance of typical CNA detection methods. Limited by the scale of evaluation data, these different comparison works do not reach a consensus and researchers are still unsure how to choose a proper CNA caller for their analysis. Therefore, a more comprehensive evaluation of the performance of typical CNA detection methods is needed. RESULTS: In this work, we use a large-scale real dataset from the CAGEKID consortium to evaluate a total of 12 typical CNA detection methods. These methods are the most widely used in cancer research and are commonly used as benchmarks for newly proposed CNA detection methods. This large-scale dataset comprises SNP array data on 94 samples and whole genome sequencing data on 10 samples. Evaluations are comprehensively implemented in current scenarios of CNA detection, which include detecting CNAs on SNP array data, on sequencing data with matched tumor and normal samples, and on sequencing data with a single tumor sample. Three SNP based methods are ranked first. Subsequently, the best SNP based method's results are used as a benchmark to compare six matched-sample based methods and three single-tumor-sample based methods in terms of preprocessing, recall rate, Jaccard index and segmentation characteristics. CONCLUSIONS: Our survey thoroughly reveals the strengths and weaknesses of 12 typical methods. We explain why methods show specific characteristics from a methodological standpoint. Finally, we present the guiding principle for choosing a proper CNA detection method under specific conditions. Some unsolved problems and expectations are also addressed for upcoming CNA detection methods.
Affiliation(s)
- Fei Luo
- School of Computer Science, Wuhan University, Wuhan, China.
15
Lee J, Chen J. A penalized regression approach for DNA copy number study using the sequencing data. Stat Appl Genet Mol Biol 2019; 18:sagmb-2018-0001. [PMID: 31145697 DOI: 10.1515/sagmb-2018-0001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Abstract
Modeling the high-throughput next generation sequencing (NGS) data, resulting from experiments with the goal of profiling tumor and control samples for the study of DNA copy number variants (CNVs), remains a challenge in various ways. In this application work, we provide an efficient method for detecting multiple CNVs using NGS reads ratio data. This method is based on a multiple statistical change-points model with the penalized regression approach, 1d fused LASSO, that is designed for ordered data in a one-dimensional structure. In addition, since the path algorithm traces the solution as a function of a tuning parameter, the number and locations of potential CNV region boundaries can be estimated simultaneously in an efficient way. For tuning parameter selection, we then propose a new modified Bayesian information criterion, called JMIC, and compare the proposed JMIC with three different Bayes information criteria used in the literature. Simulation results have shown the better performance of JMIC for tuning parameter selection, in comparison with the other three criteria. We applied our approach to the sequencing data of reads ratio between the breast tumor cell line HCC1954 and its matched normal cell line BL 1954, and the results are in line with those discovered in the literature.
Affiliation(s)
- Jaeeun Lee
- Division of Biostatistics and Data Science, Department of Population Health Sciences, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA
- Jie Chen
- Division of Biostatistics and Data Science, Department of Population Health Sciences, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA
16
Wang X, Lebarbier E, Aubert J, Robin S. Variational Inference for Coupled Hidden Markov Models Applied to the Joint Detection of Copy Number Variations. Int J Biostat 2019; 15:/j/ijb.ahead-of-print/ijb-2018-0023/ijb-2018-0023.xml. [PMID: 30779702 DOI: 10.1515/ijb-2018-0023] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/22/2018] [Accepted: 11/21/2018] [Indexed: 02/04/2023]
Abstract
Hidden Markov models provide a natural statistical framework for the detection of copy number variations (CNV) in genomics. In this context, we define a hidden Markov process that underlies all individuals jointly in order to detect and to classify genomic regions in different states (typically, deletion, normal or amplification). Structural variations from different individuals may be dependent. This is the case in agronomy, where varietal selection programs exist and species share a common phylogenetic past. We propose to take these dependencies into account in the HMM model. When dealing with a large number of series, maximum likelihood inference (performed classically using the EM algorithm) becomes intractable. We thus propose an approximate inference algorithm based on a variational approach (VEM), implemented in the CHMM R package. A simulation study is performed to assess the performance of the proposed method and an application to the detection of structural variations in plant genomes is presented.
Affiliation(s)
- Xiaoqiang Wang
- School of Mathematics and Statistics, Shandong University (Weihai), Weihai, Shandong, China
- Emilie Lebarbier
- UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, Paris, France
- Julie Aubert
- UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, Paris, France
- Stéphane Robin
- UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, Paris, France
17
Zhou T, Müller P, Sengupta S, Ji Y. PairClone: a Bayesian subclone caller based on mutation pairs. J R Stat Soc Ser C Appl Stat 2018. [DOI: 10.1111/rssc.12328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Affiliation(s)
- Tianjian Zhou
- University of Chicago, NorthShore University HealthSystem, Evanston, USA
- University of Texas at Austin, USA
- Yuan Ji
- University of Chicago and NorthShore University HealthSystem, Evanston, USA
18
Magi A, Giusti B, Tattini L. Characterization of MinION nanopore data for resequencing analyses. Brief Bioinform 2018; 18:940-953. [PMID: 27559152 DOI: 10.1093/bib/bbw077] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 07/28/2016] [Indexed: 02/04/2023]
Abstract
The Oxford Nanopore Technologies MinION is a new device, based on nanopore sequencing that is able to generate reads of tens of kilobases in length with faster sequencing time with respect to other platforms. To evaluate the capability of nanopore data to be exploited for resequencing analyses we used the largest MinION data set to date and we compared with Illumina and Pacific Biosciences technologies. By using five different mapping approaches we estimated that the global sequencing error rate of MinION reads, mainly caused by inserted and deleted bases, is around 11%. The study of error distribution showed that substituted, inserted and deleted bases are not randomly distributed along the reads, but mainly occur in specific nucleotide patterns, generating a significant number of genomic loci that can be misclassified as false-positive variants. With 40× sequencing coverage, MinION data can produce at best around one false substitution and insertion every 10-50 kb, and one false deletion every 1000 bp, making use of this technology still challenging for small-sized variant discovery. We also analyzed depth of coverage distribution and we demonstrated that nanopore sequencing is a uniform process that generates sequences randomly and independently without classical sources of bias such as GC-content and mappability. Owing to these properties, the MinION data can be readily used to detect genomic regions involved in copy number variants with high accuracy, outperforming other state-of-the-art sequencing methods in terms of both sensitivity and specificity.
19
D'Aurizio R, Semeraro R, Magi A. Using XCAVATOR and EXCAVATOR2 to Identify CNVs from WGS, WES, and TS Data. CURRENT PROTOCOLS IN HUMAN GENETICS 2018; 98:e65. [PMID: 29975818 DOI: 10.1002/cphg.65] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 02/01/2023]
Abstract
Copy Number Variants (CNVs) are structural rearrangements contributing to phenotypic variation but also associated with many disease states. In recent years, the identification of CNVs from high-throughput sequencing experiments has become a common practice for both research and clinical purposes. Several computational methods have been developed so far. In this unit, we describe and give instructions on how to run two read count-based tools, XCAVATOR and EXCAVATOR2, which are tailored for the detection of both germline and somatic CNVs from different sequencing experiments (whole-genome, whole-exome, and targeted) in various disease contexts and population genetic studies. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Romina D'Aurizio
- Institute of Informatics and Telematics, National Research Council, Pisa, Italy
- Roberto Semeraro
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Alberto Magi
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
20
Resistome of carbapenem- and colistin-resistant Klebsiella pneumoniae clinical isolates. PLoS One 2018; 13:e0198526. [PMID: 29883490 PMCID: PMC5993281 DOI: 10.1371/journal.pone.0198526] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 02/07/2018] [Accepted: 05/21/2018] [Indexed: 12/21/2022]
Abstract
The emergence and dissemination of carbapenemases, bacterial enzymes able to inactivate most β-lactam antibiotics, in Enterobacteriaceae is of increasing concern. The concurrent spread of resistance against colistin, an antibiotic of last resort, further compounds this challenge. Whole-genome sequencing (WGS) can play a significant role in the rapid and accurate detection/characterization of existing and emergent resistance determinants, an essential aspect of public health surveillance and response activities to combat the spread of antimicrobial resistant bacteria. In the current study, WGS data was used to characterize the genomic content of antimicrobial resistance genes, including those encoding carbapenemases, in 10 multidrug-resistant Klebsiella pneumoniae isolates from Pakistan. These clinical isolates represented five sequence types: ST11 (n = 3 isolates), ST14 (n = 3), ST15 (n = 1), ST101 (n = 2), and ST307 (n = 1). Resistance profiles against 25 clinically-relevant antimicrobials were determined by broth microdilution; resistant phenotypes were observed for at least 15 of the 25 antibiotics tested in all isolates except one. Specifically, 8/10 isolates were carbapenem-resistant and 7/10 isolates were colistin-resistant. The blaNDM-1 and blaOXA-48 carbapenemase genes were present in 7/10 and 5/10 isolates, respectively, including 2 isolates carrying both genes. No plasmid-mediated determinants for colistin resistance (e.g. mcr) were detected, but disruptions and mutations in chromosomal loci (i.e. mgrB and pmrB) previously reported to confer colistin resistance were observed. A blaOXA-48-carrying IncL/M-type plasmid was found in all blaOXA-48-positive isolates. The application of WGS to molecular epidemiology and surveillance studies, as exemplified here, will provide both a more complete understanding of the global distribution of MDR isolates and a robust surveillance tool useful for detecting emerging threats to public health.
21
Wang W, Mao B, Wei X, Yin D, Li H, Mao L, Guo X, Sun Y, Yang Y. Application of an improved targeted next generation sequencing method to diagnose non‑syndromic mental retardation in one step: A case report. Mol Med Rep 2018; 18:981-986. [PMID: 29845227 DOI: 10.3892/mmr.2018.9031] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/18/2017] [Accepted: 02/01/2018] [Indexed: 11/06/2022]
Abstract
The genetic basis of congenital mental retardation includes chromosomal anomalies and single gene mutations. In addition to chromosome microarray analysis, next‑generation sequencing (NGS) and Sanger sequencing have additionally been applied to identify single gene mutations. However, no methods exist to identify the cause of an anomaly in one step. The present study applied an improved targeted NGS method to diagnose an 8‑year‑old Chinese Han female with mental retardation in one step. The microdeletion 17p11.2 was successfully detected by the improved targeted NGS and no single gene mutations were identified. The same microdeletion was verified using low coverage whole‑genome sequencing. Fertility guidance was also given to the patient's parents. In the present study, an improved targeted NGS method was applied to diagnose non‑syndromic mental retardation of unknown cause in one step. This improved method has the potential to be developed into a screening panel for the effective diagnosis of genetic abnormalities in non‑syndromic mental retardation and other congenital anomalies.
Affiliation(s)
- Weipeng Wang
- Prenatal Diagnosis Center, Hubei Maternal and Child Health Hospital, Wuhan, Hubei 430070, P.R. China
- Bing Mao
- Department of Neurology, Wuhan Medical and Health Center for Women and Children, Wuhan, Hubei 430016, P.R. China
- Xiaoming Wei
- BGI‑Wuhan, BGI‑Shenzhen, Wuhan, Hubei 430074, P.R. China
- Dan Yin
- BGI‑Wuhan, BGI‑Shenzhen, Wuhan, Hubei 430074, P.R. China
- Hui Li
- Prenatal Diagnosis Center, Hubei Maternal and Child Health Hospital, Wuhan, Hubei 430070, P.R. China
- Liangwei Mao
- BGI‑Wuhan, BGI‑Shenzhen, Wuhan, Hubei 430074, P.R. China
- Xueqin Guo
- BGI‑Wuhan, BGI‑Shenzhen, Wuhan, Hubei 430074, P.R. China
- Yan Sun
- BGI‑Wuhan, BGI‑Shenzhen, Wuhan, Hubei 430074, P.R. China
- Yun Yang
- BGI‑Wuhan, BGI‑Shenzhen, Wuhan, Hubei 430074, P.R. China
22
Nishio S, Moteki H, Usami S. Simple and efficient germline copy number variant visualization method for the Ion AmpliSeq™ custom panel. Mol Genet Genomic Med 2018; 6:678-686. [PMID: 29633566 PMCID: PMC6081219 DOI: 10.1002/mgg3.399] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 10/20/2017] [Revised: 03/02/2018] [Accepted: 03/06/2018] [Indexed: 12/18/2022]
Abstract
BACKGROUND: Recent advances in molecular genetic analysis using next-generation sequencing (NGS) have drastically accelerated the identification of disease-causing gene mutations. Most next-generation sequencing analyses of inherited diseases have mainly focused on single-nucleotide variants and short indels, although, recently, structural variations including copy number variations have come to be considered an important cause of many different diseases. However, only a limited number of tools are available for multiplex PCR-based target genome enrichment. METHODS: In this paper, we report a simple and efficient copy number variation visualization method for Ion AmpliSeq™ target resequencing data. Unlike the hybridization capture-based target genome enrichment system, Ion AmpliSeq™ reads are multiplex PCR products, and each read generated by the same amplicon is quite uniform in length and position. Based on this feature, the depth of coverage information for each amplicon included in the barcode/amplicon coverage matrix file was used for copy number detection analysis. We also performed copy number analysis to investigate the utility of this method through the use of positive controls and a large Japanese hearing loss cohort. RESULTS: Using this method, we successfully confirmed previously reported copy number loss cases involving the STRC gene and copy number gain in trisomy 21 cases. We also performed copy number analysis of a large Japanese hearing loss cohort (2,475 patients) and identified many gene copy number variants. The most prevalent copy number variation was STRC gene copy number loss, with 129 patients carrying this copy number variation. CONCLUSION: Our copy number visualization method for Ion AmpliSeq™ data can be utilized in efficient copy number analysis for the comparison of a large number of samples. This method is simple and requires only easy calculations using standard spreadsheet software.
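The abstract does not spell out the spreadsheet calculation, so the following is only a guess at its general shape: each sample's per-amplicon depth is converted to a fraction of that sample's total reads and then divided by the cohort median for the same amplicon. The function name `copy_ratios`, the input layout, and the toy depths are assumptions, not the authors' procedure.

```python
from statistics import median


def copy_ratios(matrix):
    """Return per-amplicon depth ratios relative to the cohort median.

    matrix -- {sample_name: {amplicon_name: depth}}, a hypothetical stand-in
              for a barcode/amplicon coverage matrix; every sample is assumed
              to list the same amplicons.
    """
    # Step 1: express each amplicon as a fraction of the sample's total depth,
    # which removes differences in overall sequencing yield between samples.
    fractions = {}
    for sample, depths in matrix.items():
        total = sum(depths.values())
        if total == 0:
            raise ValueError(f"sample {sample!r} has no reads")
        fractions[sample] = {amp: d / total for amp, d in depths.items()}

    # Step 2: divide by the median fraction across samples for each amplicon.
    amplicons = list(next(iter(matrix.values())).keys())
    ratios = {sample: {} for sample in matrix}
    for amp in amplicons:
        reference = median(fractions[s][amp] for s in matrix)
        for s in matrix:
            ratios[s][amp] = (
                fractions[s][amp] / reference if reference > 0 else float("nan")
            )
    return ratios


if __name__ == "__main__":
    cohort = {
        "S1": {"A": 100, "B": 100, "C": 100},
        "S2": {"A": 100, "B": 100, "C": 100},
        "S3": {"A": 50, "B": 100, "C": 100},  # reduced depth at amplicon A
    }
    for sample, per_amplicon in copy_ratios(cohort).items():
        print(sample, per_amplicon)
```

With only three amplicons the total-depth normalization visibly distorts the ratios; on a real panel with many amplicons, a single-copy loss would be expected to sit much closer to half the cohort median.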
Affiliation(s)
- Shin‐ya Nishio
- Department of Otorhinolaryngology, Shinshu University School of Medicine, Matsumoto City, Japan
- Hideaki Moteki
- Department of Otorhinolaryngology, Shinshu University School of Medicine, Matsumoto City, Japan
- Shin‐ichi Usami
- Department of Otorhinolaryngology, Shinshu University School of Medicine, Matsumoto City, Japan
23
DNase-capture reveals differential transcription factor binding modalities. PLoS One 2017; 12:e0187046. [PMID: 29284001 PMCID: PMC5746236 DOI: 10.1371/journal.pone.0187046] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/02/2017] [Accepted: 10/12/2017] [Indexed: 11/19/2022]
Abstract
We describe DNase-capture, an assay that increases the analytical resolution of DNase-seq by focusing its sequencing phase on selected genomic regions. We introduce a new method to compensate for capture bias called BaseNormal that allows for accurate recovery of transcription factor protection profiles from DNase-capture data. We show that these normalized data allow for nuanced detection of transcription factor binding heterogeneity with as few as dozens of sites.
24
Detecting differential copy number variation between groups of samples. Genome Res 2017; 28:256-265. [PMID: 29229672 PMCID: PMC5793789 DOI: 10.1101/gr.206938.116] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 03/13/2016] [Accepted: 11/27/2017] [Indexed: 01/20/2023]
Abstract
We present a method to detect copy number variants (CNVs) that are differentially present between two groups of sequenced samples. We use a finite-state transducer where the emitted read depth is conditioned on the mappability and GC-content of all reads that occur at a given base position. In this model, the read depth within a region is a mixture of binomials, which in simulations matches the read depth more closely than the often-used negative binomial distribution. The method analyzes all samples simultaneously, preserving uncertainty as to the breakpoints and magnitude of CNVs present in an individual when it identifies CNVs differentially present between the two groups. We apply this method to identify CNVs that are recurrently associated with postglacial adaptation of marine threespine stickleback (Gasterosteus aculeatus) to freshwater. We identify 6664 regions of the stickleback genome, totaling 1.7 Mbp, which show consistent copy number differences between marine and freshwater populations. These deletions and duplications affect both protein-coding genes and cis-regulatory elements, including a noncoding intronic telencephalon enhancer of DCHS1. The functions of the genes near or included within the 6664 CNVs are enriched for immunity and muscle development, as well as head and limb morphology. Although freshwater stickleback have repeatedly evolved from marine populations, we show that freshwater stickleback also act as reservoirs for ancient ancestral sequences that are highly conserved among distantly related teleosts, but largely missing from marine stickleback due to recent selective sweeps in marine populations.
25
do Nascimento F, Guimaraes KS. Copy Number Variations Detection: Unravelling the Problem in Tangible Aspects. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1237-1250. [PMID: 27295681 DOI: 10.1109/tcbb.2016.2576441] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 06/06/2023]
Abstract
Among the important genomic variants associated with susceptibility and resistance to complex diseases, Copy Number Variations (CNVs) have emerged as a prevalent class of structural variation. Following the flood of next-generation sequencing data, numerous publicly available tools have been developed to provide computational strategies to identify CNVs with improved accuracy. This review goes beyond scrutinizing the main approaches widely used for structural variant detection in general, including Split-Read, Paired-End Mapping, Read-Depth, and Assembly-based. In this paper, (1) we characterize the relevant technical details around the detection of CNVs, which can affect the estimation of breakpoints and number of copies, (2) we pinpoint the most important insights related to GC-content and mappability biases, and (3) we discuss the paramount caveats in the tool evaluation process. The points brought out in this study emphasize common assumptions, a variety of possible limitations, valuable insights, and directions for desirable contributions to the state-of-the-art in CNV detection tools.
26
Tan R, Wang J, Wu X, Juan L, Zheng L, Ma R, Zhan Q, Wang T, Jin S, Jiang Q, Wang Y. ERDS-exome: a Hybrid Approach for Copy Number Variant Detection from Whole-exome Sequencing Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 17:796-803. [PMID: 28981421 DOI: 10.1109/tcbb.2017.2758779] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 06/07/2023]
Abstract
Copy number variants (CNVs) play important roles in human disease and evolution. With the rapid development of next-generation sequencing technologies, many tools have been developed for inferring CNVs based on whole-exome sequencing (WES) data. However, as a result of the sparse distribution of exons in the genome, the limitations of the WES technique, and the nature of high-level signal noises in WES data, the efficacy of these variants remains less than desirable. Thus, there is need for the development of an effective tool to achieve a considerable power in WES CNVs discovery. In the present study, we describe a novel method, Estimation by Read Depth (RD) with Single-nucleotide variants from exome sequencing data (ERDS-exome). ERDS-exome employs a hybrid normalization approach to normalize WES data and to incorporate RD and single-nucleotide variation information together as a hybrid signal into a paired hidden Markov model to infer CNVs from WES data. Based on systematic evaluations of real data from the 1000 Genomes Project using other state-of-the-art tools, we observed that ERDS-exome demonstrates higher sensitivity and provides comparable or even better specificity than other tools. ERDS-exome is publicly available at: https://erds-exome.github.io.
|
27
|
XCAVATOR: accurate detection and genotyping of copy number variants from second and third generation whole-genome sequencing experiments. BMC Genomics 2017; 18:747. [PMID: 28934930 PMCID: PMC5609061 DOI: 10.1186/s12864-017-4137-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 08/21/2017] [Accepted: 09/11/2017] [Indexed: 11/10/2022] Open
Abstract
Background We developed a novel software package, XCAVATOR, for the identification of genomic regions involved in copy number variants/alterations (CNVs/CNAs) from short- and long-read whole-genome sequencing experiments. Results Using simulated and real datasets, we showed that our tool, based on a read count approach, is capable of predicting the boundaries and the absolute number of DNA copies of CNVs/CNAs with high resolution. To demonstrate the power of our software, we applied it to the analysis of Illumina and Pacific Biosciences data and compared its performance to that of ten other state-of-the-art tools. Conclusion All the analyses we performed demonstrate that XCAVATOR is capable of detecting germline and somatic CNVs/CNAs, outperforming all the other tools we compared. XCAVATOR is freely available at http://sourceforge.net/projects/xcavator/.
|
28
|
SLMSuite: a suite of algorithms for segmenting genomic profiles. BMC Bioinformatics 2017; 18:321. [PMID: 28659129 PMCID: PMC5490196 DOI: 10.1186/s12859-017-1734-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/03/2016] [Accepted: 06/20/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The identification of copy number variants (CNVs) is essential to study human genetic variation and to understand the genetic basis of Mendelian disorders and cancers. At present, genome-wide detection of CNVs can be achieved using microarray or second generation sequencing (SGS) data. Although these technologies are very different, the genomic profiles that they generate are mathematically very similar and consist of noisy signals in which a decrease or increase across consecutive data points represents a deletion or duplication of DNA. In this framework, the most important step of the analysis consists of segmenting genomic profiles to identify the boundaries of genomic regions with increased or decreased signal. RESULTS Here we introduce SLMSuite, a collection of algorithms, based on shifting level models (SLM), to segment genomic profiles from array and SGS experiments. The SLM algorithms take as input the log-transformed genomic profiles from SGS or microarray experiments and output segmentation results. We apply our method to the analysis of synthetic genomic profiles and real whole genome sequencing data and we demonstrate that it outperforms the state-of-the-art circular binary segmentation algorithm in terms of sensitivity, specificity and computational speed. CONCLUSION The SLMSuite contains an R library with the segmentation methods and three wrappers that allow them to be used from Python, Ruby and C++. SLMSuite is freely available at https://sourceforge.net/projects/slmsuite.
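To make the segmentation task described in this abstract concrete, the following is a minimal Python sketch. It is not the SLM algorithm and not circular binary segmentation; it swaps in a much simpler greedy least-squares binary splitting. The function names, the `min_gain` threshold, and the toy profile are all illustrative assumptions.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations of a segment from its own mean."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(values):
    """Return (index, gain) for the split that most reduces squared error."""
    total = sse(values)
    best_idx, best_gain = None, 0.0
    for i in range(1, len(values)):
        gain = total - sse(values[:i]) - sse(values[i:])
        if gain > best_gain:
            best_idx, best_gain = i, gain
    return best_idx, best_gain


def segment(values, min_gain, offset=0):
    """Recursively split the profile; return breakpoint positions in order."""
    idx, gain = best_split(values)
    if idx is None or gain < min_gain:
        return []
    left = segment(values[:idx], min_gain, offset)
    right = segment(values[idx:], min_gain, offset + idx)
    return left + [offset + idx] + right


# Hypothetical log2-ratio profile: baseline, a gained region, baseline again.
profile = [0.0, 0.1, -0.1, 0.0, 1.0, 0.9, 1.1, 1.0, 0.0, -0.1, 0.1, 0.0]
breakpoints = segment(profile, min_gain=0.5)
print(breakpoints)
```

On this toy profile the two boundaries of the elevated region are recovered. The threshold `min_gain` plays the role of a stopping rule; on real, noisy data it would have to be calibrated rather than fixed by hand.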
|
29
|
Chan CH, Octavia S, Sintchenko V, Lan R. SnpFilt: A pipeline for reference-free assembly-based identification of SNPs in bacterial genomes. Comput Biol Chem 2016; 65:178-184. [DOI: 10.1016/j.compbiolchem.2016.09.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 08/31/2016] [Accepted: 09/07/2016] [Indexed: 10/21/2022]
|
30
|
Manconi A, Moscatelli M, Armano G, Gnocchi M, Orro A, Milanesi L. Removing duplicate reads using graphics processing units. BMC Bioinformatics 2016; 17:346. [PMID: 28185553 PMCID: PMC5123249 DOI: 10.1186/s12859-016-1192-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 12/30/2022] Open
Abstract
Background During library construction, polymerase chain reaction is used to enrich the DNA before sequencing. Typically, this process generates duplicate read sequences. Removal of these artifacts is mandatory, as they can affect the correct interpretation of data in several analyses. Ideally, duplicate reads should be characterized by identical nucleotide sequences. However, due to sequencing errors, duplicates may also be nearly identical. Removing nearly identical duplicates can require notable computational effort. To deal with this challenge, we recently proposed a GPU method aimed at removing identical and nearly identical duplicates generated with an Illumina platform. The method implements an approach based on prefix-suffix comparison. Read sequences with identical prefixes are considered potential duplicates. Then, their suffixes are compared to identify and remove those that are actually duplicated. Although the method can be efficiently used to remove duplicates, there are some limitations that need to be overcome. In particular, it cannot detect potential duplicates when prefixes are longer than 27 bases, and it does not provide support for paired-end read libraries. Moreover, large clusters of potential duplicates are split into smaller ones in order to guarantee a reasonable computing time. This heuristic may affect the accuracy of the analysis. Results In this work we propose GPU-DupRemoval, a new implementation of our method able to (i) cluster reads without constraints on the maximum length of the prefixes, (ii) support both single- and paired-end read libraries, and (iii) analyze large clusters of potential duplicates. Conclusions Due to the massive parallelization obtained by exploiting graphics cards, GPU-DupRemoval removes duplicate reads faster than other cutting-edge solutions, while outperforming most of them in terms of the number of duplicate reads removed.
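The prefix-suffix comparison described in this abstract can be sketched on the CPU in a few lines of Python. This is only an illustration of the idea, not the GPU-DupRemoval implementation: the function names, the `prefix_len` and `max_mismatches` parameters, and the policy of keeping the first read seen in each group are assumptions made for the example.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def remove_duplicates(reads, prefix_len=4, max_mismatches=1):
    """Keep the first read of each group of identical or nearly identical reads.

    Reads sharing the same prefix are treated as potential duplicates; their
    suffixes are then compared, tolerating up to `max_mismatches` differences.
    """
    clusters = defaultdict(list)  # prefix -> suffixes of the reads kept so far
    kept = []
    for read in reads:
        prefix, suffix = read[:prefix_len], read[prefix_len:]
        is_duplicate = any(
            len(s) == len(suffix) and hamming(s, suffix) <= max_mismatches
            for s in clusters[prefix]
        )
        if not is_duplicate:
            clusters[prefix].append(suffix)
            kept.append(read)
    return kept


# Hypothetical reads: an exact duplicate, a near duplicate, and two distinct reads.
reads = ["ACGTACGTAA", "ACGTACGTAA", "ACGTACGTAT", "ACGTTTTTTT", "TTTTACGTAA"]
unique = remove_duplicates(reads)
print(unique)
```

Grouping by prefix first means the more expensive suffix comparison only runs within a cluster, which is what makes the comparisons easy to parallelize. A read whose sequencing error falls inside the prefix would escape detection in this sketch.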
Affiliation(s)
- Andrea Manconi
- Institute for Biomedical Technologies, National Research Council, Via Fratelli Cervi, 93, Segrate (Mi), 20090, Italy.
| | - Marco Moscatelli
- Institute for Biomedical Technologies, National Research Council, Via Fratelli Cervi, 93, Segrate (Mi), 20090, Italy
| | - Giuliano Armano
- Department of Electrical and Electronic Engineering, University of Cagliari, P.zza D'Armi, Cagliari (CA), 09123, Italy
| | - Matteo Gnocchi
- Institute for Biomedical Technologies, National Research Council, Via Fratelli Cervi, 93, Segrate (Mi), 20090, Italy
| | - Alessandro Orro
- Institute for Biomedical Technologies, National Research Council, Via Fratelli Cervi, 93, Segrate (Mi), 20090, Italy
| | - Luciano Milanesi
- Institute for Biomedical Technologies, National Research Council, Via Fratelli Cervi, 93, Segrate (Mi), 20090, Italy
|
31
|
Zhang C, Cai H, Huang J, Song Y. nbCNV: a multi-constrained optimization model for discovering copy number variants in single-cell sequencing data. BMC Bioinformatics 2016; 17:384. [PMID: 27639558 PMCID: PMC5027123 DOI: 10.1186/s12859-016-1239-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 03/03/2016] [Accepted: 09/04/2016] [Indexed: 02/02/2023] Open
Abstract
Background Variations in DNA copy number make an important contribution to the development of several diseases, including autism, schizophrenia and cancer. Single-cell sequencing technology allows the dissection of genomic heterogeneity at the single-cell level, thereby providing important evolutionary information about cancer cells. In contrast to traditional bulk sequencing, single-cell sequencing requires the amplification of the whole genome of a single cell to accumulate enough material for sequencing. However, the amplification process inevitably introduces amplification bias, resulting in an over-dispersed portion of the sequencing data. A recent study has shown that the over-dispersed portion of single-cell sequencing data can be well modelled by negative binomial distributions. Results We developed a read-depth based method, nbCNV, to detect copy number variants (CNVs). The nbCNV method uses two constraints, sparsity and smoothness, to fit the CNV patterns under the assumption that the read signals follow a negative binomial distribution. The problem of CNV detection was formulated as a quadratic optimization problem and was solved by an efficient numerical solution based on the classical alternating direction minimization method. Conclusions Extensive experiments to compare nbCNV with existing benchmark models were conducted on both simulated data and empirical single-cell sequencing data. The results of those experiments demonstrate that nbCNV achieves superior performance and high robustness for the detection of CNVs in single-cell sequencing data.
Affiliation(s)
- Changsheng Zhang
- School of Computer Science & Engineering, South China University of Technology, Guangzhou, 510006, China
| | - Hongmin Cai
- School of Computer Science & Engineering, South China University of Technology, Guangzhou, 510006, China.
| | - Jingying Huang
- School of Computer Science & Engineering, South China University of Technology, Guangzhou, 510006, China
| | - Yan Song
- School of Computer Science & Engineering, South China University of Technology, Guangzhou, 510006, China
|
32
|
D'Aurizio R, Pippucci T, Tattini L, Giusti B, Pellegrini M, Magi A. Enhanced copy number variants detection from whole-exome sequencing data using EXCAVATOR2. Nucleic Acids Res 2016; 44:e154. [PMID: 27507884 PMCID: PMC5175347 DOI: 10.1093/nar/gkw695] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 03/21/2016] [Revised: 07/25/2016] [Accepted: 07/27/2016] [Indexed: 12/26/2022] Open
Abstract
Copy Number Variants (CNVs) are structural rearrangements contributing to phenotypic variation that have been shown to be associated with many disease states. In recent years, the identification of CNVs from whole-exome sequencing (WES) data has become common practice for research and clinical purposes and, consequently, the demand for more efficient and accurate methods has increased. In this paper, we demonstrate that more than 30% of WES data map outside the targeted regions and that these reads, usually discarded, can be exploited to enhance the identification of CNVs from WES experiments. Here, we present EXCAVATOR2, the first read count based tool that exploits all the reads produced by WES experiments to detect CNVs with a genome-wide resolution. To evaluate the performance of our novel tool, we use it to analyse two WES data sets: a population data set sequenced by the 1000 Genomes Project and a tumor data set made of bladder cancer samples. The results obtained from these analyses demonstrate that EXCAVATOR2 outperforms four other state-of-the-art methods and that our combined approach enlarges the spectrum of detectable CNVs from WES data with an unprecedented resolution. EXCAVATOR2 is freely available at http://sourceforge.net/projects/excavator2tool/.
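The distinction between reads that map inside and outside the targeted regions can be illustrated with a short Python sketch. It is not part of EXCAVATOR2; the function name, the half-open interval convention, and the toy coordinates are assumptions chosen for the example, and reads are reduced to a single mapped position on one chromosome.

```python
from bisect import bisect_right


def split_on_off_target(read_positions, targets):
    """Classify read positions as on-target or off-target.

    `targets` is a list of non-overlapping half-open (start, end) intervals,
    sorted by start coordinate.
    """
    starts = [start for start, _ in targets]
    on_target, off_target = [], []
    for pos in read_positions:
        # Index of the last target whose start is <= pos (or -1 if none).
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < targets[i][1]:
            on_target.append(pos)
        else:
            off_target.append(pos)
    return on_target, off_target


# Hypothetical capture targets and mapped read positions.
targets = [(100, 200), (500, 600)]
reads = [50, 120, 199, 200, 350, 510, 599, 700, 900, 150]
on_target, off_target = split_on_off_target(reads, targets)
off_fraction = len(off_target) / len(reads)
print(on_target, off_target, off_fraction)
```

In a read count based workflow, the two groups would then be binned and normalized separately, since on-target and off-target regions differ greatly in coverage.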
Affiliation(s)
- Romina D'Aurizio
- Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, National Research Council, Pisa, Italy
| | - Tommaso Pippucci
- Medical Genetics Unit, Sant'Orsola Malpighi Polyclinic, Bologna, Italy
| | - Lorenzo Tattini
- Department of Computer Science, University of Pisa, Pisa, Italy
| | - Betti Giusti
- Department of Experimental and Clinical Medicine, University of Florence, Florence
| | - Marco Pellegrini
- Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, National Research Council, Pisa, Italy
| | - Alberto Magi
- Department of Experimental and Clinical Medicine, University of Florence, Florence
|
33
|
Samarakoon PS, Sorte HS, Stray-Pedersen A, Rødningen OK, Rognes T, Lyle R. cnvScan: a CNV screening and annotation tool to improve the clinical utility of computational CNV prediction from exome sequencing data. BMC Genomics 2016; 17:51. [PMID: 26764020 PMCID: PMC4712464 DOI: 10.1186/s12864-016-2374-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 08/22/2015] [Accepted: 01/06/2016] [Indexed: 12/30/2022] Open
Abstract
Background With advances in next generation sequencing technology and analysis methods, single nucleotide variants (SNVs) and indels can be detected with high sensitivity and specificity in exome sequencing data. Recent studies have demonstrated the ability to detect disease-causing copy number variants (CNVs) in exome sequencing data. However, exonic CNV prediction programs have shown high false positive CNV counts, which is the major limiting factor for the applicability of these programs in clinical studies. Results We have developed a tool (cnvScan) to improve the clinical utility of computational CNV prediction in exome data. cnvScan can accept input from any CNV prediction program. cnvScan consists of two steps: CNV screening and CNV annotation. CNV screening evaluates CNV prediction using quality scores and refines this using an in-house CNV database, which greatly reduces the false positive rate. The annotation step provides functionally and clinically relevant information using multiple source datasets. We assessed the performance of cnvScan on CNV predictions from five different prediction programs using 64 exomes from Primary Immunodeficiency (PIDD) patients, and identified PIDD-causing CNVs in three individuals from two different families. Conclusions In summary, cnvScan reduces the time and effort required to detect disease-causing CNVs by reducing the false positive count and providing annotation. This improves the clinical utility of CNV detection in exome data.
Affiliation(s)
| | - Hanne Sørmo Sorte
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway.
| | - Asbjørg Stray-Pedersen
- Norwegian National Newborn Screening, Oslo University Hospital, Oslo, Norway; Center for Human Immunobiology/Section of Immunology, Allergy, and Rheumatology, Texas Children's Hospital, Houston, TX, USA; Baylor-Hopkins Center for Mendelian Genomics of the Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
| | - Olaug Kristin Rødningen
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway.
| | - Torbjørn Rognes
- Department of Informatics, University of Oslo, Oslo, Norway; Department of Microbiology, Oslo University Hospital, Oslo, Norway.
| | - Robert Lyle
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway.
|
34
|
Anjum S, Morganella S, D'Angelo F, Iavarone A, Ceccarelli M. VEGAWES: variational segmentation on whole exome sequencing for copy number detection. BMC Bioinformatics 2015; 16:315. [PMID: 26416038 PMCID: PMC4587906 DOI: 10.1186/s12859-015-0748-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/17/2015] [Accepted: 09/16/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Copy number variations are important in the detection and progression of significant tumors and diseases. Recently, whole-exome sequencing (WES) has been gaining popularity for copy number variation detection due to its low cost and better efficiency. In this work, we developed VEGAWES for accurate and robust detection of copy number variations in WES data. VEGAWES is an extension of a variational segmentation algorithm, VEGA (Variational Estimator for Genomic Aberrations), which has previously outperformed several algorithms in segmenting array comparative genomic hybridization data. RESULTS We tested this algorithm on synthetic data and 100 Glioblastoma Multiforme primary tumor samples. The results on the real data were analyzed using segmentation obtained from single-nucleotide polymorphism data as ground truth. We compared our results with two other segmentation algorithms and assessed the performance based on accuracy and time. CONCLUSIONS In terms of both accuracy and time, VEGAWES provided better results on the synthetic data and tumor samples, demonstrating its potential for robust detection of aberrant regions in the genome.
Affiliation(s)
- Samreen Anjum
- Computational Sciences and Engineering, Qatar Computing Research Institute, Doha, P. O. Box 5825, Qatar.
| | - Sandro Morganella
- European Molecular Biology Laboratory, European Bioinformatics Institute, (EMBL -EBI), Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK.
| | | | - Antonio Iavarone
- Institute for Cancer Genetics, Columbia University, New York, 10027, USA.
| | - Michele Ceccarelli
- Computational Sciences and Engineering, Qatar Computing Research Institute, Doha, P. O. Box 5825, Qatar; Department of Science and Technology, University of Sannio, Benevento, 82100, Italy.
|
35
|
CNV-CH: A Convex Hull Based Segmentation Approach to Detect Copy Number Variations (CNV) Using Next-Generation Sequencing Data. PLoS One 2015; 10:e0135895. [PMID: 26291322 PMCID: PMC4546278 DOI: 10.1371/journal.pone.0135895] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 07/04/2014] [Accepted: 07/28/2015] [Indexed: 11/19/2022] Open
Abstract
Copy number variation (CNV) is a form of structural alteration in the mammalian DNA sequence that is associated with many complex neurological diseases as well as cancer. The development of next generation sequencing (NGS) technology provides a new dimension for the detection of genomic locations with copy number variations. Here we develop an algorithm for detecting CNVs based on depth of coverage data generated by NGS technology. In this work, we use a novel way to represent the read count data as two-dimensional geometrical points. A key aspect of detecting regions with CNVs is to devise a proper segmentation algorithm that distinguishes genomic locations having a significant difference in read count data. We have designed a new segmentation approach in this context, using a convex hull algorithm on the geometrical representation of read count data. To our knowledge, most algorithms use a single distribution model of read count data; in our approach, we consider the read count data to follow two different distribution models independently, which adds to the robustness of CNV detection. In addition, our algorithm calls CNVs based on a multiple sample analysis approach, resulting in a low false discovery rate with high precision.
|
36
|
Liu Y, Wei X, Kong X, Guo X, Sun Y, Man J, Du L, Zhu H, Qu Z, Tian P, Mao B, Yang Y. Targeted Next-Generation Sequencing for Clinical Diagnosis of 561 Mendelian Diseases. PLoS One 2015; 10:e0133636. [PMID: 26274329 PMCID: PMC4537117 DOI: 10.1371/journal.pone.0133636] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 03/24/2015] [Accepted: 06/30/2015] [Indexed: 12/04/2022] Open
Abstract
Background Targeted next-generation sequencing (NGS) is a cost-effective approach for rapid and accurate detection of genetic mutations in patients with suspected genetic disorders, which can facilitate effective diagnosis. Methodology/Principal Findings We designed a capture array mainly to capture all the coding sequences (CDS) of 2,181 genes associated with 561 Mendelian diseases and conducted NGS to detect mutations. The accuracy of NGS was 99.95%, which was obtained by comparing the genotypes of selected loci between our method and a SNP array in four samples from normal human adults. We also tested the stability of the method using a sample from a normal human adult. The results showed that an average of 97.79% and 96.72% of single-nucleotide variants (SNVs) in the sample could be detected stably within a batch and across different batches, respectively. In addition, the method could detect various types of mutations. Some disease-causing mutations were detected in 69 clinical cases, including 62 SNVs, 14 insertions and deletions (Indels), 1 copy number variant (CNV), 1 microdeletion and 2 microduplications of chromosomes, of which 35 mutations were novel. Mutations were confirmed by Sanger sequencing or real-time polymerase chain reaction (PCR). Conclusions/Significance Results of the evaluation showed that targeted NGS enabled the detection of disease-causing mutations with high accuracy, stability, speed and throughput. Thus, the technology can be used for the clinical diagnosis of 561 Mendelian diseases.
Affiliation(s)
- Yanqiu Liu
- Department of Genetics, Jiangxi Provincial Women and Children Hospital, Nanchang, 330006, China
| | - Xiaoming Wei
- BGI-Wuhan, Wuhan, 430075, China
- BGI-Shenzhen, Shenzhen, 518083, China
| | - Xiangdong Kong
- Prenatal Diagnosis Center, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China
| | - Xueqin Guo
- BGI-Wuhan, Wuhan, 430075, China
- BGI-Shenzhen, Shenzhen, 518083, China
| | - Yan Sun
- BGI-Wuhan, Wuhan, 430075, China
- BGI-Shenzhen, Shenzhen, 518083, China
| | - Jianfen Man
- BGI-Wuhan, Wuhan, 430075, China
- BGI-Shenzhen, Shenzhen, 518083, China
| | - Lique Du
- BGI-Wuhan, Wuhan, 430075, China
- BGI-Shenzhen, Shenzhen, 518083, China
| | - Hui Zhu
- BGI-Wuhan, Wuhan, 430075, China
- BGI-Shenzhen, Shenzhen, 518083, China
| | - Zelan Qu
- BGI-Shenzhen, Shenzhen, 518083, China
| | - Ping Tian
- Department of Obstetrics and Gynecology, Wuhan Medical and Health Center for Women and Children, Wuhan, 430022, China
| | - Bing Mao
- Department of Neurology, Wuhan Medical and Health Center for Women and Children, Wuhan, 430022, China
| | - Yun Yang
- BGI-Wuhan, Wuhan, 430075, China
- BGI-Shenzhen, Shenzhen, 518083, China
|
37
|
Ji T, Chen J. Modeling the next generation sequencing read count data for DNA copy number variant study. Stat Appl Genet Mol Biol 2015; 14:361-74. [PMID: 26140731 DOI: 10.1515/sagmb-2014-0054] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/30/2023]
Abstract
As one of the most recent advanced technologies developed for biomedical research, next generation sequencing (NGS) technology has opened more opportunities for the scientific discovery of genetic information. NGS technology is particularly useful in elucidating a genome for the analysis of DNA copy number variants (CNVs). The study of CNVs is important, as many genetic studies have led to the conclusion that cancer development, genetic disorders, and other diseases are often related to CNVs in the genome. One way to analyze NGS data for detecting the boundaries of CNV regions on a chromosome or a genome is to frame the problem as a statistical change point detection problem in the read count data. We therefore provide a statistical change point model to help detect CNVs using NGS read count data. We use a Bayesian approach to incorporate possible parameter changes in the underlying distribution of the NGS read count data. Posterior probabilities for the change point inferences are derived. Extensive simulation studies have shown the advantages of our proposed methods. The proposed methods are also applied to a publicly available lung cancer cell line NGS dataset, and CNV regions on this cell line are successfully identified.
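The change point framing in this abstract can be illustrated with a small Python sketch. Note the swap: the paper describes a Bayesian model with posterior probabilities, whereas the sketch below uses a simpler maximum-likelihood search for a single change point, and it assumes Poisson-distributed read counts. The function names and the toy counts are illustrative only.

```python
import math


def poisson_loglik(counts):
    """Poisson log-likelihood with the rate set to the sample mean.

    Terms that do not depend on the rate (the log-factorials) are dropped,
    since they cancel when comparing models on the same data.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    rate = total / len(counts)
    return total * math.log(rate) - len(counts) * rate


def best_change_point(counts):
    """Return (index, gain) of the split with the largest likelihood gain
    over the model with a single constant rate."""
    base = poisson_loglik(counts)
    best_idx, best_gain = None, 0.0
    for k in range(1, len(counts)):
        gain = poisson_loglik(counts[:k]) + poisson_loglik(counts[k:]) - base
        if gain > best_gain:
            best_idx, best_gain = k, gain
    return best_idx, best_gain


# Hypothetical windowed read counts: coverage roughly doubles from window 5 on.
counts = [20, 22, 19, 21, 20, 41, 39, 40, 42, 38]
idx, gain = best_change_point(counts)
print(idx, round(gain, 2))
```

A Bayesian treatment would place priors on the rates and on the change point location and report a posterior distribution over `k` instead of a single best index.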
|
38
|
Yiğiter A, Chen J, An L, Danacioğlu N. An online copy number variant detection method for short sequencing reads. J Appl Stat 2015. [DOI: 10.1080/02664763.2014.1001330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
39
|
Tattini L, D'Aurizio R, Magi A. Detection of Genomic Structural Variants from Next-Generation Sequencing Data. Front Bioeng Biotechnol 2015; 3:92. [PMID: 26161383 PMCID: PMC4479793 DOI: 10.3389/fbioe.2015.00092] [Citation(s) in RCA: 169] [Impact Index Per Article: 16.9] [Received: 12/10/2014] [Accepted: 06/10/2015] [Indexed: 01/16/2023] Open
Abstract
Structural variants are genomic rearrangements larger than 50 bp accounting for around 1% of the variation among human genomes. They impact on phenotypic diversity and play a role in various diseases including neurological/neurocognitive disorders and cancer development and progression. Dissecting structural variants from next-generation sequencing data presents several challenges and a number of approaches have been proposed in the literature. In this mini review, we describe and summarize the latest tools – and their underlying algorithms – designed for the analysis of whole-genome sequencing, whole-exome sequencing, custom captures, and amplicon sequencing data, pointing out the major advantages/drawbacks. We also report a summary of the most recent applications of third-generation sequencing platforms. This assessment provides a guided indication – with particular emphasis on human genetics and copy number variants – for researchers involved in the investigation of these genomic events.
Affiliation(s)
- Lorenzo Tattini
- Department of Neurosciences, Psychology, Pharmacology and Child Health, University of Florence, Florence, Italy
| | - Romina D'Aurizio
- Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, National Research Council, Pisa, Italy
| | - Alberto Magi
- Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy
|
40
|
Manconi A, Manca E, Moscatelli M, Gnocchi M, Orro A, Armano G, Milanesi L. G-CNV: A GPU-Based Tool for Preparing Data to Detect CNVs with Read-Depth Methods. Front Bioeng Biotechnol 2015; 3:28. [PMID: 25806367 PMCID: PMC4354384 DOI: 10.3389/fbioe.2015.00028] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 12/10/2014] [Accepted: 02/19/2015] [Indexed: 11/23/2022] Open
Abstract
Copy number variations (CNVs) are the most prevalent type of structural variation (SV) in the human genome and are involved in a wide range of common human diseases. Different computational methods have been devised to detect this type of SV and to study how they are implicated in human diseases. Recently, computational methods based on high-throughput sequencing (HTS) have been increasingly used. The majority of these methods focus on mapping short-read sequences generated from a donor against a reference genome to detect signatures distinctive of CNVs. In particular, read-depth based methods detect CNVs by analyzing genomic regions whose read depth differs significantly from that of other regions. The analysis pipeline of these methods consists of four main stages: (i) data preparation, (ii) data normalization, (iii) CNV region identification, and (iv) copy number estimation. However, available tools do not support most of the operations required at the first two stages of this pipeline. Typically, they start the analysis by building the read-depth signal from pre-processed alignments. Therefore, third-party tools must be used to perform most of the preliminary operations required to build the read-depth signal. These data-intensive operations can be efficiently parallelized on graphics processing units (GPUs). In this article, we present G-CNV, a GPU-based tool devised to perform the common operations required at the first two stages of the analysis pipeline. G-CNV is able to filter low-quality read sequences, to mask low-quality nucleotides, to remove adapter sequences, to remove duplicated read sequences, to map the short reads, to resolve multiple mapping ambiguities, to build the read-depth signal, and to normalize it. G-CNV can be efficiently used as a third-party tool able to prepare data for the subsequent read-depth signal generation and analysis. Moreover, it can also be integrated in CNV detection tools to generate read-depth signals.
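Two of the operations named in this abstract, building the read-depth signal and normalizing it, can be sketched in plain Python. This is a CPU illustration under stated assumptions, not G-CNV's GPU code: reads are reduced to their start positions, windows are fixed-size and non-overlapping, and the normalization is a median-ratio GC correction of the kind commonly used by read-depth methods. All names and numbers are hypothetical.

```python
from statistics import median


def read_depth_signal(read_starts, genome_length, window):
    """Count the reads whose start position falls in each fixed-size window."""
    n_windows = (genome_length + window - 1) // window
    depth = [0] * n_windows
    for pos in read_starts:
        depth[pos // window] += 1
    return depth


def gc_normalize(depth, gc_percent):
    """Rescale each window by (overall median / median of windows with the
    same GC percentage), so that GC-driven coverage differences are removed."""
    overall = median(depth)
    by_gc = {}
    for d, gc in zip(depth, gc_percent):
        by_gc.setdefault(gc, []).append(d)
    gc_median = {gc: median(ds) for gc, ds in by_gc.items()}
    return [
        d * overall / gc_median[gc] if gc_median[gc] else 0.0
        for d, gc in zip(depth, gc_percent)
    ]


# Hypothetical data: four 10-bp windows; the GC-rich windows are under-covered.
read_starts = [1, 2, 3, 4, 11, 12, 21, 22, 23, 24, 31, 32]
depth = read_depth_signal(read_starts, genome_length=40, window=10)
normalized = gc_normalize(depth, gc_percent=[40, 60, 40, 60])
print(depth, normalized)
```

After normalization the toy signal is flat, which is the expected outcome when the only source of variation is GC content. Any remaining deviation would be the input to the CNV region identification stage.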
Affiliation(s)
- Andrea Manconi
- Institute for Biomedical Technologies, National Research Council, Milan, Italy
| | - Emanuele Manca
- Department of Electrical and Electronic Engineering, University of Cagliari, Cagliari, Italy
| | - Marco Moscatelli
- Institute for Biomedical Technologies, National Research Council, Milan, Italy
| | - Matteo Gnocchi
- Institute for Biomedical Technologies, National Research Council, Milan, Italy
| | - Alessandro Orro
- Institute for Biomedical Technologies, National Research Council, Milan, Italy
| | - Giuliano Armano
- Department of Electrical and Electronic Engineering, University of Cagliari, Cagliari, Italy
| | - Luciano Milanesi
- Institute for Biomedical Technologies, National Research Council, Milan, Italy
|
41
|
Kuilman T, Velds A, Kemper K, Ranzani M, Bombardelli L, Hoogstraat M, Nevedomskaya E, Xu G, de Ruiter J, Lolkema MP, Ylstra B, Jonkers J, Rottenberg S, Wessels LF, Adams DJ, Peeper DS, Krijgsman O. CopywriteR: DNA copy number detection from off-target sequence data. Genome Biol 2015; 16:49. [PMID: 25887352 PMCID: PMC4396974 DOI: 10.1186/s13059-015-0617-1] [Citation(s) in RCA: 160] [Impact Index Per Article: 16.0] [Received: 01/03/2015] [Accepted: 02/20/2015] [Indexed: 12/13/2022] Open
Abstract
Current methods for detection of copy number variants (CNV) and aberrations (CNA) from targeted sequencing data are based on the depth of coverage of captured exons. Accurate CNA determination is complicated by uneven genomic distribution and non-uniform capture efficiency of targeted exons. Here we present CopywriteR, which eludes these problems by exploiting 'off-target' sequence reads. CopywriteR allows for extracting uniformly distributed copy number information, can be used without reference, and can be applied to sequencing data obtained from various techniques including chromatin immunoprecipitation and target enrichment on small gene panels. CopywriteR outperforms existing methods and constitutes a widely applicable alternative to available tools.
Affiliation(s)
- Thomas Kuilman
- Division of Molecular Oncology, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands.
| | - Arno Velds
- Central Genomic Facility, Netherlands Cancer Institute, Amsterdam, The Netherlands.
| | - Kristel Kemper
- Division of Molecular Oncology, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands.
| | - Marco Ranzani
- Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, UK.
| | - Lorenzo Bombardelli
- Division of Molecular Genetics, Netherlands Cancer Institute, Amsterdam, The Netherlands.
| | - Marlous Hoogstraat
- Division of Molecular Carcinogenesis, Netherlands Cancer Institute, Amsterdam, The Netherlands.
| | - Ekaterina Nevedomskaya
- Division of Molecular Carcinogenesis, Netherlands Cancer Institute, Amsterdam, The Netherlands.
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands.
| | - Guotai Xu
- Division of Molecular Oncology, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands.
| | - Julian de Ruiter
- Division of Molecular Carcinogenesis, Netherlands Cancer Institute, Amsterdam, The Netherlands.
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands.
| | - Martijn P Lolkema
- Center for Personalized Cancer Treatment, Amsterdam, The Netherlands.
| | - Bauke Ylstra
- Department of Pathology, VU University Medical Center, Amsterdam, The Netherlands.
| | - Jos Jonkers
- Division of Molecular Pathology, Netherlands Cancer Institute, Amsterdam, The Netherlands.
| | - Sven Rottenberg
- Division of Molecular Oncology, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands.
- Vetsuisse Faculty, Institute of Animal Pathology, University of Bern, Bern, Switzerland.
| | - Lodewyk F Wessels
- Division of Molecular Carcinogenesis, Netherlands Cancer Institute, Amsterdam, The Netherlands.
| | - David J Adams
- Experimental Cancer Genetics, Wellcome Trust Sanger Institute, Hinxton, UK.
| | - Daniel S Peeper
- Division of Molecular Oncology, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands.
| | - Oscar Krijgsman
- Division of Molecular Oncology, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands.
|
42
|
Brynildsrud O, Snipen LG, Bohlin J. CNOGpro: detection and quantification of CNVs in prokaryotic whole-genome sequencing data. Bioinformatics 2015; 31:1708-15. [DOI: 10.1093/bioinformatics/btv070] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 04/22/2014] [Accepted: 01/28/2015] [Indexed: 01/22/2023] Open
|
43
|
Magi A, Tattini L, Cifola I, D'Aurizio R, Benelli M, Mangano E, Battaglia C, Bonora E, Kurg A, Seri M, Magini P, Giusti B, Romeo G, Pippucci T, De Bellis G, Abbate R, Gensini GF. EXCAVATOR: detecting copy number variants from whole-exome sequencing data. Genome Biol 2014; 14:R120. [PMID: 24172663 PMCID: PMC4053953 DOI: 10.1186/gb-2013-14-10-r120] [Citation(s) in RCA: 188] [Impact Index Per Article: 17.1] [Received: 06/15/2013] [Accepted: 10/30/2013] [Indexed: 12/11/2022] Open
Abstract
We developed a novel software tool, EXCAVATOR, for the detection of copy number variants (CNVs) from whole-exome sequencing data. EXCAVATOR combines a three-step normalization procedure with a novel heterogeneous hidden Markov model algorithm and a calling method that classifies genomic regions into five copy number states. We validate EXCAVATOR on three datasets and compare the results with three other methods. These analyses show that EXCAVATOR outperforms the other methods and is therefore a valuable tool for the investigation of CNVs in large-scale projects, as well as in clinical research and diagnostics. EXCAVATOR is freely available at http://sourceforge.net/projects/excavatortool/.
|
44
|
Huang S, Holt J, Kao CY, McMillan L, Wang W. A novel multi-alignment pipeline for high-throughput sequencing data. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2014; 2014:bau057. [PMID: 24948510 PMCID: PMC4062837 DOI: 10.1093/database/bau057] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 11/14/2022]
Abstract
Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest aligning to a single standard reference sequence, as is common practice, can lead to an underlying bias depending on the genetic distances of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, various limitations and problems remain unsolved, which include reduced mapping ratios, shifts in read mappings and the selection of which variants to include to remove biases. To address these issues, we propose a novel and generic multi-alignment pipeline. Our pipeline integrates the genomic variations from known or suspected founders into separate reference sequences and performs alignments to each one. By mapping reads to multiple reference sequences and merging them afterward, we are able to rescue more reads and diminish the bias caused by using a single common reference. Moreover, the genomic origin of each read is determined and annotated during the merging process, providing a better source of information to assess differential expression than simple allele queries at known variant positions. Using RNA-seq of a diallel cross, we compare our pipeline with the single-reference pipeline and demonstrate our advantages of more aligned reads and a higher percentage of reads with assigned origins. Database URL: http://csbio.unc.edu/CCstatus/index.py?run=Pseudo.
Affiliation(s)
- Shunping Huang
- Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599, Department of Computer Science, University of California, Los Angeles, CA 90095, USA
| | - James Holt
- Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599, Department of Computer Science, University of California, Los Angeles, CA 90095, USA
| | - Chia-Yu Kao
- Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599, Department of Computer Science, University of California, Los Angeles, CA 90095, USA
| | - Leonard McMillan
- Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599, Department of Computer Science, University of California, Los Angeles, CA 90095, USA
| | - Wei Wang
- Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599, Department of Computer Science, University of California, Los Angeles, CA 90095, USA
|
45
|
Mosen-Ansorena D, Telleria N, Veganzones S, De la Orden V, Maestro ML, Aransay AM. seqCNA: an R package for DNA copy number analysis in cancer using high-throughput sequencing. BMC Genomics 2014; 15:178. [PMID: 24597965 PMCID: PMC4022175 DOI: 10.1186/1471-2164-15-178] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 10/31/2013] [Accepted: 02/26/2014] [Indexed: 11/25/2022] Open
Abstract
Background: Deviations in the amount of genomic content that arise during tumorigenesis, called copy number alterations, are structural rearrangements that can critically affect gene expression patterns. Additionally, copy number alteration profiles allow insight into cancer discrimination, progression and complexity. On data obtained from high-throughput sequencing, improving quality through GC bias correction and keeping false positives to a minimum help build reliable copy number alteration profiles. Results: We introduce seqCNA, a parallelized R package for an integral copy number analysis of high-throughput sequencing cancer data. The package includes novel methodology on (i) filtering, reducing false positives, and (ii) GC content correction, improving copy number profile quality, especially under great read coverage and high correlation between GC content and copy number. Adequate analysis steps are automatically chosen based on availability of paired-end mapping, matched normal samples and genome annotation. Conclusions: seqCNA, available through Bioconductor, provides accurate copy number predictions in tumoural data, thanks to the extensive filtering and better GC bias correction, while providing an integrated and parallelized workflow. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-178) contains supplementary material, which is available to authorized users.
Affiliation(s)
- David Mosen-Ansorena
- CIC bioGUNE & CIBERehd, Technologic Park of Bizkaia, Building 502, 48160 Derio, Spain.
|
46
|
Li X, Chen S, Xie W, Vogel I, Choy KW, Chen F, Christensen R, Zhang C, Ge H, Jiang H, Yu C, Huang F, Wang W, Jiang H, Zhang X. PSCC: sensitive and reliable population-scale copy number variation detection method based on low coverage sequencing. PLoS One 2014; 9:e85096. [PMID: 24465483 PMCID: PMC3897425 DOI: 10.1371/journal.pone.0085096] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 08/08/2013] [Accepted: 11/22/2013] [Indexed: 11/28/2022] Open
Abstract
Background: Copy number variations (CNVs) represent an important type of genetic variation that deeply impacts phenotypic polymorphisms and human diseases. The advent of high-throughput sequencing technologies provides an opportunity to revolutionize the discovery of CNVs and to explore their relationship with diseases. However, most of the existing methods depend on sequencing depth and show instability with low sequence coverage. In this study, using low coverage whole-genome sequencing (LCS) we have developed an effective population-scale CNV calling (PSCC) method. Methodology/Principal Findings: In our novel method, two-step correction was used to remove biases caused by local GC content and complex genomic characteristics. We chose a binary segmentation method to locate CNV segments and designed combined statistics tests to ensure the stable performance of the false positive control. The simulation data showed that our PSCC method could achieve 99.7%/100% and 98.6%/100% sensitivity and specificity for over 300 kb CNV calling in the condition of LCS (∼2×) and ultra LCS (∼0.2×), respectively. Finally, we applied this novel method to analyze 34 clinical samples with an average of 2× LCS. In the final results, all the 31 pathogenic CNVs identified by aCGH were successfully detected. In addition, the performance comparison revealed that our method had significant advantages over existing methods using ultra LCS. Conclusions/Significance: Our study showed that PSCC can sensitively and reliably detect CNVs using low coverage or even ultra-low coverage data through population-scale sequencing.
Affiliation(s)
| | - Shengpei Chen
- BGI-Shenzhen, Shenzhen, China ; State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | | | - Ida Vogel
- Department of Clinical Genetics, Aarhus University Hospital, Aarhus, Denmark
| | - Kwong Wai Choy
- Department of Obstetrics and Gynaecology, The Chinese University of Hong Kong, Shatin, NT, Hong Kong
| | | | - Rikke Christensen
- Department of Clinical Genetics, Aarhus University Hospital, Aarhus, Denmark
| | | | | | - Haojun Jiang
- BGI-Shenzhen, Shenzhen, China ; State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | | | - Fang Huang
- Guangzhou Children's Social Welfare Home, Guangzhou, China
| | - Wei Wang
- BGI-Shenzhen, Shenzhen, China ; Clinical laboratory of BGI Health, Shenzhen, China
| | | | - Xiuqing Zhang
- BGI-Shenzhen, Shenzhen, China ; The Guangdong Enterprise Key Laboratory of Human Disease Genomics, BGI-Shenzhen, Shenzhen, China
|
47
|
Duitama J, Quintero JC, Cruz DF, Quintero C, Hubmann G, Foulquié-Moreno MR, Verstrepen KJ, Thevelein JM, Tohme J. An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments. Nucleic Acids Res 2014; 42:e44. [PMID: 24413664 PMCID: PMC3973327 DOI: 10.1093/nar/gkt1381] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.0] [Indexed: 12/30/2022] Open
Abstract
Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still require important investments in highly skilled personnel. Developing accurate, efficient and user-friendly software packages for HTS data analysis will lead to a more rapid discovery of genomic elements relevant to medical, agricultural and industrial applications. We therefore developed Next-Generation Sequencing Eclipse Plug-in (NGSEP), a new software tool for integrated, efficient and user-friendly detection of single nucleotide variants (SNVs), indels and copy number variants (CNVs). NGSEP includes modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics. Analysis of sequencing experiments in yeast, rice and human samples shows that NGSEP has superior accuracy and efficiency, compared with currently available packages for variant detection. We also show that only a comprehensive and accurate identification of repeat regions and CNVs allows researchers to properly separate SNVs from differences between copies of repeat elements. We expect that NGSEP will become a strong support tool to empower the analysis of sequencing data in a wide range of research projects on different species.
Affiliation(s)
- Jorge Duitama
- Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali- Palmira, A.A. 6713 Cali, Colombia, Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
- *To whom correspondence should be addressed. Tel: +57 2 4450000; Fax: +57 2 4450073;
| | - Juan Camilo Quintero
- Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali- Palmira, A.A. 6713 Cali, Colombia, Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
| | - Daniel Felipe Cruz
- Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali- Palmira, A.A. 6713 Cali, Colombia, Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
| | - Constanza Quintero
- Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali- Palmira, A.A. 6713 Cali, Colombia, Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
| | - Georg Hubmann
- Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali- Palmira, A.A. 6713 Cali, Colombia, Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
| | - Maria R. Foulquié-Moreno
- Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali- Palmira, A.A. 6713 Cali, Colombia, Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
| | - Kevin J. Verstrepen
- Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali- Palmira, A.A. 6713 Cali, Colombia, Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
| | - Johan M. Thevelein
- Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali- Palmira, A.A. 6713 Cali, Colombia, Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
| | - Joe Tohme
- Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali- Palmira, A.A. 6713 Cali, Colombia, Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium, VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
|
48
|
Nakachi I, Rice JL, Coldren CD, Edwards MG, Stearman RS, Glidewell SC, Varella-Garcia M, Franklin WA, Keith RL, Lewis MT, Gao B, Merrick DT, Miller YE, Geraci MW. Application of SNP microarrays to the genome-wide analysis of chromosomal instability in premalignant airway lesions. Cancer Prev Res (Phila) 2013; 7:255-65. [PMID: 24346345 DOI: 10.1158/1940-6207.capr-12-0485] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 12/21/2022]
Abstract
Chromosomal instability is central to the process of carcinogenesis. The genome-wide detection of somatic chromosomal alterations (SCA) in small premalignant lesions remains challenging because sample heterogeneity dilutes the aberrant cell information. To overcome this hurdle, we focused on the B allele frequency data from single-nucleotide polymorphism microarrays (SNP arrays). The difference of allelic fractions between paired tumor and normal samples from the same patient (delta-θ) provides a simple but sensitive detection of SCA in the affected tissue. We applied the delta-θ approach to small, heterogeneous clinical specimens, including endobronchial biopsies and brushings. Regions identified by delta-θ were validated by FISH and quantitative PCR in heterogeneous samples. Distinctive genomic variations were successfully detected across the whole genome in all invasive cancer cases (6 of 6), carcinoma in situ (3 of 3), and high-grade dysplasia (severe or moderate; 3 of 11). Not only well-described SCAs in lung squamous cell carcinoma, but also several novel chromosomal alterations were frequently found across the preinvasive dysplastic cases. Within these novel regions, losses of putative tumor suppressors (RNF20 and SSBP2) and an amplification of the RASGRP3 gene with oncogenic activity were observed. Widespread sampling of the airway during bronchoscopy demonstrated that field cancerization reflected by SCAs at multiple sites was detectable. SNP arrays combined with delta-θ analysis can detect SCAs in heterogeneous clinical samples and expand our ability to assess genomic instability in the airway epithelium as a biomarker of lung cancer risk.
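As a rough illustration of the delta-θ idea described in this abstract, the sketch below computes the per-SNP difference in B allele frequency between a paired tumor and normal sample. It is not the authors' implementation: the function name `delta_theta`, the restriction to SNPs that look heterozygous in the normal sample, the 0.4 to 0.6 heterozygosity window, and the use of the absolute difference are all assumptions made for the example.

```python
def delta_theta(tumor_baf, normal_baf, het_low=0.4, het_high=0.6):
    """Return (snp_index, |theta_tumor - theta_normal|) pairs.

    Only SNPs whose normal-sample B allele frequency falls inside the
    assumed heterozygosity window [het_low, het_high] are kept, because
    homozygous SNPs carry no allelic-imbalance signal.
    """
    if len(tumor_baf) != len(normal_baf):
        raise ValueError("paired samples must cover the same SNPs")
    deltas = []
    for index, (tumor, normal) in enumerate(zip(tumor_baf, normal_baf)):
        if het_low <= normal <= het_high:
            deltas.append((index, abs(tumor - normal)))
    return deltas


# Toy data (invented): SNPs 2 and 3 show allelic imbalance in the tumor.
normal = [0.50, 0.02, 0.49, 0.51, 0.98]
tumor = [0.52, 0.01, 0.70, 0.30, 0.99]
result = delta_theta(tumor, normal)
```

In this toy input, only the three SNPs that are heterozygous in the normal sample are reported, and the two with a large difference would be candidates for a somatic alteration under whatever threshold an analyst chooses.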
Affiliation(s)
- Ichiro Nakachi
- University of Colorado, Anschutz Medical Campus, 12700, East 19th Avenue, RC2 9th Floor, Aurora, CO 80045.
|
49
|
Abstract
MOTIVATION Data quality is a critical issue in the analyses of DNA copy number alterations obtained from microarrays. It is commonly assumed that copy number alteration data can be modeled as piecewise constant and the measurement errors of different probes are independent. However, these assumptions do not always hold in practice. In some published datasets, we find that measurement errors are highly correlated between probes that interrogate nearby genomic loci, and the piecewise-constant model does not fit the data well. The correlated errors cause problems in downstream analysis, leading to a large number of DNA segments falsely identified as having copy number gains and losses. METHOD We developed a simple tool, called autocorrelation scanning profile, to assess the dependence of measurement error between neighboring probes. RESULTS Autocorrelation scanning profile can be used to check data quality and refine the analysis of DNA copy number data, which we demonstrate in some typical datasets. CONTACT lzhangli@mdanderson.org. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
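The dependence between neighboring probes that this abstract refers to can be illustrated with a plain sample autocorrelation computed over a range of lags. This is only a minimal sketch and not the published autocorrelation scanning profile tool: the function names are hypothetical, and it assumes the input is a list of per-probe residuals (for example, log ratios with the segment mean already subtracted) ordered by genomic position.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a sequence at a positive lag."""
    n = len(values)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    if variance == 0:
        return 0.0  # a constant sequence carries no correlation signal
    covariance = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


def autocorrelation_profile(values, max_lag):
    """Autocorrelation at lags 1..max_lag for position-ordered residuals."""
    return [autocorrelation(values, lag) for lag in range(1, max_lag + 1)]


# Toy residuals (invented): a steady drift gives positive lag-1 correlation,
# while an alternating pattern gives negative lag-1 correlation.
drift = autocorrelation([0, 1, 2, 3, 4, 5], 1)
alternating = autocorrelation_profile([1, -1, 1, -1, 1, -1], 2)
```

If probe errors were independent, the values at every lag would be expected to sit near zero; values consistently far from zero would suggest the kind of correlated error the abstract warns about.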
Affiliation(s)
- Liangcai Zhang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77230, USA and Department of Biophysics, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
|
50
|
Zhao M, Wang Q, Wang Q, Jia P, Zhao Z. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives. BMC Bioinformatics 2013; 14 Suppl 11:S1. [PMID: 24564169 PMCID: PMC3846878 DOI: 10.1186/1471-2105-14-s11-s1] [Citation(s) in RCA: 350] [Impact Index Per Article: 29.2] [Indexed: 01/11/2023] Open
Abstract
Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) or genotyping arrays have been the standard technologies for detecting large regions subject to copy number changes in genomes; more recently, high-resolution sequence data can be analyzed by next-generation sequencing (NGS). During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled development of numerous computational methods and tools for CNV detection. In this article, we review the recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development.
|