1
Nishio S, Shirasawa K, Nishimura R, Takeuchi Y, Imai A, Mase N, Takada N. A self-compatible pear mutant derived from γ-irradiated pollen carries an 11-Mb duplication in chromosome 17. Front Plant Sci 2024; 15:1360185. [PMID: 38504898 PMCID: PMC10948449 DOI: 10.3389/fpls.2024.1360185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 02/13/2024] [Indexed: 03/21/2024]
Abstract
Self-compatibility is a highly desirable trait for pear breeding programs. Our breeding program previously developed a novel self-compatible pollen-part Japanese pear mutant (Pyrus pyrifolia Nakai), '415-1', by using γ-irradiated pollen. '415-1' carries the S-genotype S4dS5S5, with "d" indicating a duplication of S5 responsible for breakdown of self-incompatibility. Until now, the size and inheritance of the duplicated segment were undetermined, and a reliable detection method was lacking. Here, we examined genome duplications and their inheritance in 140 F1 seedlings resulting from a cross between '515-20' (S1S3) and '415-1'. Amplicon sequencing of S-RNase and SFBB18 clearly detected S-haplotype duplications in the seedlings. Intriguingly, 30 partially triploid seedlings including genotypes S1S4dS5, S3S4dS5, S1S5dS5, S3S5dS5, and S3S4dS4 were detected among the 140 seedlings. Depth-of-coverage analysis using ddRAD-seq showed that the duplications in those individuals were limited to chromosome 17. Further analysis through resequencing confirmed an 11-Mb chromosome duplication spanning the middle to the end of chromosome 17. The duplicated segment remained consistent in size across generations. The presence of an S3S4dS4 seedling provided evidence for recombination between the duplicated S5 segment and the original S4 haplotype, suggesting that the duplicated segment can pair with other parts of chromosome 17. This research provides valuable insights for improving pear breeding programs using partially triploid individuals.
Affiliation(s)
- Sogo Nishio
  - Deciduous Fruit Tree Breeding Group, Division of Fruit Tree Breeding Research, Institute of Fruit Tree and Tea Science, National Agriculture and Food Research Organization, Tsukuba, Japan
- Kenta Shirasawa
  - Department of Frontier Research and Development, Kazusa DNA Research Institute, Kisarazu, Japan
- Ryotaro Nishimura
  - Fruit Tree Smart Production Group, Division of Fruit Tree Production Research, Institute of Fruit Tree and Tea Science, National Agriculture and Food Research Organization, Higashihiroshima, Japan
- Yukie Takeuchi
  - Deciduous Fruit Tree Breeding Group, Division of Fruit Tree Breeding Research, Institute of Fruit Tree and Tea Science, National Agriculture and Food Research Organization, Tsukuba, Japan
- Atsushi Imai
  - Deciduous Fruit Tree Breeding Group, Division of Fruit Tree Breeding Research, Institute of Fruit Tree and Tea Science, National Agriculture and Food Research Organization, Tsukuba, Japan
- Nobuko Mase
  - Citrus Breeding and Production Group, Division of Citrus Research, Institute of Fruit Tree and Tea Science, National Agriculture and Food Research Organization, Shizuoka, Japan
- Norio Takada
  - Deciduous Fruit Tree Breeding Group, Division of Fruit Tree Breeding Research, Institute of Fruit Tree and Tea Science, National Agriculture and Food Research Organization, Tsukuba, Japan
2
Truty R, Rojahn S, Ouyang K, Kautzer C, Kennemer M, Pineda-Alvarez D, Johnson B, Stafford A, Basel-Salmon L, Saitta S, Slavotinek A, Chandrasekharappa SC, Suarez CJ, Burnett L, Nussbaum RL, Aradhya S. Patterns of mosaicism for sequence and copy-number variants discovered through clinical deep sequencing of disease-related genes in one million individuals. Am J Hum Genet 2023; 110:551-564. [PMID: 36933558 PMCID: PMC10119133 DOI: 10.1016/j.ajhg.2023.02.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Accepted: 02/23/2023] [Indexed: 03/19/2023]
Abstract
DNA variants that arise after conception can show mosaicism, varying in presence and extent among tissues. Mosaic variants have been reported in Mendelian diseases, but further investigation is necessary to broadly understand their incidence, transmission, and clinical impact. A mosaic pathogenic variant in a disease-related gene may cause an atypical phenotype in terms of severity, clinical features, or timing of disease onset. Using high-depth sequencing, we studied results from one million unrelated individuals referred for genetic testing for almost 1,900 disease-related genes. We observed 5,939 mosaic sequence or intragenic copy number variants distributed across 509 genes in nearly 5,700 individuals, constituting approximately 2% of molecular diagnoses in the cohort. Cancer-related genes had the most mosaic variants and showed age-specific enrichment, in part reflecting clonal hematopoiesis in older individuals. We also observed many mosaic variants in genes related to early-onset conditions. Additional mosaic variants were observed in genes analyzed for reproductive carrier screening or associated with dominant disorders with low penetrance, posing challenges for interpreting their clinical significance. When we controlled for the potential involvement of clonal hematopoiesis, most mosaic variants were enriched in younger individuals and were present at higher levels than in older individuals. Furthermore, individuals with mosaicism showed later disease onset or milder phenotypes than individuals with non-mosaic variants in the same genes. Collectively, the large compendium of variants, disease correlations, and age-specific results identified in this study expand our understanding of the implications of mosaic DNA variation for diagnosis and genetic counseling.
Affiliation(s)
- Rebecca Truty
  - Invitae, 1400 16th Street, San Francisco, CA 94103, USA
- Susan Rojahn
  - Invitae, 1400 16th Street, San Francisco, CA 94103, USA
- Karen Ouyang
  - Invitae, 1400 16th Street, San Francisco, CA 94103, USA
- Britt Johnson
  - Invitae, 1400 16th Street, San Francisco, CA 94103, USA
- Lina Basel-Salmon
  - Rabin Medical Center-Beilinson Hospital and Schneider Children's Medical Center of Israel, Petach Tikva, Israel; Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel; Felsenstein Medical Research Center, Petach Tikva, Israel
- Sulagna Saitta
  - Division of Clinical Genetics, Departments of Pediatrics and Obstetrics and Gynecology, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Anne Slavotinek
  - Division of Human Genetics, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Settara C Chandrasekharappa
  - Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Carlos Jose Suarez
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA 94301, USA
- Robert L Nussbaum
  - Invitae, 1400 16th Street, San Francisco, CA 94103, USA; School of Medicine, University of California - San Francisco, San Francisco, CA, USA
- Swaroop Aradhya
  - Invitae, 1400 16th Street, San Francisco, CA 94103, USA; Department of Pathology, Stanford University School of Medicine, Stanford, CA 94301, USA
3
Thuesen NH, Klausen MS, Gopalakrishnan S, Trolle T, Renaud G. Benchmarking freely available HLA typing algorithms across varying genes, coverages and typing resolutions. Front Immunol 2022; 13:987655. [PMID: 36426357 PMCID: PMC9679531 DOI: 10.3389/fimmu.2022.987655] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/06/2022] [Accepted: 10/10/2022] [Indexed: 11/02/2023]
Abstract
Identifying the specific human leukocyte antigen (HLA) allele combination of an individual is crucial in organ donation, risk assessment of autoimmune and infectious diseases and cancer immunotherapy. However, due to the high genetic polymorphism in this region, HLA typing requires specialized methods. We investigated the performance of five next-generation sequencing (NGS) based HLA typing tools with a non-restricted license, namely HLA*LA, Optitype, HISAT-genotype, Kourami and STC-Seq. This evaluation was done for the five HLA loci, HLA-A, -B, -C, -DRB1 and -DQB1, using whole-exome sequencing (WES) samples from 829 individuals. The robustness of the tools to lower depth of coverage (DOC) was evaluated by subsampling and HLA typing 230 WES samples at DOC ranging from 1X to 100X. The HLA typing accuracy was measured across four typing resolutions. Among these, we present two clinically-relevant typing resolutions (P group and pseudo-sequence), which specifically focus on the peptide binding region. On average, across the five HLA loci examined, HLA*LA was found to have the highest typing accuracy. For the individual loci, HLA-A, -B and -C, Optitype's typing accuracy was the highest and HLA*LA had the highest typing accuracy for HLA-DRB1 and -DQB1. The tools' robustness to lower DOC data varied widely and further depended on the specific HLA locus. For all Class I loci, Optitype had a typing accuracy above 95% (according to the modification of the amino acids in the functionally relevant portion of the HLA molecule) at 50X, but increasing the DOC beyond even 100X could still improve the typing accuracy of HISAT-genotype, Kourami, and STC-Seq across all five HLA loci as well as HLA*LA's typing accuracy for HLA-DQB1. HLA typing is also used in studies of ancient DNA (aDNA), which is often based on sequencing data with lower quality and DOC. Interestingly, we found that Optitype's typing accuracy is not notably impaired by short read length or by DNA damage, which is typical of aDNA, as long as the DOC is sufficiently high.
Affiliation(s)
- Nikolas Hallberg Thuesen
  - Evaxion Biotech, Copenhagen, Denmark
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Lyngby, Denmark
- Shyam Gopalakrishnan
  - Section for Hologenomics, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Gabriel Renaud
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Lyngby, Denmark
4
Benjelloun B, Boyer F, Streeter I, Zamani W, Engelen S, Alberti A, Alberto FJ, BenBati M, Ibnelbachyr M, Chentouf M, Bechchari A, Rezaei HR, Naderi S, Stella A, Chikhi A, Clarke L, Kijas J, Flicek P, Taberlet P, Pompanon F. An evaluation of sequencing coverage and genotyping strategies to assess neutral and adaptive diversity. Mol Ecol Resour 2019; 19:1497-1515. [PMID: 31359622 PMCID: PMC7115901 DOI: 10.1111/1755-0998.13070] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 07/27/2018] [Revised: 06/30/2019] [Accepted: 07/08/2019] [Indexed: 12/12/2022]
Abstract
Whole genome sequences (WGS) greatly increase our ability to precisely infer population genetic parameters, demographic processes, and selection signatures. However, WGS may still not be affordable for a representative number of individuals/populations. In this context, our goal was to assess the efficiency of several SNP genotyping strategies by testing their ability to accurately estimate parameters describing neutral diversity and to detect signatures of selection. We analysed 110 WGS at 12× coverage for four different species, i.e., sheep, goats and their wild counterparts. From these data we generated 946 data sets corresponding to random panels of 1K to 5M variants, commercial SNP chips and exome capture, for sample sizes of five to 48 individuals. We also extracted low-coverage genome resequencing of 1×, 2× and 5× by randomly subsampling reads from the 12× resequencing data. Globally, 5K to 10K random variants were enough for an accurate estimation of genome diversity. Conversely, commercial panels and exome capture displayed strong ascertainment biases. Besides the characterization of neutral diversity, the detection of the signature of selection and the accurate estimation of linkage disequilibrium (LD) required high-density panels of at least 1M variants. Finally, genotype likelihoods increased the quality of variant calling from low-coverage resequencing, but proportions of incorrect genotypes remained substantial, especially for heterozygote sites. Whole genome resequencing coverage of at least 5× appeared to be necessary for accurate assessment of genomic variations. These results have implications for studies seeking to deploy low-density SNP collections or genome scans across genetically diverse populations/species showing similar genetic characteristics and patterns of LD decay for a wide variety of purposes.
Affiliation(s)
- Badr Benjelloun
  - Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
  - National Institute of Agronomic Research (INRA Maroc), Regional Centre of Agronomic Research, 23000 Beni-Mellal, Morocco
- Frédéric Boyer
  - Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
- Ian Streeter
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Wahid Zamani
  - Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
  - Department of Environmental Sciences, Faculty of Natural Resources and Marine Sciences, Tarbiat Modares University, 46417-76489 Noor, Mazandaran, Iran
- Stefan Engelen
  - CEA - Institut de biologie François-Jacob, Genoscope, 2 Rue Gaston Cremieux 91057 Evry Cedex, France
- Adriana Alberti
  - CEA - Institut de biologie François-Jacob, Genoscope, 2 Rue Gaston Cremieux 91057 Evry Cedex, France
- Florian J. Alberto
  - Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
- Mohamed BenBati
  - National Institute of Agronomic Research (INRA Maroc), Regional Centre of Agronomic Research, 23000 Beni-Mellal, Morocco
- Mustapha Ibnelbachyr
  - National Institute of Agronomic Research (INRA Maroc), CRRA Errachidia, 52000 Errachidia, Morocco
- Mouad Chentouf
  - National Institute of Agronomic Research (INRA Maroc), CRRA Tangier, 90010 Tangier, Morocco
- Abdelmajid Bechchari
  - National Institute of Agronomic Research (INRA Maroc), CRRA Oujda, 60000 Oujda, Morocco
- Hamid R. Rezaei
  - Department of Environmental Sci, Gorgan University of Agricultural Sciences & Natural Resources, 41996-13776 Gorgan, Iran
- Saeid Naderi
  - Environmental Sciences Department, Natural Resources Faculty, University of Guilan, 49138-15749 Guilan, Iran
- Alessandra Stella
  - PTP Science Park, Bioinformatics Unit, Via Einstein-Loc. Cascina Codazza, 26900 Lodi, Italy
- Abdelkader Chikhi
  - National Institute of Agronomic Research (INRA Maroc), CRRA Errachidia, 52000 Errachidia, Morocco
- Laura Clarke
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- James Kijas
  - Commonwealth Scientific and Industrial Research Organisation Animal Food and Health Sciences, St Lucia, QLD 4067, Australia
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Pierre Taberlet
  - Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
- François Pompanon
  - Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
5
Wiewiórka M, Szmurło A, Kuśmirek W, Gambin T. SeQuiLa-cov: A fast and scalable library for depth of coverage calculations. Gigascience 2019; 8:giz094. [PMID: 31378808 PMCID: PMC6680061 DOI: 10.1093/gigascience/giz094] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 12/13/2018] [Revised: 05/24/2019] [Accepted: 07/10/2019] [Indexed: 11/15/2022]
Abstract
BACKGROUND Depth of coverage calculation is an important and computationally intensive preprocessing step in a variety of next-generation sequencing pipelines, including the analysis of RNA-sequencing data, detection of copy number variants, or quality control procedures. RESULTS Building upon big data technologies, we have developed SeQuiLa-cov, an extension to the recently released SeQuiLa platform, which provides efficient depth of coverage calculations, reaching >100× speedup over the state-of-the-art tools. The performance and scalability of our solution allow for exome and genome-wide calculations running locally or on a cluster while hiding the complexity of the distributed computing with Structured Query Language Application Programming Interface. CONCLUSIONS SeQuiLa-cov provides significant performance gain in depth of coverage calculations streamlining the widely used bioinformatic processing pipelines.
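To make the quantity in this abstract concrete, the sketch below computes depth of coverage from read intervals with a simple event sweep. It is only an illustration of the calculation, not SeQuiLa-cov's implementation or API; the function name `depth_of_coverage` and the half-open interval convention are assumptions chosen for this example.

```python
from collections import defaultdict


def depth_of_coverage(reads):
    """Return run-length encoded depth as (start, end, depth) blocks.

    `reads` is an iterable of half-open (start, end) alignment intervals
    on a single contig (hypothetical input format for this sketch).
    Positions with zero depth are not reported.
    """
    # Each read adds +1 at its start and -1 at its end.
    events = defaultdict(int)
    for start, end in reads:
        if end <= start:
            continue  # skip empty or inverted intervals
        events[start] += 1
        events[end] -= 1

    blocks = []
    depth = 0
    prev = None
    for pos in sorted(events):
        if prev is not None and depth > 0 and pos > prev:
            last = blocks[-1] if blocks else None
            if last is not None and last[1] == prev and last[2] == depth:
                # Extend the previous block when depth did not change.
                blocks[-1] = (last[0], pos, depth)
            else:
                blocks.append((prev, pos, depth))
        depth += events[pos]
        prev = pos
    return blocks


if __name__ == "__main__":
    for block in depth_of_coverage([(0, 10), (5, 15), (20, 25)]):
        print(block)
```

Sorting the event positions dominates the cost, so the sweep scales with the number of reads rather than with contig length, which is one reason block-based output is a common representation for coverage.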
Affiliation(s)
- Marek Wiewiórka
  - Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland
- Agnieszka Szmurło
  - Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland
- Wiktor Kuśmirek
  - Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland
- Tomasz Gambin
  - Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland
6
Kong SW, Lee IH, Liu X, Hirschhorn JN, Mandl KD. Measuring coverage and accuracy of whole-exome sequencing in clinical context. Genet Med 2018; 20:1617-1626. [PMID: 29789557 PMCID: PMC6185824 DOI: 10.1038/gim.2018.51] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 05/16/2017] [Accepted: 02/16/2018] [Indexed: 01/21/2023]
Abstract
PURPOSE To evaluate the coverage and accuracy of whole-exome sequencing (WES) across vendors. METHODS Blood samples from three trios underwent WES at three vendors. Relative performance of the three WES services was measured for breadth and depth of coverage. The false-negative rates (FNRs) were estimated using the segregation pattern within each trio. RESULTS Mean depth of coverage for all genes was 189.0, 124.9, and 38.3 for the three vendor services. Fifty-five of the American College of Medical Genetics and Genomics 56 genes, but only 56 of 63 pharmacogenes, were 100% covered at 10× in at least one of the nine individuals for all vendors; however, there was substantial interindividual variability. For the two vendors with mean depth of coverage >120×, analytic positive predictive values (aPPVs) exceeded 99.1% for single-nucleotide variants and homozygous indels, and sensitivities were 98.9-99.9%; however, heterozygous indels showed lower accuracy and sensitivity. Among the trios, FNRs in the offspring were 0.07-0.62% at well-covered variants concordantly called in both parents. CONCLUSION The current standard of 120× coverage for clinical WES may be insufficient for consistent breadth of coverage across the exome. Ordering clinicians and researchers would benefit from vendors' reports that estimate sensitivity and aPPV, including depth of coverage across the exome.
Affiliation(s)
- Sek Won Kong
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02115, USA; Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA
- In-Hee Lee
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02115, USA; Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA
- Xuanshi Liu
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02115, USA; Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA
- Joel N. Hirschhorn
  - Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA; Broad Institute, Cambridge, MA 02142, USA
- Kenneth D. Mandl
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02115, USA; Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
7
Mahamdallie S, Ruark E, Yost S, Münz M, Renwick A, Poyastro-Pearson E, Strydom A, Seal S, Rahman N. The Quality Sequencing Minimum (QSM): providing comprehensive, consistent, transparent next generation sequencing data quality assurance. Wellcome Open Res 2018; 3:37. [PMID: 29992192 PMCID: PMC6020721 DOI: 10.12688/wellcomeopenres.14307.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Accepted: 03/27/2018] [Indexed: 11/20/2022]
Abstract
Next generation sequencing (NGS) is routinely used in clinical genetic testing. Quality management of NGS testing is essential to ensure performance is consistently and rigorously evaluated. Three primary metrics are used in NGS quality evaluation: depth of coverage, base quality and mapping quality. To provide consistency and transparency in the utilisation of these metrics we present the Quality Sequencing Minimum (QSM). The QSM defines the minimum quality requirement a laboratory has selected for depth of coverage (C), base quality (B) and mapping quality (M) and can be applied per base, exon, gene or other genomic region, as appropriate. The QSM format is CX_BY(PY)_MZ(PZ). X is the parameter threshold for C, Y the parameter threshold for B, PY the percentage of reads that must reach Y, Z the parameter threshold for M, PZ the percentage of reads that must reach Z. The data underlying the QSM is in the BAM file, so a QSM can be easily and automatically calculated in any NGS pipeline. We used the QSM to optimise cancer predisposition gene testing using the TruSight Cancer Panel (TSCP). We set the QSM as C50_B10(85)_M20(95). Test regions falling below the QSM were automatically flagged for review, with 100/1471 test regions QSM-flagged in multiple individuals. Supplementing these regions with 132 additional probes improved performance in 85/100. We also used the QSM to optimise testing of genes with pseudogenes such as PTEN and PMS2. In TSCP data from 960 individuals the median number of regions that passed QSM per sample was 1429 (97%). Importantly, the QSM can be used at an individual report level to provide succinct, comprehensive quality assurance information about individual test performance. We believe many laboratories would find the QSM useful. Furthermore, widespread adoption of the QSM would facilitate consistent, transparent reporting of genetic test performance by different laboratories.
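The QSM format described in this abstract can be read mechanically. The following is a minimal sketch, not the authors' software: it parses a string such as C50_B10(85)_M20(95) and checks a single position against it. The names `QSM`, `parse_qsm` and `base_passes` are hypothetical, thresholds are assumed to be integers, "reach" is interpreted as greater than or equal to, and depth is taken as the number of reads supplied for the position.

```python
import re
from dataclasses import dataclass

# Pattern for the CX_BY(PY)_MZ(PZ) layout; integer values are an assumption.
QSM_PATTERN = re.compile(r"^C(\d+)_B(\d+)\((\d+)\)_M(\d+)\((\d+)\)$")


@dataclass(frozen=True)
class QSM:
    min_depth: int            # X: threshold for depth of coverage (C)
    min_base_quality: int     # Y: threshold for base quality (B)
    base_quality_pct: int     # PY: % of reads that must reach Y
    min_mapping_quality: int  # Z: threshold for mapping quality (M)
    mapping_quality_pct: int  # PZ: % of reads that must reach Z


def parse_qsm(text):
    """Parse a QSM string such as 'C50_B10(85)_M20(95)'."""
    match = QSM_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a QSM string: {text!r}")
    return QSM(*(int(group) for group in match.groups()))


def base_passes(qsm, base_qualities, mapping_qualities):
    """Check one position; each list holds one value per covering read."""
    depth = len(base_qualities)
    if len(mapping_qualities) != depth:
        raise ValueError("expected one mapping quality per read")
    if depth == 0 or depth < qsm.min_depth:
        return False
    bq_ok = sum(q >= qsm.min_base_quality for q in base_qualities)
    mq_ok = sum(q >= qsm.min_mapping_quality for q in mapping_qualities)
    return (100 * bq_ok >= qsm.base_quality_pct * depth
            and 100 * mq_ok >= qsm.mapping_quality_pct * depth)


if __name__ == "__main__":
    qsm = parse_qsm("C50_B10(85)_M20(95)")
    print(qsm)
    print(base_passes(qsm, [30] * 60, [60] * 60))
```

A region-level check would apply `base_passes` to every position in the region and flag the region if any position fails, although how the per-base results are aggregated is a laboratory choice that this sketch does not fix.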
Affiliation(s)
- Shazia Mahamdallie
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK; TGLclinical, The Institute of Cancer Research, London, SM2 5NG, UK
- Elise Ruark
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK; TGLclinical, The Institute of Cancer Research, London, SM2 5NG, UK
- Shawn Yost
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK; TGLclinical, The Institute of Cancer Research, London, SM2 5NG, UK
- Márton Münz
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK
- Anthony Renwick
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK
- Emma Poyastro-Pearson
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK; TGLclinical, The Institute of Cancer Research, London, SM2 5NG, UK
- Ann Strydom
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK; TGLclinical, The Institute of Cancer Research, London, SM2 5NG, UK
- Sheila Seal
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK; TGLclinical, The Institute of Cancer Research, London, SM2 5NG, UK
- Nazneen Rahman
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK; TGLclinical, The Institute of Cancer Research, London, SM2 5NG, UK; Cancer Genetics Unit, Royal Marsden NHS Foundation Trust, London, SM2 5PT, UK
8
Münz M, Mahamdallie S, Yost S, Rimmer A, Poyastro-Pearson E, Strydom A, Seal S, Ruark E, Rahman N. CoverView: a sequence quality evaluation tool for next generation sequencing data. Wellcome Open Res 2018; 3:36. [PMID: 29881786 PMCID: PMC5964631 DOI: 10.12688/wellcomeopenres.14306.1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Accepted: 03/27/2018] [Indexed: 01/05/2023]
Abstract
Quality assurance and quality control are essential for robust next generation sequencing (NGS). Here we present CoverView, a fast, flexible, user-friendly quality evaluation tool for NGS data. CoverView processes mapped sequencing reads and user-specified regions to report depth of coverage, base and mapping quality metrics with increasing levels of detail from a chromosome-level summary to per-base profiles. CoverView can flag regions that do not fulfil user-specified quality requirements, allowing suboptimal data to be systematically and automatically presented for review. It also provides an interactive graphical user interface (GUI) that can be opened in a web browser and allows intuitive exploration of results. We have integrated CoverView into our accredited clinical cancer predisposition gene testing laboratory that uses the TruSight Cancer Panel (TSCP). CoverView has been invaluable for optimisation and quality control of our testing pipeline, providing transparent, consistent quality metric information and automatic flagging of regions that fall below quality thresholds. We demonstrate this utility with TSCP data from the Genome in a Bottle reference sample, which CoverView analysed in 13 seconds. CoverView uses data routinely generated by NGS pipelines, reads standard input formats, and rapidly creates easy-to-parse output text (.txt) files that are customised by a simple configuration file. CoverView can therefore be easily integrated into any NGS pipeline. CoverView and detailed documentation for its use are freely available at github.com/RahmanTeamDevelopment/CoverView/releases and www.icr.ac.uk/CoverView.
Affiliation(s)
- Márton Münz
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK
- Shazia Mahamdallie
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK; TGLclinical, The Institute of Cancer Research, London, SM2 5NG, UK
- Shawn Yost
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK; TGLclinical, The Institute of Cancer Research, London, SM2 5NG, UK
- Andrew Rimmer
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK
- Emma Poyastro-Pearson
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK; TGLclinical, The Institute of Cancer Research, London, SM2 5NG, UK
- Ann Strydom
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK; TGLclinical, The Institute of Cancer Research, London, SM2 5NG, UK
- Sheila Seal
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK; TGLclinical, The Institute of Cancer Research, London, SM2 5NG, UK
- Elise Ruark
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK; TGLclinical, The Institute of Cancer Research, London, SM2 5NG, UK
- Nazneen Rahman
  - Division of Genetics & Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK; TGLclinical, The Institute of Cancer Research, London, SM2 5NG, UK; Cancer Genetics Unit, Royal Marsden NHS Foundation Trust, London, SM2 5PT, UK
9
Verdu CF, Guichoux E, Quevauvillers S, De Thier O, Laizet Y, Delcamp A, Gévaudant F, Monty A, Porté AJ, Lejeune P, Lassois L, Mariette S. Dealing with paralogy in RADseq data: in silico detection and single nucleotide polymorphism validation in Robinia pseudoacacia L. Ecol Evol 2016; 6:7323-7333. [PMID: 28725400 PMCID: PMC5513258 DOI: 10.1002/ece3.2466] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 06/09/2016] [Revised: 08/18/2016] [Accepted: 08/19/2016] [Indexed: 12/20/2022]
Abstract
The RADseq technology allows researchers to efficiently develop thousands of polymorphic loci across multiple individuals with little or no prior information on the genome. However, many questions remain about the biases inherent to this technology. Notably, sequence misalignments arising from paralogy may affect the development of single nucleotide polymorphism (SNP) markers and the estimation of genetic diversity. We evaluated the impact of putative paralog loci on genetic diversity estimation during the development of SNPs from a RADseq dataset for the nonmodel tree species Robinia pseudoacacia L. We sequenced nine genotypes and analyzed the frequency of putative paralogous RAD loci as a function of both the depth of coverage and the mismatch threshold allowed between loci. Putative paralogy was detected in a very variable number of loci, from 1% to more than 20%, with the depth of coverage having a major influence on the result. Putative paralogy artificially increased the observed degree of polymorphism and resulting estimates of diversity. The choice of the depth of coverage also affected diversity estimation and SNP validation: A low threshold decreased the chances of detecting minor alleles while a high threshold increased allelic dropout. SNP validation was better for the low threshold (4×) than for the high threshold (18×) we tested. Using the strategy developed here, we were able to validate more than 80% of the SNPs tested by means of individual genotyping, resulting in a readily usable set of 330 SNPs, suitable for use in population genetics applications.
Affiliation(s)
- Cindy F Verdu
  - Forest Management Unit, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium
- Samuel Quevauvillers
  - Forest Management Unit, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium
- Olivier De Thier
  - Forest Management Unit, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium
- Arnaud Monty
  - Biodiversity and Landscape Unit, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium
- Philippe Lejeune
  - Forest Management Unit, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium
- Ludivine Lassois
  - Forest Management Unit, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium; Biodiversity and Landscape Unit, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium
|
10
|
Glusman G, Severson A, Dhankani V, Robinson M, Farrah T, Mauldin DE, Stittrich AB, Ament SA, Roach JC, Brunkow ME, Bodian DL, Vockley JG, Shmulevich I, Niederhuber JE, Hood L. Identification of copy number variants in whole-genome data using Reference Coverage Profiles. Front Genet 2015; 6:45. [PMID: 25741365 PMCID: PMC4330915 DOI: 10.3389/fgene.2015.00045] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 11/30/2014] [Accepted: 01/30/2015] [Indexed: 12/20/2022]
Abstract
The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150–1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1–100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation.
Affiliation(s)
- Dale L Bodian
  - Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA
- Joseph G Vockley
  - Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA
- John E Niederhuber
  - Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA
- Leroy Hood
  - Institute for Systems Biology, Seattle, WA, USA
|
11
|
Thung DT, Beulen L, Hehir-Kwa J, Faas BH. Implementation of whole genome massively parallel sequencing for noninvasive prenatal testing in laboratories. Expert Rev Mol Diagn 2014; 15:111-24. [PMID: 25347354 DOI: 10.1586/14737159.2015.973857] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 12/13/2022]
Abstract
Noninvasive prenatal testing (NIPT) for fetal aneuploidies using cell-free fetal DNA in maternal plasma has revolutionized the field of prenatal care and methods using massively parallel sequencing are now being implemented almost worldwide. Substantial progress has been made from initially testing for (an)euploidies of chromosomes 13, 18 and 21, to testing for sex chromosome (an)euploidies, additional autosomal aneuploidies as well as partial deletions and duplications genome-wide. Although NIPT is associated with significantly reduced risks for the fetus in comparison to existing invasive prenatal diagnostic methods, it presents several implementation challenges. Here, we review key issues potentially influencing NIPT and illustrate them using both data from literature and in-house data.
|
12
|
Kadalayil L, Rafiq S, Rose-Zerilli MJJ, Pengelly RJ, Parker H, Oscier D, Strefford JC, Tapper WJ, Gibson J, Ennis S, Collins A. Exome sequence read depth methods for identifying copy number changes. Brief Bioinform 2014; 16:380-92. [PMID: 25169955 DOI: 10.1093/bib/bbu027] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Received: 04/20/2014] [Accepted: 07/10/2014] [Indexed: 01/04/2023]
Abstract
Copy number variants (CNVs) play important roles in a number of human diseases and in pharmacogenetics. Powerful methods exist for CNV detection in whole genome sequencing (WGS) data, but such data are costly to obtain. Many disease causal CNVs span or are found in genome coding regions (exons), which makes CNV detection using whole exome sequencing (WES) data attractive. If reliably validated against WGS-based CNVs, exome-derived CNVs have potential applications in a clinical setting. Several algorithms have been developed to exploit exome data for CNV detection and comparisons made to find the most suitable methods for particular data samples. The results are not consistent across studies. Here, we review some of the exome CNV detection methods based on depth of coverage profiles and examine their performance to identify problems contributing to discrepancies in published results. We also present a streamlined strategy that uses a single metric, the likelihood ratio, to compare exome methods, and we demonstrated its utility using the VarScan 2 and eXome Hidden Markov Model (XHMM) programs using paired normal and tumour exome data from chronic lymphocytic leukaemia patients. We use array-based somatic CNV (SCNV) calls as a reference standard to compute prevalence-independent statistics, such as sensitivity, specificity and likelihood ratio, for validation of the exome-derived SCNVs. We also account for factors known to influence the performance of exome read depth methods, such as CNV size and frequency, while comparing our findings with published results.
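The prevalence-independent statistics named in this abstract (sensitivity, specificity and likelihood ratio) follow from a standard 2×2 comparison of calls against a reference standard. The sketch below is illustrative only: the function name `likelihood_ratios` and the example counts are hypothetical and are not taken from the paper, and it uses the textbook definitions rather than the authors' own pipeline.

```python
import math


def likelihood_ratios(tp, fp, tn, fn):
    """Return (sensitivity, specificity, LR+, LR-) from confusion-matrix counts.

    tp/fn: reference-standard CNVs that were / were not called.
    tn/fp: reference-negative regions that were / were not left uncalled.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ = sensitivity / (1 - specificity); infinite when there are no false positives.
    lr_pos = sensitivity / (1 - specificity) if fp > 0 else math.inf
    # LR- = (1 - sensitivity) / specificity; infinite when there are no true negatives.
    lr_neg = (1 - sensitivity) / specificity if tn > 0 else math.inf
    return sensitivity, specificity, lr_pos, lr_neg


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, spec, lr_pos, lr_neg = likelihood_ratios(tp=80, fp=100, tn=900, fn=20)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} LR+={lr_pos:.2f} LR-={lr_neg:.2f}")
```

Because both ratios are built from sensitivity and specificity alone, they do not change with how common true CNVs are in the sample set, which is what makes a single likelihood-ratio metric usable for comparing methods across datasets.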
|