1
SNP Development in Penaeus vannamei via Next-Generation Sequencing and DNA Pool Sequencing. FISHES 2021. [DOI: 10.3390/fishes6030036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Next-generation sequencing and pool sequencing have been widely used in SNP (single-nucleotide polymorphism) detection and population genetics research; however, there are few reports on SNPs related to the growth of Penaeus vannamei. The purpose of this study was to call SNPs from the transcriptomes of rapid-growing (RG) and slow-growing (SG) individuals and to use DNA pool sequencing to assess the reliability of those SNPs. Two parameters were applied to detect SNPs. The first was the p-value from Fisher’s exact test, which measured the significance of allele frequency differences between RG and SG. The second was the AFI (minor allele frequency imbalance), which was defined to highlight the fold change in MAF (minor allele frequency) between RG and SG. In total, 216,015 hypothetical SNPs were obtained from the transcriptome data. Finally, 104 high-quality SNPs and 96,819 low-quality SNPs were predicted. Then, 18 high-quality SNPs and 17 low-quality SNPs were selected to assess the reliability of the detection process. An accuracy of 72.22% (13/18) was achieved for high-quality SNPs, but only 52.94% (9/17) for low-quality SNPs. These SNPs enrich the data for population genetics studies of P. vannamei and may play a role in the development of SNP markers for future breeding studies.
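The two screening parameters described in this abstract can be illustrated with a minimal Python sketch. The allele counts are hypothetical and the function names are illustrative. The AFI formula used here (larger group MAF divided by smaller group MAF) is an assumption, since the abstract only states that AFI highlights the fold change in MAF between RG and SG.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def group_minor_allele_freqs(rg_ref, rg_alt, sg_ref, sg_alt):
    """Frequency in each group of the allele that is rarer in the combined sample."""
    alt_is_minor = (rg_alt + sg_alt) <= (rg_ref + sg_ref)
    rg_minor, sg_minor = (rg_alt, sg_alt) if alt_is_minor else (rg_ref, sg_ref)
    return rg_minor / (rg_ref + rg_alt), sg_minor / (sg_ref + sg_alt)


def afi(rg_maf, sg_maf):
    """Assumed AFI: fold change between the larger and the smaller group MAF."""
    larger, smaller = max(rg_maf, sg_maf), min(rg_maf, sg_maf)
    # An allele absent from one group gives an unbounded fold change.
    return float("inf") if smaller == 0 else larger / smaller


# Hypothetical read counts (reference, alternate) at one SNP in each group.
rg_ref, rg_alt = 90, 10
sg_ref, sg_alt = 60, 40

p_value = fisher_exact_two_sided(rg_ref, rg_alt, sg_ref, sg_alt)
rg_maf, sg_maf = group_minor_allele_freqs(rg_ref, rg_alt, sg_ref, sg_alt)
print(f"p = {p_value:.3g}, RG MAF = {rg_maf:.2f}, SG MAF = {sg_maf:.2f}, AFI = {afi(rg_maf, sg_maf):.2f}")
```

A SNP would be ranked higher when the p-value is small and the AFI is large; the abstract does not give the cut-offs that separated high-quality from low-quality SNPs, so none are assumed here.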
2
Identification of SNPs in crucial starch biosynthesis genes in rice. J Genet 2021. [DOI: 10.1007/s12041-020-01251-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
3
A new approach based on targeted pooled DNA sequencing identifies novel mutations in patients with Inherited Retinal Dystrophies. Sci Rep 2018; 8:15457. [PMID: 30337596 PMCID: PMC6194132 DOI: 10.1038/s41598-018-33810-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 02/19/2018] [Accepted: 10/04/2018] [Indexed: 01/28/2023]
Abstract
Inherited retinal diseases (IRD) are a heterogeneous group of diseases that mainly affect the retina; more than 250 genes have been linked to the disease and more than 20 different clinical phenotypes have been described. This heterogeneity at both the clinical and genetic levels complicates the identification of causative mutations. Therefore, a detailed genetic characterization is important for genetic counselling and decisions regarding treatment. In this study, we developed a method consisting of pooled targeted next-generation sequencing (NGS), which we applied to 316 eye disease-related genes, followed by high-resolution melting and copy number variation analysis. DNA from 115 unrelated test samples was pooled, and samples with known mutations were used as positive controls to assess the sensitivity of our approach. Causal mutations for IRDs were found in 36 patients, a detection rate of 31.3%. Overall, 49 likely causative mutations were identified in characterized patients, 14 of which (28.6%) were first described in this study. Our study shows that this new approach is a cost-effective tool for detection of causative mutations in patients with inherited retinopathies.
4
Wang X, Sui W, Wu W, Hou X, Ou M, Xiang Y, Dai Y. Whole-genome resequencing of 100 healthy individuals using DNA pooling. Exp Ther Med 2016; 12:3143-3150. [PMID: 27882129 PMCID: PMC5103757 DOI: 10.3892/etm.2016.3797] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/24/2015] [Accepted: 08/11/2016] [Indexed: 12/27/2022]
Abstract
With the advent of next-generation sequencing technology, the cost of sequencing has significantly decreased. However, sequencing costs remain high for large-scale studies. In the present study, DNA pooling was applied as a cost-effective strategy for sequencing. The sequencing results for 100 healthy individuals, obtained via whole-genome resequencing using DNA pooling, are presented. To minimise the likelihood of systematic bias in sampling, paired-end libraries with an insert size of 500 bp were prepared for all samples and then subjected to whole-genome sequencing using four lanes for each library, resulting in at least 30-fold haploid coverage for each sample. The NCBI human genome build 37 (hg19) was used as the reference genome, and the short reads were aligned to it, achieving 99.84% coverage. The average sequencing depth was 32.76. In total, ~3 million single-nucleotide polymorphisms were identified, of which 99.88% were in the NCBI dbSNP database. In addition, ~600,000 small insertions/deletions, 500,000 structural variants, 5,000 copy number variations and 13,000 single nucleotide variants were identified. In the present study, the whole genome was sequenced for a small sample of subjects from southern China for the first time. New variation sites were identified by comparison with the reference sequence, adding new knowledge of human genome variation to the human genomic databases. Furthermore, the particular distribution regions of variation were illustrated by analyzing various sites of variation, such as single-nucleotide polymorphisms.
Affiliation(s)
- Xiaobin Wang
  - Health Management Centre, The Affiliated Guilin Hospital, Southern Medical University, Guilin, Guangxi 541000, P.R. China; Guangxi Key Laboratory of Metabolic Diseases Research, Guilin, Guangxi 541000, P.R. China
- Weiguo Sui
  - Guangxi Key Laboratory of Metabolic Diseases Research, Guilin, Guangxi 541000, P.R. China; Department of Nephrology, Guilin 181st Hospital, Guilin, Guangxi 541000, P.R. China
- Weiqing Wu
  - Health Management Centre, The Second Clinical Medical College, Jinan University, Shenzhen, Guangdong 518001, P.R. China
- Xianliang Hou
  - Guangxi Key Laboratory of Metabolic Diseases Research, Guilin, Guangxi 541000, P.R. China; Department of Nephrology, Guilin 181st Hospital, Guilin, Guangxi 541000, P.R. China; College of Life Science, Guangxi Normal University, Guilin, Guangxi 541001, P.R. China
- Minglin Ou
  - Guangxi Key Laboratory of Metabolic Diseases Research, Guilin, Guangxi 541000, P.R. China; Department of Nephrology, Guilin 181st Hospital, Guilin, Guangxi 541000, P.R. China
- Yueying Xiang
  - Health Management Centre, The Affiliated Guilin Hospital, Southern Medical University, Guilin, Guangxi 541000, P.R. China
- Yong Dai
  - Guangxi Key Laboratory of Metabolic Diseases Research, Guilin, Guangxi 541000, P.R. China; Department of Nephrology, Guilin 181st Hospital, Guilin, Guangxi 541000, P.R. China; Clinical Medical Research Center, The Second Clinical Medical College, Jinan University, Shenzhen, Guangdong 518001, P.R. China
5
Lan F, Haliburton JR, Yuan A, Abate AR. Droplet barcoding for massively parallel single-molecule deep sequencing. Nat Commun 2016; 7:11784. [PMID: 27353563 PMCID: PMC4931254 DOI: 10.1038/ncomms11784] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Received: 10/29/2015] [Accepted: 04/28/2016] [Indexed: 02/08/2023]
Abstract
The ability to accurately sequence long DNA molecules is important across biology, but existing sequencers are limited in read length and accuracy. Here, we demonstrate a method to leverage short-read sequencing to obtain long and accurate reads. Using droplet microfluidics, we isolate, amplify, fragment and barcode single DNA molecules in aqueous picolitre droplets, allowing the full-length molecules to be sequenced with multi-fold coverage using short-read sequencing. We show that this approach can provide accurate sequences of up to 10 kb, allowing us to identify rare mutations below the detection limit of conventional sequencing and directly link them into haplotypes. This barcoding methodology can be a powerful tool in sequencing heterogeneous populations such as viruses. The ability to accurately sequence long DNA molecules is important across biology. Here, Lan et al. report a droplet-based method that barcodes single DNA molecules, allowing the full-length molecules to be sequenced with multi-fold coverage using short-read next-generation sequencing.
Affiliation(s)
- Freeman Lan
  - Department of Bioengineering and Therapeutic Sciences, California Institute for Quantitative Biosciences (QB3), University of California, San Francisco, California 94158, USA; UC Berkeley - UCSF Bioengineering Graduate program, University of California, San Francisco, California 94158, USA
- John R Haliburton
  - Department of Bioengineering and Therapeutic Sciences, California Institute for Quantitative Biosciences (QB3), University of California, San Francisco, California 94158, USA; Integrative Program in Quantitative Biology (iPQB) Biophysics Graduate program, University of California, San Francisco, California 94158, USA
- Aaron Yuan
  - Department of Bioengineering and Therapeutic Sciences, California Institute for Quantitative Biosciences (QB3), University of California, San Francisco, California 94158, USA; Department of Electrical Engineering and Computer Sciences (EECS), Computer Science Division (CS), University of California, Berkeley, California 94720, USA
- Adam R Abate
  - Department of Bioengineering and Therapeutic Sciences, California Institute for Quantitative Biosciences (QB3), University of California, San Francisco, California 94158, USA; UC Berkeley - UCSF Bioengineering Graduate program, University of California, San Francisco, California 94158, USA; Integrative Program in Quantitative Biology (iPQB) Biophysics Graduate program, University of California, San Francisco, California 94158, USA
6
Hong SN, Park C, Park SJ, Lee CK, Ye BD, Kim YS, Lee S, Chae J, Kim JI, Kim YH. Deep resequencing of 131 Crohn's disease associated genes in pooled DNA confirmed three reported variants and identified eight novel variants. Gut 2016; 65:788-96. [PMID: 25731871 DOI: 10.1136/gutjnl-2014-308617] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Received: 10/14/2014] [Accepted: 01/27/2015] [Indexed: 12/12/2022]
Abstract
OBJECTIVE: Genome wide association studies (GWAS) and meta-analyses for Crohn's disease (CD) have not fully explained the heritability of CD, suggesting that additional loci are yet to be found and that the known loci may contain high effect rare risk variants that have thus far gone undetected by GWAS. While the cost of deep sequencing remains too high to analyse many samples, targeted sequencing of pooled DNA samples allows the efficient and cost effective capture of all variations in a target region. DESIGN: We performed pooled sequencing in 500 Korean CD cases and 1000 controls to evaluate the coding exon and 5' and 3' untranslated regions of 131 CD associated genes. The identified genetic variants were validated using genotyping in an independent set of 500 CD cases and 1000 controls. RESULTS: Pooled sequencing identified 30 common/low single nucleotide variants (SNVs) in 12 genes and 3 rare SNVs in 3 genes. Our results confirmed a significant association of CD with the following previously reported risk loci: rs3810936 in TNFSF15 (OR=1.83, p<2.2×10⁻¹⁶), rs76418789 in IL23R (OR=0.47, p=1.14×10⁻⁸) and rs2241880 in ATG16L1 (OR=1.30, p=5.28×10⁻⁶). In addition, novel loci were identified in TNFSF8 (rs3181374, OR=1.53, p=1.03×10⁻¹⁴), BTNL2 (rs28362680, OR=1.47, p=9.67×10⁻¹¹), HLA-DQA2 (rs3208181, OR=1.36, p=4.66×10⁻⁶), STAT3 (rs1053004, OR=1.29, p=2.07×10⁻⁵), NFKBIA (rs2273650, OR=0.80, p=3.93×10⁻⁴), NKX2-3 (rs888208, OR=0.82, p=6.37×10⁻⁴) and DNAH12 (rs4462937, OR=1.13, p=3.17×10⁻²). A novel rare SNV, rs200735402 in CARD9, was shown to have a protective effect (OR=0.09, p=5.28×10⁻⁵). CONCLUSIONS: Our deep resequencing of 131 CD associated genes confirmed 3 reported risk loci and identified 8 novel risk loci for CD in Koreans, providing new insights into the genetic architecture of CD.
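As a reminder of how odds ratios such as those reported in this abstract relate to allele counts, the following minimal Python sketch computes an allelic odds ratio with a Woolf (log-odds) 95% confidence interval. The counts are hypothetical and are not taken from the study; the function name and the choice of interval method are assumptions made for illustration only.

```python
from math import exp, log, sqrt


def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Return (OR, lower, upper) for a 2x2 allele-count table.

    Assumes all four counts are positive; the interval uses Woolf's
    standard error of the log odds ratio.
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    se_log_or = sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 500 cases (1,000 alleles) and 1,000 controls (2,000 alleles).
or_, lower, upper = allelic_odds_ratio(case_alt=300, case_ref=700, ctrl_alt=400, ctrl_ref=1600)
print(f"OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An odds ratio above 1 marks an allele that is more common in cases, while a value below 1, as reported for the CARD9 variant, marks an allele that is more common in controls.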
Affiliation(s)
- Sung Noh Hong
  - Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Changho Park
  - Department of Biomedical Sciences, Seoul National University Graduate School, Seoul, Korea; Department of Biochemistry and Molecular Biology, Seoul National University College of Medicine, Seoul, Korea
- Soo Jung Park
  - Department of Internal Medicine and Institute of Gastroenterology, Yonsei University College of Medicine, Seoul, Korea
- Chang Kyun Lee
  - Department of Internal Medicine, Kyung Hee University School of Medicine, Seoul, Korea
- Byong Duk Ye
  - Department of Gastroenterology and Inflammatory Bowel Disease Center, Asan Medical Centre, University of Ulsan College of Medicine
- You Sun Kim
  - Department of Internal Medicine, Seoul Paik Hospital, Inje University College of Medicine, Seoul, Korea
- Seungbok Lee
  - Department of Biomedical Sciences, Seoul National University Graduate School, Seoul, Korea; Department of Biochemistry and Molecular Biology, Seoul National University College of Medicine, Seoul, Korea; Medical Research Center, Genomic Medicine Institute (GMI), Seoul National University, Seoul, Korea
- Jeesoo Chae
  - Department of Biomedical Sciences, Seoul National University Graduate School, Seoul, Korea; Department of Biochemistry and Molecular Biology, Seoul National University College of Medicine, Seoul, Korea
- Jong-Il Kim
  - Department of Biomedical Sciences, Seoul National University Graduate School, Seoul, Korea; Department of Biochemistry and Molecular Biology, Seoul National University College of Medicine, Seoul, Korea; Medical Research Center, Genomic Medicine Institute (GMI), Seoul National University, Seoul, Korea; Cancer Research Institute, Seoul National University College of Medicine, Seoul, Korea
- Young-Ho Kim
  - Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
7
High-resolution melting (HRM) re-analysis of a polyposis patients cohort reveals previously undetected heterozygous and mosaic APC gene mutations. Fam Cancer 2016; 14:247-57. [PMID: 25604157 PMCID: PMC4430602 DOI: 10.1007/s10689-015-9780-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/11/2022]
Abstract
Familial adenomatous polyposis is most frequently caused by pathogenic variants in either the APC gene or the MUTYH gene. The detection rate of pathogenic variants depends on the severity of the phenotype and the sensitivity of the screening method, including its sensitivity for mosaic variants. For 171 patients with multiple colorectal polyps and no previously detected pathogenic variant, APC was reanalyzed in leukocyte DNA by one uniform technique: high-resolution melting (HRM) analysis. Serial dilution of heterozygous DNA showed a lowest detectable allelic fraction of 6% for the majority of variants. HRM analysis and subsequent sequencing detected pathogenic fully heterozygous APC variants in 10 (6%) of the patients and pathogenic mosaic variants in 2 (1%). All of these variants had previously been missed by various conventional scanning methods. In parallel, HRM APC scanning was applied to DNA isolated from polyp tissue of two additional patients with apparently sporadic polyposis and without a detectable pathogenic APC variant in leukocyte DNA. In both patients, a pathogenic mosaic APC variant was present in multiple polyps. The detection of pathogenic APC variants in 7% of the patients, including mosaics, illustrates the usefulness of a complete APC gene reanalysis of previously tested patients by a supplementary scanning method. HRM is a sensitive and fast pre-screening method for reliable detection of heterozygous and mosaic variants, and it can be applied to leukocyte- and polyp-derived DNA.
8
Postula M, Janicki PK, Eyileten C, Rosiak M, Kaplon-Cieslicka A, Sugino S, Wilimski R, Kosior DA, Opolski G, Filipiak KJ, Mirowska-Guzel D. Next-generation re-sequencing of genes involved in increased platelet reactivity in diabetic patients on acetylsalicylic acid. Platelets 2015; 27:357-64. [PMID: 26599574 DOI: 10.3109/09537104.2015.1109071] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/10/2023]
Abstract
The objective of this study was to investigate whether rare missense genetic variants in several genes related to platelet function and acetylsalicylic acid (ASA) response are associated with platelet reactivity in patients with type 2 diabetes (T2D) on ASA therapy. Fifty-eight exons and corresponding introns of eight selected genes (PTGS1, PTGS2, TBXAS1, PTGIS, ADRA2A, ADRA2B, TBXA2R, and P2RY1) were re-sequenced in 230 DNA samples from T2D patients using pooled PCR amplification and next-generation sequencing on an Illumina HiSeq2000. The observed non-synonymous variants were confirmed by individual genotyping of 384 DNA samples, comprising the individuals from the original discovery pools and an additional verification cohort of 154 ASA-treated T2D patients. The association between the investigated phenotypes (ASA-induced changes in platelet reactivity measured by PFA-100, VerifyNow and serum thromboxane B2 level [sTxB2]) and the accumulation of rare missense variants (genetic burden) in the investigated genes was tested using statistical collapsing tests. We identified a total of 35 exonic variants, including 3 common missense variants, 15 rare missense variants, and 17 synonymous variants, in the 8 investigated genes. The rare missense variants showed a statistically significant difference in accumulation pattern between patients with increased and normal platelet reactivity based on the PFA-100 assay. Our study suggests that the genetic burden of rare functional variants in these eight genes may contribute to differences in platelet reactivity measured with the PFA-100 assay in T2D patients treated with ASA.
Affiliation(s)
- Marek Postula
  - Department of Experimental and Clinical Pharmacology, Medical University of Warsaw, Center for Preclinical Research and Technology CEPT, Warsaw, Poland; Perioperative Genomics Laboratory, Penn State College of Medicine, Hershey, PA, USA
- Piotr K Janicki
  - Perioperative Genomics Laboratory, Penn State College of Medicine, Hershey, PA, USA
- Ceren Eyileten
  - Department of Experimental and Clinical Pharmacology, Medical University of Warsaw, Center for Preclinical Research and Technology CEPT, Warsaw, Poland
- Marek Rosiak
  - Department of Experimental and Clinical Pharmacology, Medical University of Warsaw, Center for Preclinical Research and Technology CEPT, Warsaw, Poland; Department of Cardiology and Hypertension, Central Clinical Hospital, The Ministry of the Interior, Warsaw, Poland
- Shigekazu Sugino
  - Perioperative Genomics Laboratory, Penn State College of Medicine, Hershey, PA, USA
- Radosław Wilimski
  - Department of Cardiac Surgery, Medical University of Warsaw, Warsaw, Poland
- Dariusz A Kosior
  - Department of Cardiology and Hypertension, Central Clinical Hospital, The Ministry of the Interior, Warsaw, Poland; Department of Applied Physiology, Mossakowski Medical Research Centre, Polish Academy of Sciences, Warsaw, Poland
- Grzegorz Opolski
  - Department of Cardiology, Medical University of Warsaw, Warsaw, Poland
- Dagmara Mirowska-Guzel
  - Department of Experimental and Clinical Pharmacology, Medical University of Warsaw, Center for Preclinical Research and Technology CEPT, Warsaw, Poland
9
Dingemanse C, Belzer C, van Hijum SAFT, Günthel M, Salvatori D, den Dunnen JT, Kuijper EJ, Devilee P, de Vos WM, van Ommen GB, Robanus-Maandag EC. Akkermansia muciniphila and Helicobacter typhlonius modulate intestinal tumor development in mice. Carcinogenesis 2015; 36:1388-96. [PMID: 26320104 DOI: 10.1093/carcin/bgv120] [Citation(s) in RCA: 77] [Impact Index Per Article: 8.6] [Received: 03/21/2015] [Accepted: 08/13/2015] [Indexed: 12/17/2022]
Abstract
Gastrointestinal tumor growth is thought to be promoted by gastrointestinal bacteria and their inflammatory products. We observed that intestine-specific conditional Apc mutant mice (FabplCre;Apc^15lox/+) developed many more colorectal tumors under conventional than under pathogen-low housing conditions. Shotgun metagenomic sequencing plus quantitative PCR analysis of fecal DNA revealed the presence of two bacterial species in conventional mice that were absent from pathogen-low mice. One, Helicobacter typhlonius, has not been associated with cancer in humans or in immune-competent mice. The other, the mucin-degrading Akkermansia muciniphila, is abundant in healthy humans but reduced in patients with inflammatory gastrointestinal diseases and in obese and type 2 diabetic mice. Eradication of H. typhlonius in young conventional mice by antibiotics decreased the number of intestinal tumors. Additional presence of A. muciniphila prior to the antibiotic treatment reduced the tumor number even further. Colonization of pathogen-low FabplCre;Apc^15lox/+ mice with H. typhlonius or A. muciniphila increased the number of intestinal tumors and the thickness of the intestinal mucus layer, and A. muciniphila colonization without H. typhlonius increased the density of mucin-producing goblet cells. However, dual colonization with H. typhlonius and A. muciniphila significantly reduced the number of intestinal tumors, the mucus layer thickness and goblet cell density to that of control mice. By global microbiota composition analysis, we found a positive association of A. muciniphila and of H. typhlonius, and a negative association of unclassified Clostridiales, with increased tumor burden. We conclude that A. muciniphila and H. typhlonius can modulate gut microbiota composition and intestinal tumor development in mice.
Affiliation(s)
- Clara Belzer
  - Laboratory of Microbiology, Wageningen University 6703 HB, Wageningen, The Netherlands
- Sacha A F T van Hijum
  - Centre for Molecular and Biomolecular Informatics Bacterial Genomics, Radboud University Medical Centre 6525 GA, Nijmegen, The Netherlands; NIZO Food Research BV 6718 ZB, Ede, The Netherlands
- Ed J Kuijper
  - Department of Medical Microbiology, Leiden University Medical Center 2300 RC, Leiden, The Netherlands
- Willem M de Vos
  - Laboratory of Microbiology, Wageningen University 6703 HB, Wageningen, The Netherlands; Department of Veterinary Biosciences, University of Helsinki 00014, Helsinki, Finland
10
Gheyas AA, Boschiero C, Eory L, Ralph H, Kuo R, Woolliams JA, Burt DW. Functional classification of 15 million SNPs detected from diverse chicken populations. DNA Res 2015; 22:205-17. [PMID: 25926514 PMCID: PMC4463845 DOI: 10.1093/dnares/dsv005] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 01/30/2015] [Accepted: 03/20/2015] [Indexed: 12/11/2022]
Abstract
Next-generation sequencing has prompted a surge in the discovery of millions of genetic variants from vertebrate genomes. Besides their applications in genetic association and linkage studies, a fraction of these variants will have functional consequences. This study describes the detection and characterization of 15 million SNPs from the chicken genome, with the goal of predicting variants with potential functional implications (pfVars) in both coding and non-coding regions. The study reports 183K amino acid-altering SNPs, of which 48% were predicted to be evolutionarily intolerant; 13K splicing variants; 51K variants likely to alter RNA secondary structures; 500K variants within most conserved elements; and 3K variants from non-coding RNAs. Regions of local fixation within commercial broiler and layer lines were investigated as potential selective sweeps using genome-wide SNP data. Relationships of the pfVars with phenotypes, if any, were explored by overlaying the sweep regions with known QTLs. Based on this, the candidate genes and/or causal mutations for a number of important traits are discussed. Although the fixed variants within sweep regions were enriched with non-coding SNPs, some non-synonymous-intolerant mutations reached fixation, suggesting a possible adaptive advantage. The results presented in this study are expected to have important implications for future genomic research to identify candidate causal mutations and for poultry breeding.
Affiliation(s)
- Almas A Gheyas
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian EH25 9RG, UK
- Clarissa Boschiero
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian EH25 9RG, UK
- Lel Eory
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian EH25 9RG, UK
- Hannah Ralph
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian EH25 9RG, UK
- Richard Kuo
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian EH25 9RG, UK
- John A Woolliams
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian EH25 9RG, UK
- David W Burt
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian EH25 9RG, UK
11
Shapter FM, Cross M, Ablett G, Malory S, Chivers IH, King GJ, Henry RJ. High-throughput sequencing and mutagenesis to accelerate the domestication of Microlaena stipoides as a new food crop. PLoS One 2013; 8:e82641. [PMID: 24367532 PMCID: PMC3867367 DOI: 10.1371/journal.pone.0082641] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 06/24/2013] [Accepted: 10/26/2013] [Indexed: 12/21/2022]
Abstract
Global food demand, climatic variability and reduced land availability are driving the need for domestication of new crop species. The accelerated domestication of a rice-like Australian dryland polyploid grass, Microlaena stipoides (Poaceae), was targeted using chemical mutagenesis in conjunction with high throughput sequencing of genes for key domestication traits. While M. stipoides has previously been identified as having potential as a new grain crop for human consumption, only a limited understanding of its genetic diversity and breeding system was available to aid the domestication process. Next generation sequencing of deeply-pooled target amplicons estimated allelic diversity of a selected base population at 14.3 SNP/Mb and identified novel, putatively mutation-induced polymorphisms at about 2.4 mutations/Mb. A 97% lethal dose (LD₉₇) of ethyl methanesulfonate treatment was applied without inducing sterility in this polyploid species. Forward and reverse genetic screens identified beneficial alleles for the domestication trait, seed-shattering. Unique phenotypes observed in the M2 population suggest the potential for rapid accumulation of beneficial traits without recourse to a traditional cross-breeding strategy. This approach may be applicable to other wild species, unlocking their potential as new food, fibre and fuel crops.
Affiliation(s)
- Frances M. Shapter
  - Southern Cross Plant Science, Southern Cross University, Lismore, New South Wales, Australia
- Michael Cross
  - Southern Cross Plant Science, Southern Cross University, Lismore, New South Wales, Australia
- Gary Ablett
  - Southern Cross Plant Science, Southern Cross University, Lismore, New South Wales, Australia
- Sylvia Malory
  - Southern Cross Plant Science, Southern Cross University, Lismore, New South Wales, Australia
- Ian H. Chivers
  - Southern Cross Plant Science, Southern Cross University, Lismore, New South Wales, Australia; Native Seeds Pty Ltd, Sandringham, Victoria, Australia
- Graham J. King
  - Southern Cross Plant Science, Southern Cross University, Lismore, New South Wales, Australia
- Robert J. Henry
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, Queensland, Australia
12
Zhao Z, Wang W, Wei Z. An empirical Bayes testing procedure for detecting variants in analysis of next generation sequencing data. Ann Appl Stat 2013. [DOI: 10.1214/13-aoas660] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/19/2022]
13
Rellstab C, Zoller S, Tedder A, Gugerli F, Fischer MC. Validation of SNP allele frequencies determined by pooled next-generation sequencing in natural populations of a non-model plant species. PLoS One 2013; 8:e80422. [PMID: 24244686 PMCID: PMC3820589 DOI: 10.1371/journal.pone.0080422] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.5] [Received: 05/16/2013] [Accepted: 10/02/2013] [Indexed: 11/28/2022]
Abstract
Sequencing of pooled samples (Pool-Seq) using next-generation sequencing technologies has become increasingly popular, because it represents a rapid and cost-effective method to determine allele frequencies for single nucleotide polymorphisms (SNPs) in population pools. Validation of allele frequencies determined by Pool-Seq has been attempted using an individual genotyping approach, but these studies tend to use samples from existing model organism databases or DNA stores, and do not validate a realistic setup for sampling natural populations. Here we used pyrosequencing to validate allele frequencies determined by Pool-Seq in three natural populations of Arabidopsis halleri (Brassicaceae). The allele frequency estimates of the pooled population samples (consisting of 20 individual plant DNA samples) were determined after mapping Illumina reads to (i) the publicly available, high-quality reference genome of a closely related species (Arabidopsis thaliana) and (ii) our own de novo draft genome assembly of A. halleri. We then pyrosequenced nine selected SNPs using the same individuals from each population, resulting in a total of 540 samples. Our results show a highly significant and accurate relationship between pooled and individually determined allele frequencies, irrespective of the reference genome used. Allele frequencies differed on average by less than 4%. There was no tendency that either the Pool-Seq or the individual-based approach resulted in higher or lower estimates of allele frequencies. Moreover, the rather high coverage in the mapping to the two reference genomes, ranging from 55 to 284x, had no significant effect on the accuracy of the Pool-Seq. A resampling analysis showed that only very low coverage values (below 10-20x) would substantially reduce the precision of the method. We therefore conclude that a pooled re-sequencing approach is well suited for analyses of genetic variation in natural populations.
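The effect of coverage on the precision of a pooled allele frequency estimate can be illustrated with a small simulation. This is a simplified binomial sketch, not the authors' resampling of real reads: it assumes every read samples an allele independently from the pool and ignores the additional sampling of the 20 individuals per pool. The true frequency, the coverage values, and the function name are hypothetical.

```python
import random


def mean_abs_error(true_freq, coverage, replicates=2000, seed=1):
    """Mean absolute error of a read-count allele frequency estimate.

    Each replicate draws `coverage` reads; every read carries the allele
    with probability `true_freq` (simple binomial sampling).
    """
    rng = random.Random(seed)
    total_error = 0.0
    for _ in range(replicates):
        alt_reads = sum(rng.random() < true_freq for _ in range(coverage))
        total_error += abs(alt_reads / coverage - true_freq)
    return total_error / replicates


# Hypothetical pool allele frequency of 0.30 at several coverage depths.
for coverage in (5, 10, 20, 55, 284):
    print(f"{coverage:>4}x  mean |error| = {mean_abs_error(0.30, coverage):.3f}")
```

Under these assumptions the error shrinks roughly with the square root of coverage, which is consistent with the abstract's observation that only very low coverage substantially reduces precision.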
Affiliation(s)
- Christian Rellstab
- Biodiversity and Conservation Biology, Swiss Federal Research Institute WSL, Birmensdorf, Switzerland
- Stefan Zoller
- Genetic Diversity Centre, ETH Zürich, Zürich, Switzerland
- Andrew Tedder
- Institute of Evolutionary Biology and Environmental Studies and Institute of Plant Biology, University of Zürich, Zürich, Switzerland
- Felix Gugerli
- Biodiversity and Conservation Biology, Swiss Federal Research Institute WSL, Birmensdorf, Switzerland
|
14
|
Paris M, Marcombe S, Coissac E, Corbel V, David JP, Després L. Investigating the genetics of Bti resistance using mRNA tag sequencing: application on laboratory strains and natural populations of the dengue vector Aedes aegypti. Evol Appl 2013; 6:1012-27. [PMID: 24187584 PMCID: PMC3804235 DOI: 10.1111/eva.12082] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 10/12/2012] [Accepted: 04/25/2013] [Indexed: 11/29/2022] Open
Abstract
Mosquito control is often the main method used to reduce mosquito-transmitted diseases. In order to investigate the genetic basis of resistance to the bio-insecticide Bacillus thuringiensis subsp. israelensis (Bti), we used information on polymorphism obtained from cDNA tag sequences from pooled larvae of laboratory Bti-resistant and susceptible Aedes aegypti mosquito strains to identify and analyse 1520 single nucleotide polymorphisms (SNPs). Of the 372 SNPs tested, 99.2% were validated using DNA Illumina GoldenGate® array, with a strong correlation between the allelic frequencies inferred from the pooled and individual data (r = 0.85). A total of 11 genomic regions and five candidate genes were detected using a genome scan approach. One of these candidate genes showed significant departures from neutrality in the resistant strain at sequence level. Six natural populations from Martinique Island were sequenced for the 372 tested SNPs with a high transferability (87%), and association mapping analyses detected 14 loci associated with Bti resistance, including one located in a putative receptor for Cry11 toxins. Three of these loci were also significantly differentiated between the laboratory strains, suggesting that most of the genes associated with resistance might differ between the two environments. It also suggests that common selected regions might harbour key genes for Bti resistance.
Affiliation(s)
- Margot Paris
- Laboratoire d'Ecologie Alpine (LECA), UMR 5553 CNRS-Université de Grenoble Grenoble, France ; Plant Ecological Genetics, Institute of Integrative Biology, ETH Zurich, Switzerland
|
15
|
Next generation analysis of breast cancer genomes for precision medicine. Cancer Lett 2013; 339:1-7. [PMID: 23879964 DOI: 10.1016/j.canlet.2013.07.018] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 05/08/2013] [Revised: 07/10/2013] [Accepted: 07/14/2013] [Indexed: 12/15/2022]
Abstract
For many years breast cancer classification has been based on histology and immune-histochemistry. New techniques, more strictly related to cancer biology, partially succeeded in fractionating patients, correlated to survival and better predicted the patient response to therapy. Nowadays, great expectations arise from massive parallel or high throughput next generation sequencing. Cancer genomics has already revolutionized our knowledge of breast cancer molecular pathology, paving the way to the development of new and more effective clinical protocols. This review is focused on the most recent advances in the field of cancer genomics and epigenomics, including DNA alterations and driver gene mutations, gene fusions, DNA methylation and miRNA expression.
|
16
|
Tan MK, Koval J, Ghalayini A. Novel genetic variants of GA-insensitive Rht-1 genes in hexaploid wheat and their potential agronomic value. PLoS One 2013; 8:e69690. [PMID: 23894524 PMCID: PMC3716649 DOI: 10.1371/journal.pone.0069690] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 12/06/2012] [Accepted: 06/13/2013] [Indexed: 01/01/2023] Open
Abstract
This study has found numerous novel genetic variants of GA-insensitive dwarfing genes with potential agricultural value for crop improvement. The cultivar, Spica is a tall genotype and possesses the wild-type genes of Rht-A1a, Rht-B1a and Rht-D1a. The cultivar Quarrion possesses a null mutant in the DELLA motif in each of the 3 genomes. This is a first report of a null mutant of Rht-A1. In addition, novel null mutants which differ from reported null alleles of Rht-B1b, Rht-B1e and Rht-D1b have been found in Quarrion, Carnamah and Whistler. The accession, Aus1408 has an allele of Rht-B1 with a mutation in the conserved ‘TVHYNP’ N-terminal signal binding domain with possible implications on its sensitivity to GA. Mutations in the conserved C-terminal GRAS domain of Rht-A1 alleles with possible effects on expression have been found in WW1842, Quarrion and Drysdale. Genetic variants with putative spliceosomal introns in the GRAS domain have been found in all accessions except Spica. Genome-specific cis-sequences about 124 bp upstream of the start codon of the Rht-1 gene have been identified for each of the three genomes.
Affiliation(s)
- Mui-Keng Tan
- Elizabeth Macarthur Agricultural Institute, New South Wales (NSW) Department of Primary Industries, Menangle, New South Wales, Australia.
|
17
|
Zavodna M, Grueber CE, Gemmell NJ. Parallel tagged next-generation sequencing on pooled samples - a new approach for population genetics in ecology and conservation. PLoS One 2013; 8:e61471. [PMID: 23637841 PMCID: PMC3630221 DOI: 10.1371/journal.pone.0061471] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 10/31/2012] [Accepted: 03/08/2013] [Indexed: 12/02/2022] Open
Abstract
Next-generation sequencing (NGS) on pooled samples has already been broadly applied in human medical diagnostics and plant and animal breeding. However, thus far it has been only sparingly employed in ecology and conservation, where it may serve as a useful diagnostic tool for rapid assessment of species genetic diversity and structure at the population level. Here we undertake a comprehensive evaluation of the accuracy, practicality and limitations of parallel tagged amplicon NGS on pooled population samples for estimating species population diversity and structure. We obtained 16S and Cyt b data from 20 populations of Leiopelma hochstetteri, a frog species of conservation concern in New Zealand, using two approaches - parallel tagged NGS on pooled population samples and individual Sanger sequenced samples. Data from each approach were then used to estimate two standard population genetic parameters, nucleotide diversity (π) and population differentiation (FST), that enable population genetic inference in a species conservation context. We found a positive correlation between our two approaches for population genetic estimates, showing that the pooled population NGS approach is a reliable, rapid and appropriate method for population genetic inference in an ecological and conservation context. Our experimental design also allowed us to identify both the strengths and weaknesses of the pooled population NGS approach and outline some guidelines and suggestions that might be considered when planning future projects.
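As a rough illustration of the first parameter named in this abstract, nucleotide diversity (π) can be estimated from per-site allele counts. The sketch below is not the authors' pipeline: the function name, the input layout (one dictionary of allele counts per aligned site), and the n/(n-1) small-sample correction applied to those counts are all assumptions made for illustration.

```python
from typing import Dict, List


def nucleotide_diversity(site_counts: List[Dict[str, int]]) -> float:
    """Mean per-site heterozygosity over usable sites (hypothetical helper).

    Each element maps an allele to the number of reads (or sampled
    sequences) carrying that allele at one aligned position.
    """
    total = 0.0
    usable_sites = 0
    for counts in site_counts:
        n = sum(counts.values())
        if n < 2:
            continue  # heterozygosity is undefined with fewer than two observations
        homozygosity = sum((c / n) ** 2 for c in counts.values())
        # n / (n - 1) is the usual small-sample correction (assumed here)
        total += (n / (n - 1)) * (1.0 - homozygosity)
        usable_sites += 1
    if usable_sites == 0:
        raise ValueError("no site has at least two observations")
    return total / usable_sites


# One polymorphic site and one monomorphic site
example_sites = [{"A": 5, "C": 5}, {"G": 10}]
example_pi = nucleotide_diversity(example_sites)
print(example_pi)
```

With pooled reads the counts reflect read depth rather than the number of individuals sampled, so a real analysis would normally use a pool-aware estimator; the sketch only shows the basic quantity being compared between the pooled and Sanger approaches.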
Affiliation(s)
- Monika Zavodna
- Centre for Reproduction and Genomics, Department of Anatomy, University of Otago, Dunedin, New Zealand
- Catherine E. Grueber
- Centre for Reproduction and Genomics, Department of Anatomy, University of Otago, Dunedin, New Zealand
- Department of Zoology, University of Otago, Dunedin, New Zealand
- Allan Wilson Centre for Molecular Ecology and Evolution, University of Otago, Dunedin, New Zealand
- Neil J. Gemmell
- Centre for Reproduction and Genomics, Department of Anatomy, University of Otago, Dunedin, New Zealand
- Allan Wilson Centre for Molecular Ecology and Evolution, University of Otago, Dunedin, New Zealand
|
18
|
Wallner B, Vogl C, Shukla P, Burgstaller JP, Druml T, Brem G. Identification of genetic variation on the horse Y chromosome and the tracing of male founder lineages in modern breeds. PLoS One 2013; 8:e60015. [PMID: 23573227 PMCID: PMC3616054 DOI: 10.1371/journal.pone.0060015] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.5] [Received: 10/22/2012] [Accepted: 02/20/2013] [Indexed: 11/19/2022] Open
Abstract
The paternally inherited Y chromosome displays the population genetic history of males. While modern domestic horses (Equus caballus) exhibit abundant diversity within maternally inherited mitochondrial DNA, no significant Y-chromosomal sequence diversity has been detected. We used high throughput sequencing technology to identify the first polymorphic Y-chromosomal markers useful for tracing paternal lines. The nucleotide variability of the modern horse Y chromosome is extremely low, resulting in six haplotypes (HT), all clearly distinct from the Przewalski horse (E. przewalskii). The most widespread HT1 is ancestral and the other five haplotypes apparently arose on the background of HT1 by mutation or gene conversion after domestication. Two haplotypes (HT2 and HT3) are widely distributed at high frequencies among modern European horse breeds. Using pedigree information, we trace the distribution of Y-haplotype diversity to particular founders. The mutation leading to HT3 occurred in the germline of the famous English Thoroughbred stallion “Eclipse” or his son or grandson and its prevalence demonstrates the influence of this popular paternal line on modern sport horse breeds. The pervasive introgression of Thoroughbred stallions during the last 200 years to refine autochthonous breeds has strongly affected the distribution of Y-chromosomal variation in modern horse breeds and has led to the replacement of autochthonous Y chromosomes. Only a few northern European breeds bear unique variants at high frequencies or fixed within but not shared among breeds. Our Y-chromosomal data complement the well established mtDNA lineages and document the male side of the genetic history of modern horse breeds and breeding practices.
Affiliation(s)
- Barbara Wallner
- Department of Biomedical Sciences, Institute of Animal Breeding and Genetics, University of Veterinary Medicine Vienna, Vienna, Austria.
|
19
|
Abstract
Advances in DNA sequencing provide tools for efficient large-scale discovery of markers for use in plants. Discovery options include large-scale amplicon sequencing, transcriptome sequencing, gene-enriched genome sequencing and whole genome sequencing. Examples of each of these approaches and their potential to generate molecular markers for specific applications have been described. Sequencing the whole genome of parents identifies all the polymorphisms available for analysis in their progeny. Sequencing PCR amplicons of sets of candidate genes from DNA bulks can be used to define the available variation in these genes that might be exploited in a population or germplasm collection. Sequencing of the transcriptomes of genotypes varying for the trait of interest may identify genes with patterns of expression that could explain the phenotypic variation. Sequencing genomic DNA enriched for genes by hybridization with probes for all or some of the known genes simplifies sequencing and analysis of differences in gene sequences between large numbers of genotypes and genes especially when working with complex genomes. Examples of application of the above-mentioned techniques have been described.
|
20
|
Cortes A, Field J, Glazov EA, Hadler J, Stankovich J, Brown MA. Resequencing and fine-mapping of the chromosome 12q13-14 locus associated with multiple sclerosis refines the number of implicated genes. Hum Mol Genet 2013; 22:2283-92. [PMID: 23406874 DOI: 10.1093/hmg/ddt062] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 01/03/2023] Open
Abstract
Multiple sclerosis (MS) is a common chronic inflammatory disease of the central nervous system. Susceptibility to the disease is affected by both environmental and genetic factors. Genetic factors include haplotypes in the histocompatibility complex (MHC) and over 50 non-MHC loci reported by genome-wide association studies. Amongst these, we previously reported polymorphisms in chromosome 12q13-14 with a protective effect in individuals of European descent. This locus spans 288 kb and contains 17 genes, including several candidate genes which have potentially significant pathogenic and therapeutic implications. In this study, we aimed to fine-map this locus. We have implemented a two-phase study: a variant discovery phase where we have used next-generation sequencing and two target-enrichment strategies [long-range polymerase chain reaction (PCR) and Nimblegen's solution phase hybridization capture] in pools of 25 samples; and a genotyping phase where we genotyped 712 variants in 3577 healthy controls and 3269 MS patients. This study confirmed the association (rs2069502, P = 9.9 × 10⁻¹¹, OR = 0.787) and narrowed down the locus of association to an 86.5 kb region. Although the study was unable to pinpoint the key-associated variant, we have identified a 42 (genotyped and imputed) single-nucleotide polymorphism haplotype block likely to harbour the causal variant. No evidence of association at previously reported low-frequency variants in CYP27B1 was observed. As part of the study we compared variant discovery performance using two target-enrichment strategies. We concluded that our pools enriched with Nimblegen's solution phase hybridization capture had better sensitivity to detect true variants than the pools enriched with long-range PCR, whilst specificity was better in the long-range PCR-enriched pools compared with solution phase hybridization capture enriched pools; this result has important implications for the design of future fine-mapping studies.
Affiliation(s)
- Adrian Cortes
- University of Queensland Diamantina Institute, Translational Research Institute, Brisbane, Qld. 4102, Australia
|
21
|
Evaluation of allele frequency estimation using pooled sequencing data simulation. ScientificWorldJournal 2013; 2013:895496. [PMID: 23476151 PMCID: PMC3582166 DOI: 10.1155/2013/895496] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 11/15/2012] [Accepted: 12/30/2012] [Indexed: 11/17/2022] Open
Abstract
Next-generation sequencing (NGS) technology has provided researchers with opportunities to study the genome in unprecedented detail. In particular, NGS is applied to disease association studies. Unlike genotyping chips, NGS is not limited to a fixed set of SNPs. Prices for NGS are now comparable to the SNP chip, although for large studies the cost can be substantial. Pooling techniques are often used to reduce the overall cost of large-scale studies. In this study, we designed a rigorous simulation model to test the practicability of estimating allele frequency from pooled sequencing data. We took crucial factors into consideration, including pool size, overall depth, average depth per sample, pooling variation, and sampling variation. We used real data to demonstrate and measure reference allele preference in DNAseq data and implemented this bias in our simulation model. We found that pooled sequencing data can introduce high levels of relative error rate (defined as error rate divided by targeted allele frequency) and that the error rate is more severe for low minor allele frequency SNPs than for high minor allele frequency SNPs. In order to overcome the error introduced by pooling, we recommend a large pool size and high average depth per sample.
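The kind of simulation described in this abstract can be sketched in a few lines. The example below is a deliberately reduced model, not the authors' simulation: it includes only sampling variation (which chromosomes enter the pool) and sequencing variation (which pooled chromosomes are read), and it assumes equal DNA contribution per individual, no sequencing error, and no reference allele preference. The function names, default replicate count, and seed are hypothetical.

```python
import random


def pooled_frequency_estimate(p, n_individuals, depth, rng):
    """One simulated Pool-Seq estimate of an allele frequency (illustrative)."""
    chromosomes = 2 * n_individuals  # diploid individuals, equal contribution assumed
    # Sampling variation: how many pooled chromosomes carry the allele
    carriers = sum(rng.random() < p for _ in range(chromosomes))
    pool_freq = carriers / chromosomes
    # Sequencing variation: each read draws one pooled chromosome at random
    alt_reads = sum(rng.random() < pool_freq for _ in range(depth))
    return alt_reads / depth


def mean_relative_error(p, n_individuals, depth, reps=1000, seed=1):
    """Mean of |estimate - p| / p over repeated simulated pools."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        estimate = pooled_frequency_estimate(p, n_individuals, depth, rng)
        total += abs(estimate - p) / p
    return total / reps


# Compare a rare and a common allele, then a larger and deeper pool
print(mean_relative_error(0.05, n_individuals=20, depth=40))
print(mean_relative_error(0.50, n_individuals=20, depth=40))
print(mean_relative_error(0.05, n_individuals=200, depth=400))
```

Under these assumptions the relative error is larger for the low-frequency allele and shrinks as pool size and depth grow, which is the qualitative pattern the abstract reports.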
|
22
|
Chen Q, Sun F. A unified approach for allele frequency estimation, SNP detection and association studies based on pooled sequencing data using EM algorithms. BMC Genomics 2013; 14 Suppl 1:S1. [PMID: 23369070 PMCID: PMC3549804 DOI: 10.1186/1471-2164-14-s1-s1] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Genome-wide association studies (GWAS) have identified many common polymorphisms associated with complex traits. However, these associated common variants explain only a small fraction of the phenotypic variances, leaving a substantial portion of genetic heritability unexplained. As a result, searches for "missing" heritability are drawing increasing attention, particularly for rare variant studies that often require a large sample size and, thus, extensive sequencing effort. Although the development of next generation sequencing (NGS) technologies has made it possible to sequence a large number of reads economically and efficiently, it is still often cost prohibitive to sequence thousands of individuals that are generally required for association studies. A more efficient and cost-effective design would involve pooling the genetic materials of multiple individuals together and then sequencing the pools, instead of the individuals. This pooled sequencing approach has improved the plausibility of association studies for rare variants, while, at the same time, posed a great challenge to the pooled sequencing data analysis, essentially because individual sample identity is lost, and NGS sequencing errors could be hard to distinguish from low frequency alleles. RESULTS A unified approach for estimating minor allele frequency, SNP calling and association studies based on pooled sequencing data using an expectation maximization (EM) algorithm is developed in this paper. This approach makes it possible to study the effects of minor allele frequency, sequencing error rate, number of pools, number of individuals in each pool, and the sequencing depth on the estimation accuracy of minor allele frequencies. We show that the naive method of estimating minor allele frequencies by taking the fraction of observed minor alleles can be significantly biased, especially for rare variants. 
In contrast, our EM approach can give an unbiased estimate of the minor allele frequency under all scenarios studied in this paper. A SNP calling approach, EM-SNP, for pooled sequencing data based on the EM algorithm is then developed and compared with another recent SNP calling method, SNVer. We show that EM-SNP outperforms SNVer in terms of the fraction of db-SNPs among the called SNPs, as well as transition/transversion (Ti/Tv) ratio. Finally, the EM approach is used to study the association between variants and type I diabetes. CONCLUSIONS The EM-based approach for the analysis of pooled sequencing data can accurately estimate minor allele frequencies, call SNPs, and find associations between variants and complex traits. This approach is especially useful for studies involving rare variants.
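The bias of the naive estimator for rare variants can be seen with a small worked example. The sketch below is not the EM approach of the paper. It assumes a simplified biallelic model in which every read has the same probability `e` of reporting the wrong allele, and it uses a simple moment correction in place of EM. Both function names are hypothetical.

```python
def expected_naive_estimate(f, e):
    """Expected fraction of minor-allele reads under a symmetric error model.

    f is the true minor allele frequency; e is the assumed per-read
    probability that a read reports the other allele.
    """
    return f * (1.0 - e) + (1.0 - f) * e


def error_corrected_estimate(observed_fraction, e):
    """Invert the symmetric error model (moment correction, not EM)."""
    if not 0.0 <= e < 0.5:
        raise ValueError("error rate must be in [0, 0.5)")
    corrected = (observed_fraction - e) / (1.0 - 2.0 * e)
    return min(1.0, max(0.0, corrected))  # clamp to a valid frequency


# A 1% variant read with a 1% error rate looks almost twice as frequent
true_maf, error_rate = 0.01, 0.01
naive = expected_naive_estimate(true_maf, error_rate)
print(naive, error_corrected_estimate(naive, error_rate))
```

In this model the naive estimate is inflated by e(1 - 2f), so the relative bias is largest when f is small, which is consistent with the abstract's remark about rare variants.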
Affiliation(s)
- Quan Chen
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA 90089-2910, USA
|
23
|
Quast C, Altmann A, Weber P, Arloth J, Bader D, Heck A, Pfister H, Müller-Myhsok B, Erhardt A, Binder EB. Rare variants in TMEM132D in a case-control sample for panic disorder. Am J Med Genet B Neuropsychiatr Genet 2012; 159B:896-907. [PMID: 22911938 DOI: 10.1002/ajmg.b.32096] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 02/08/2012] [Accepted: 08/03/2012] [Indexed: 11/06/2022]
Abstract
Genome-wide association studies have identified common variants associated with common diseases. Most variants, however, explain only a small proportion of the estimated heritability, suggesting that rare variants might contribute to a larger extent to common diseases than assumed to date. Here, we use next-generation sequencing to test whether such variants contribute to the risk for anxiety disorders by re-sequencing 40 kb including all exons of the TMEM132D locus which we have previously shown to be associated with panic disorder and anxiety severity measures. DNA from 300 patients suffering from anxiety disorders, mostly panic disorder (84.7%), and 300 healthy controls was screened for the presence of genetic variants using next-generation re-sequencing in a pooled approach. Results were verified by individual re-genotyping. We identified 371 variants of which 247 had not been reported before, including 15 novel non-synonymous variants. The majority, 76% of these variants had a minor allele frequency less than 5%. While we did not identify additional common variants in TMEM132D associated with panic disorders, we observed an overrepresentation of presumably functional coding variants in healthy controls as compared to cases as well as a higher rate of private coding variants in cases, with one non-synonymous coding variant present in four patients but not in any of the matched controls nor in over 5,500 individuals of different ethnic origins from publicly available re-sequencing datasets. Our data suggest that not only common but also putatively functional and/or rare variants within TMEM132D might contribute to the risk to develop anxiety disorders.
Affiliation(s)
- Carina Quast
- Max Planck Institute of Psychiatry, Munich, Germany
|
24
|
Out AA, Wasielewski M, Huijts PEA, van Minderhout IJHM, Houwing-Duistermaat JJ, Tops CMJ, Nielsen M, Seynaeve C, Wijnen JT, Breuning MH, van Asperen CJ, Schutte M, Hes FJ, Devilee P. MUTYH gene variants and breast cancer in a Dutch case–control study. Breast Cancer Res Treat 2012; 134:219-27. [PMID: 22297469 PMCID: PMC3397219 DOI: 10.1007/s10549-012-1965-0] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Received: 11/22/2011] [Accepted: 01/16/2012] [Indexed: 12/13/2022]
Abstract
The MUTYH gene is involved in base excision repair. MUTYH mutations predispose to recessively inherited colorectal polyposis and cancer. Here, we evaluate an association with breast cancer (BC), following up our previous finding of an elevated BC frequency among Dutch bi-allelic MUTYH mutation carriers. A case–control study was performed comparing 1,469 incident BC patients (ORIGO cohort), 471 individuals displaying features suggesting a genetic predisposition for BC, but without a detectable BRCA1 or BRCA2 mutation (BRCAx cohort), and 1,666 controls. First, for 303 consecutive patients diagnosed before age 55 years and/or with multiple primary breast tumors, the MUTYH coding region and flanking introns were sequenced. The remaining subjects were genotyped for five coding variants, p.Tyr179Cys, p.Arg309Cys, p.Gly396Asp, p.Pro405Leu, and p.Ser515Phe, and four tagging SNPs, c.37-2487G>T, p.Val22Met, c.504+35G>A, and p.Gln338His. No bi-allelic pathogenic MUTYH mutations were identified. The pathogenic variant p.Gly396Asp and the variant of uncertain significance p.Arg309Cys occurred twice as frequently in BRCAx subjects as compared to incident BC patients and controls (p = 0.13 and p = 0.15, respectively). The likely benign variant p.Val22Met occurred less frequently in patients from the incident BC (p = 0.03) and BRCAx groups (p = 0.11), respectively, as compared to the controls. Minor allele genotypes of several MUTYH variants showed trends towards association with lobular BC histology. This extensive case–control study could not confirm previously reported associations of MUTYH variants with BC, although it was too small to exclude subtle effects on BC susceptibility.
Affiliation(s)
- Astrid A. Out
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Present Address: Department of Pathology, VU University Medical Center, Amsterdam, The Netherlands
- Marijke Wasielewski
- Department of Medical Oncology, Josephine Nefkens Institute, Erasmus University Medical Centre, Rotterdam, The Netherlands
- Present Address: Department of Genetics, University Medical Center Groningen, Groningen, The Netherlands
- Petra E. A. Huijts
- Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Ivonne J. H. M. van Minderhout
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Carli M. J. Tops
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Maartje Nielsen
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Caroline Seynaeve
- Department of Medical Oncology, Erasmus MC–Daniel den Hoed Cancer Center, Rotterdam, The Netherlands
- Juul T. Wijnen
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Martijn H. Breuning
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Christi J. van Asperen
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Mieke Schutte
- Department of Medical Oncology, Josephine Nefkens Institute, Erasmus University Medical Centre, Rotterdam, The Netherlands
- Present Address: Lorentz Center, Leiden, The Netherlands
- Frederik J. Hes
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Peter Devilee
- Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Department of Pathology, Leiden University Medical Center, Leiden, The Netherlands
|
25
|
Feder AF, Petrov DA, Bergland AO. LDx: estimation of linkage disequilibrium from high-throughput pooled resequencing data. PLoS One 2012; 7:e48588. [PMID: 23152785 PMCID: PMC3494690 DOI: 10.1371/journal.pone.0048588] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.8] [Received: 07/06/2012] [Accepted: 10/03/2012] [Indexed: 12/14/2022] Open
Abstract
High-throughput pooled resequencing offers significant potential for whole genome population sequencing. However, its main drawback is the loss of haplotype information. In order to regain some of this information, we present LDx, a computational tool for estimating linkage disequilibrium (LD) from pooled resequencing data. LDx uses an approximate maximum likelihood approach to estimate LD (r²) between pairs of SNPs that can be observed within and among single reads. LDx also reports r² estimates derived solely from observed genotype counts. We demonstrate that the LDx estimates are highly correlated with r² estimated from individually resequenced strains. We discuss the performance of LDx using more stringent quality conditions and infer via simulation the degree to which performance can improve based on read depth. Finally we demonstrate two possible uses of LDx with real and simulated pooled resequencing data. First, we use LDx to infer genomewide patterns of decay of LD with physical distance in D. melanogaster population resequencing data. Second, we demonstrate that r² estimates from LDx are capable of distinguishing alternative demographic models representing plausible demographic histories of D. melanogaster.
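For orientation, the LD statistic discussed in this abstract can be computed directly from counts of the four two-SNP haplotypes seen on reads that span both sites. The sketch below is the plain count-based plug-in estimate, not the approximate maximum likelihood procedure that LDx implements; the function name and argument layout are assumptions.

```python
def r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Squared correlation between two biallelic SNPs from haplotype counts.

    Arguments are the numbers of reads carrying each allele combination
    (A/a at the first SNP, B/b at the second).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no reads span both SNPs")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    denominator = p_A * (1.0 - p_A) * p_B * (1.0 - p_B)
    if denominator == 0.0:
        raise ValueError("both SNPs must be polymorphic among the reads")
    d = p_AB - p_A * p_B  # linkage disequilibrium coefficient D
    return d * d / denominator


print(r_squared(50, 0, 0, 50))    # alleles always co-occur
print(r_squared(25, 25, 25, 25))  # alleles are independent
```

Because only reads (or read pairs) covering both SNPs contribute, an estimate of this kind is restricted to pairs of sites within roughly one fragment length of each other.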
Affiliation(s)
- Alison F Feder
- Department of Biology, Stanford University, Stanford, California, United States of America.
|
26
|
Wang W, Yin X, Soo Pyon Y, Hayes M, Li J. Rare variant discovery and calling by sequencing pooled samples with overlaps. Bioinformatics 2012; 29:29-38. [PMID: 23104896 DOI: 10.1093/bioinformatics/bts645] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION For many complex traits/diseases, it is believed that rare variants account for some of the missing heritability that cannot be explained by common variants. Sequencing a large number of samples through DNA pooling is a cost-effective strategy to discover rare variants and to investigate their associations with phenotypes. Overlapping pool designs provide further benefit because such approaches can potentially identify variant carriers, which is important for downstream applications of association analysis of rare variants. However, existing algorithms for analysing sequence data from overlapping pools are limited. RESULTS We propose a complete data analysis framework for overlapping pool designs, with novelties in all three major steps: variant pool and variant locus identification, variant allele frequency estimation and variant sample decoding. The framework can be used in combination with any design matrix. We have investigated its performance based on two different overlapping designs and have compared it with three state-of-the-art methods, by simulating targeted sequencing and by pooling real sequence data. Results on both datasets show that our algorithm has made significant improvements over existing ones. In conclusion, successful discovery of rare variants and identification of variant carriers using overlapping pool strategies critically depend on many steps, from generation of design matrixes to decoding algorithms. The proposed framework in combination with the design matrixes generated based on the Chinese remainder theorem achieves best overall results. AVAILABILITY Source code of the program, termed VIP for Variant Identification by Pooling, is available at http://cbc.case.edu/VIP.
Affiliation(s)
- Wenhui Wang
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, USA
|
27
|
Liang WE, Thomas DC, Conti DV. Analysis and optimal design for association studies using next-generation sequencing with case-control pools. Genet Epidemiol 2012; 36:870-81. [PMID: 22972696 DOI: 10.1002/gepi.21681] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 03/08/2012] [Revised: 08/03/2012] [Accepted: 08/07/2012] [Indexed: 11/09/2022]
Abstract
With its potential to discover a much greater amount of genetic variation, next-generation sequencing is fast becoming an emergent tool for genetic association studies. However, the cost of sequencing all individuals in a large-scale population study is still high in comparison to most alternative genotyping options. While the ability to identify individual-level data is lost (without bar-coding), sequencing pooled samples can substantially lower costs without compromising the power to detect significant associations. We propose a hierarchical Bayesian model that estimates the association of each variant using pools of cases and controls, accounting for the variation in read depth across pools and sequencing error. To investigate the performance of our method across a range of number of pools, number of individuals within each pool, and average coverage, we undertook extensive simulations varying effect sizes, minor allele frequencies, and sequencing error rates. In general, the number of pools and pool size have dramatic effects on power while the total depth of coverage per pool has only a moderate impact. This information can guide the selection of a study design that maximizes power subject to cost, sample size, or other laboratory constraints. We provide an R package (hiPOD: hierarchical Pooled Optimal Design) to find the optimal design, allowing the user to specify a cost function, cost, and sample size limitations, and distributions of effect size, minor allele frequency, and sequencing error rate.
Affiliation(s)
- Wei E Liang
- Department of Preventive Medicine, University of Southern California, Los Angeles, California
|
28
|
Zhu Y, Bergland AO, González J, Petrov DA. Empirical validation of pooled whole genome population re-sequencing in Drosophila melanogaster. PLoS One 2012; 7:e41901. [PMID: 22848651 PMCID: PMC3406057 DOI: 10.1371/journal.pone.0041901] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Received: 04/18/2012] [Accepted: 06/28/2012] [Indexed: 11/26/2022]
Abstract
The sequencing of pooled non-barcoded individuals is an inexpensive and efficient means of assessing genome-wide population allele frequencies, yet its accuracy has not been thoroughly tested. We assessed the accuracy of this approach on whole, complex eukaryotic genomes by resequencing pools of largely isogenic, individually sequenced Drosophila melanogaster strains. We called SNPs in the pooled data and estimated false positive and false negative rates using the SNPs called in the individual strains as a reference. We also estimated the allele frequencies of the SNPs using “pooled” data and compared them with “true” frequencies taken from the estimates in the individual strains. We demonstrate that pooled sequencing provides a faithful estimate of population allele frequency with the error well approximated by binomial sampling, and is a reliable means of novel SNP discovery with low false positive rates. However, a sufficient number of strains should be used in the pooling because variation in the amount of DNA derived from individual strains is a substantial source of noise when the number of pooled strains is low. Our results and analysis confirm that pooled sequencing is a very powerful and cost-effective technique for assessing patterns of sequence variation in populations on genome-wide scales, and is applicable to any dataset where sequencing individuals or individual cells is impossible, difficult, time consuming, or expensive.
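The statement that the error is "well approximated by binomial sampling" can be made concrete with a short sketch. It assumes reads are drawn independently from the pool, so the only noise source is read depth; the function names and the example frequency and depths are illustrative and do not come from the paper.

```python
from math import sqrt


def binomial_se(true_freq, depth):
    """Standard error of an allele frequency estimated from `depth` reads."""
    # Variance of a binomial proportion: p * (1 - p) / n.
    return sqrt(true_freq * (1.0 - true_freq) / depth)


def approx_95_interval(true_freq, depth):
    """Normal-approximation 95% range for the estimated frequency, clipped to [0, 1]."""
    half_width = 1.96 * binomial_se(true_freq, depth)
    return max(0.0, true_freq - half_width), min(1.0, true_freq + half_width)


# Expected spread of the estimate for a 10% allele at increasing read depth.
for depth in (20, 100, 500):
    low, high = approx_95_interval(0.10, depth)
    print(f"depth={depth:4d}  se={binomial_se(0.10, depth):.4f}  range=({low:.3f}, {high:.3f})")
```

Unequal DNA contributions from individual strains, which the abstract identifies as a substantial noise source for small pools, would add variance on top of this read-sampling term.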
Affiliation(s)
- Yuan Zhu
- Department of Genetics, Stanford University, Stanford, California, United States of America.
|
29
|
Chen X, Listman JB, Slack FJ, Gelernter J, Zhao H. Biases and errors on allele frequency estimation and disease association tests of next-generation sequencing of pooled samples. Genet Epidemiol 2012; 36:549-60. [PMID: 22674656 DOI: 10.1002/gepi.21648] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 12/10/2011] [Revised: 04/15/2012] [Accepted: 04/25/2012] [Indexed: 01/01/2023]
Abstract
Next-generation sequencing is widely used to study complex diseases because of its ability to identify both common and rare variants without prior single nucleotide polymorphism (SNP) information. Pooled sequencing of implicated target regions can lower costs and allow more samples to be analyzed, thus improving statistical power for disease-associated variant detection. Several methods for disease association tests of pooled data and for optimal pooling designs have been developed under certain assumptions of the pooling process, for example, equal/unequal contributions to the pool, sequencing depth variation, and error rate. However, these simplified assumptions may not portray the many factors affecting pooled sequencing data quality, such as PCR amplification during target capture and sequencing, reference allele preferential bias, and others. As a result, the properties of the observed data may differ substantially from those expected under the simplified assumptions. Here, we use real datasets from targeted sequencing of pooled samples, together with microarray SNP genotypes of the same subjects, to identify and quantify factors (biases and errors) affecting the observed sequencing data. Through simulations, we find that these factors have a significant impact on the accuracy of allele frequency estimation and the power of association tests. Furthermore, we develop a workflow protocol to incorporate these factors in data analysis to reduce the potential biases and errors in pooled sequencing data and to gain better estimation of allele frequencies. The workflow, Psafe, is available at http://bioinformatics.med.yale.edu/group/.
Affiliation(s)
- Xiaowei Chen
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
|
30
|
Determination of RET Sequence Variation in an MEN2 Unaffected Cohort Using Multiple-Sample Pooling and Next-Generation Sequencing. J Thyroid Res 2012; 2012:318232. [PMID: 22545224 PMCID: PMC3321559 DOI: 10.1155/2012/318232] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/07/2011] [Accepted: 01/23/2012] [Indexed: 11/30/2022]
Abstract
Multisample, nonindexed pooling combined with next-generation sequencing (NGS) was used to discover RET proto-oncogene sequence variation within a cohort known to be unaffected by multiple endocrine neoplasia type 2 (MEN2). DNA samples (113 Caucasians, 23 persons of other ethnicities) were amplified for RET intron 9 to intron 16 and then divided into 5 pools of <30 samples each before library prep and NGS. Two controls were included in this study, a single sample and a pool of 50 samples that had been previously sequenced by the same NGS methods. All 59 variants previously detected in the 50-pool control were present. Of the 61 variants detected in the unaffected cohort, 20 variants were novel changes. Several variants were validated by high-resolution melting analysis and Sanger sequencing, and their allelic frequencies correlated well with those determined by NGS. The results from this unaffected cohort will be added to the RET MEN2 database.
|
31
|
Rossetti S, Hopp K, Sikkink RA, Sundsbak JL, Lee YK, Kubly V, Eckloff BW, Ward CJ, Winearls CG, Torres VE, Harris PC. Identification of gene mutations in autosomal dominant polycystic kidney disease through targeted resequencing. J Am Soc Nephrol 2012; 23:915-33. [PMID: 22383692 DOI: 10.1681/asn.2011101032] [Citation(s) in RCA: 127] [Impact Index Per Article: 10.6] [Indexed: 11/03/2022]
Abstract
Mutations in two large multi-exon genes, PKD1 and PKD2, cause autosomal dominant polycystic kidney disease (ADPKD). The duplication of PKD1 exons 1-32 as six pseudogenes on chromosome 16, the high level of allelic heterogeneity, and the cost of Sanger sequencing complicate mutation analysis, which can aid diagnostics of ADPKD. We developed and validated a strategy to analyze both the PKD1 and PKD2 genes using next-generation sequencing by pooling long-range PCR amplicons and multiplexing bar-coded libraries. We used this approach to characterize a cohort of 230 patients with ADPKD. This process detected definitely and likely pathogenic variants in 115 (63%) of 183 patients with typical ADPKD. In addition, we identified atypical mutations, a gene conversion, and one missed mutation resulting from allele dropout, and we characterized the pattern of deep intronic variation for both genes. In summary, this strategy involving next-generation sequencing is a model for future genetic characterization of large ADPKD populations.
Affiliation(s)
- Sandro Rossetti
- Division of Nephrology and Hypertension, Mayo Clinic, Rochester, MN 55905, USA.
|
32
|
Casals F, Idaghdour Y, Hussin J, Awadalla P. Next-generation sequencing approaches for genetic mapping of complex diseases. J Neuroimmunol 2012; 248:10-22. [PMID: 22285396 DOI: 10.1016/j.jneuroim.2011.12.017] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 09/30/2011] [Revised: 11/30/2011] [Accepted: 12/15/2011] [Indexed: 01/12/2023]
Abstract
The advent of next generation sequencing technologies has opened new possibilities in the analysis of human disease. In this review we present the main next-generation sequencing technologies, with their major contributions and possible applications to the study of the genetic etiology of complex diseases.
Affiliation(s)
- Ferran Casals
- Centre de Recherche du Centre Hospitalier Universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada.
|
33
|
Harismendy O, Schwab RB, Bao L, Olson J, Rozenzhak S, Kotsopoulos SK, Pond S, Crain B, Chee MS, Messer K, Link DR, Frazer KA. Detection of low prevalence somatic mutations in solid tumors with ultra-deep targeted sequencing. Genome Biol 2011; 12:R124. [PMID: 22185227 PMCID: PMC3334619 DOI: 10.1186/gb-2011-12-12-r124] [Citation(s) in RCA: 80] [Impact Index Per Article: 6.2] [Received: 08/31/2011] [Revised: 10/18/2011] [Accepted: 12/20/2011] [Indexed: 12/18/2022]
Abstract
Ultra-deep targeted sequencing (UDT-Seq) can identify subclonal somatic mutations in tumor samples. Early assays' limited breadth and depth restrict their clinical utility. Here, we target 71 kb of mutational hotspots in 42 cancer genes. We present novel methods enhancing both laboratory workflow and mutation detection. We evaluate the true sensitivity and specificity of UDT-Seq (> 94% and > 99%, respectively) for low prevalence mutations in a mixing experiment and demonstrate its utility using six tumor samples. With improved performance when run on the Illumina MiSeq, the UDT-Seq assay is well suited for clinical applications to guide therapy and study clonal selection in heterogeneous samples.
Affiliation(s)
- Olivier Harismendy
- Moores UCSD Cancer Center, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA.
|
34
|
Kharabian-Masouleh A, Waters DLE, Reinke RF, Henry RJ. Discovery of polymorphisms in starch-related genes in rice germplasm by amplification of pooled DNA and deeply parallel sequencing. Plant Biotechnology Journal 2011; 9:1074-85. [PMID: 21645201 DOI: 10.1111/j.1467-7652.2011.00629.x] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 05/11/2023]
Abstract
High-throughput sequencing of pooled DNA was applied to polymorphism discovery in candidate genes involved in starch synthesis. This approach employed semi- to long-range PCR (LR-PCR) followed by next-generation sequencing technology. A total of 17 rice starch synthesis genes encoding seven classes of enzymes, including ADP-glucose pyrophosphorylase (AGPase), granule starch synthase (GBSS), soluble starch synthase (SS), starch branching enzyme (BE), starch debranching enzyme (DBE), starch phosphorylase (SPHOL) and phosphate translocator (GPT1), from 233 genotypes were PCR amplified using semi- to long-range PCR. The amplification products were equimolarly pooled and sequenced using massively parallel sequencing technology (MPS). By detecting single nucleotide polymorphisms (SNPs)/Indels in both coding and noncoding areas of the genes, we identified genetic differences and characterized the SNP/Indel variation and distribution patterns among individual starch candidate genes. Approximately 60.9 million reads were generated, of which 54.8 million (90%) mapped to the reference sequences. The average coverage rate ranged from 12,708 to 38,300 times for SSIIa and SSIIIb, respectively. SNPs and single/multiple-base Indels were analysed in a total assembled length of 116,403 bp. In total, 501 SNPs and 113 Indels were detected across the 17 starch-related loci. The ratio of nonsynonymous to synonymous SNPs (Ka/Ks) test indicated GBSSI and isoamylase 1 (ISA1) as the least diversified (most purified) and conservative genes as the studied populations have been through cycles of selection. This report demonstrates a useful strategy for screening germplasm by MPS to discover variants in a specific target group of genes.
Affiliation(s)
- Ardashir Kharabian-Masouleh
- Southern Cross Plant Science, Centre for Plant Conservation Genetics, Southern Cross University, Lismore, NSW 2480, Australia.
|
35
|
Day-Williams AG, McLay K, Drury E, Edkins S, Coffey AJ, Palotie A, Zeggini E. An evaluation of different target enrichment methods in pooled sequencing designs for complex disease association studies. PLoS One 2011; 6:e26279. [PMID: 22069447 PMCID: PMC3206031 DOI: 10.1371/journal.pone.0026279] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 07/05/2011] [Accepted: 09/23/2011] [Indexed: 01/27/2023]
Abstract
Pooled sequencing can be a cost-effective approach to disease variant discovery, but its applicability in association studies remains unclear. We compare sequence enrichment methods coupled to next-generation sequencing in non-indexed pools of 1, 2, 10, 20 and 50 individuals and assess their ability to discover variants and to estimate their allele frequencies. We find that pooled resequencing is most usefully applied as a variant discovery tool due to limitations in estimating allele frequency with high enough accuracy for association studies, and that in-solution hybrid-capture performs best among the enrichment methods examined regardless of pool size.
Affiliation(s)
- Aaron G. Day-Williams
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Kirsten McLay
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- The Genome Analysis Centre, Norwich, United Kingdom
- Eleanor Drury
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Sarah Edkins
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Alison J. Coffey
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Aarno Palotie
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Program in Medical and Population Genetics and Genetic Analysis Platform, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Department of Medical Genetics, University of Helsinki and University Central Hospital, Helsinki, Finland
- Eleftheria Zeggini
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
|
36
|
Altmann A, Weber P, Quast C, Rex-Haffner M, Binder EB, Müller-Myhsok B. vipR: variant identification in pooled DNA using R. Bioinformatics 2011; 27:i77-84. [PMID: 21685105 PMCID: PMC3117388 DOI: 10.1093/bioinformatics/btr205] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 12/04/2022]
Abstract
Motivation: High-throughput sequencing (HTS) technologies are the method of choice for screening the human genome for rare sequence variants causing susceptibility to complex diseases. Unfortunately, preparation of samples for a large number of individuals is still very cost- and labor-intensive. Thus, recently, screens for rare sequence variants were carried out in samples of pooled DNA, in which equimolar amounts of DNA from multiple individuals are mixed prior to sequencing with HTS. The resulting sequence data, however, pose a bioinformatics challenge: the discrimination of sequencing errors from real sequence variants present at a low frequency in the DNA pool. Results: Our method vipR uses data from multiple DNA pools in order to compensate for differences in sequencing error rates along the sequenced region. More precisely, instead of aiming at discriminating sequence variants from sequencing errors, vipR identifies sequence positions that exhibit significantly different minor allele frequencies in at least two DNA pools using the Skellam distribution. The performance of vipR was compared with three other models on data from a targeted resequencing study of the TMEM132D locus in 600 individuals distributed over four DNA pools. Performance of the methods was computed on SNPs that were also genotyped individually using a MALDI-TOF technique. On a set of 82 sequence variants, vipR achieved an average sensitivity of 0.80 at an average specificity of 0.92, thus outperforming the reference methods by at least 0.17 in specificity at comparable sensitivity. Availability: The code of vipR is freely available via: http://sourceforge.net/projects/htsvipr/ Contact: altmann@mpipsykl.mpg.de
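The Skellam distribution named in the abstract describes the difference of two independent Poisson counts. The sketch below only illustrates that distribution, assuming minor-allele read counts in two pools are Poisson with a shared rate under the null; the function names, the truncation limits and the example counts are assumptions, and this is not vipR's actual test procedure.

```python
from math import exp, lgamma, log


def poisson_pmf(k, mu):
    """Poisson probability of count k with rate mu (> 0); zero for negative k."""
    if k < 0:
        return 0.0
    return exp(k * log(mu) - mu - lgamma(k + 1))


def skellam_pmf(k, mu1, mu2, terms=500):
    """P(X1 - X2 = k) for independent X1 ~ Poisson(mu1), X2 ~ Poisson(mu2)."""
    # Sum over X2 = j and X1 = j + k; the series is truncated at `terms`.
    return sum(poisson_pmf(j + k, mu1) * poisson_pmf(j, mu2)
               for j in range(max(0, -k), terms))


def skellam_two_sided_p(k_obs, mu, span=200):
    """P(|K| >= |k_obs|) when both pools share the Poisson rate mu."""
    k_abs = abs(k_obs)
    tail = sum(skellam_pmf(k, mu, mu)
               for k in range(-span, span + 1) if abs(k) >= k_abs)
    return min(1.0, tail)


# Hypothetical minor-allele read counts at one position in two pools.
count_pool_a, count_pool_b = 30, 12
shared_rate = (count_pool_a + count_pool_b) / 2.0
p_value = skellam_two_sided_p(count_pool_a - count_pool_b, shared_rate)
```

A small `p_value` here would flag the position as having different minor-allele counts between the two pools. The truncation limits suit small rates like the one in this example and would need widening for deep coverage.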
Affiliation(s)
- Andre Altmann
- Department of Statistical Genetics, Max Planck Institute of Psychiatry, Munich, Germany.
|
37
|
Niranjan TS, Adamczyk A, Bravo HC, Taub MA, Wheelan SJ, Irizarry R, Wang T. Effective detection of rare variants in pooled DNA samples using Cross-pool tailcurve analysis. Genome Biol 2011; 12:R93. [PMID: 21955804 PMCID: PMC3308056 DOI: 10.1186/gb-2011-12-9-r93] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 03/20/2011] [Revised: 08/08/2011] [Accepted: 09/28/2011] [Indexed: 01/16/2023]
Abstract
Sequencing targeted DNA regions in large samples is necessary to discover the full spectrum of rare variants. We report an effective Illumina sequencing strategy utilizing pooled samples with novel quality (Srfim) and filtering (SERVIC4E) algorithms. We sequenced 24 exons in two cohorts of 480 samples each, identifying 47 coding variants, including 30 present once per cohort. Validation by Sanger sequencing revealed an excellent combination of sensitivity and specificity for variant detection in pooled samples of both cohorts as compared to publicly available algorithms.
Affiliation(s)
- Tejasvi S Niranjan
- McKusick-Nathans Institute of Genetic Medicine and Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
|
38
|
Zaboli G, Ameur A, Igl W, Johansson Å, Hayward C, Vitart V, Campbell S, Zgaga L, Polasek O, Schmitz G, van Duijn C, Oostra B, Pramstaller P, Hicks A, Meitinger T, Rudan I, Wright A, Wilson JF, Campbell H, Gyllensten U. Sequencing of high-complexity DNA pools for identification of nucleotide and structural variants in regions associated with complex traits. Eur J Hum Genet 2011; 20:77-83. [PMID: 21811304 DOI: 10.1038/ejhg.2011.138] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 01/29/2023]
Abstract
We have used targeted genomic sequencing of high-complexity DNA pools based on long-range PCR and deep DNA sequencing by the SOLiD technology. The method was used for sequencing of 286 kb from four chromosomal regions with quantitative trait loci (QTL) influencing blood plasma lipid and uric acid levels in DNA pools of 500 individuals from each of five European populations. The method shows very good precision in estimating allele frequencies as compared with individual genotyping of SNPs (r² = 0.95, P < 10⁻¹⁶). Validation shows that the method is able to identify novel SNPs and estimate their frequency in high-complexity DNA pools. In our five populations, 17% of all SNPs and 61% of structural variants are not available in the public databases. A large fraction of the novel variants show a limited geographic distribution, with 62% of the novel SNPs and 59% of novel structural variants being detected in only one of the populations. The large number of population-specific novel SNPs underscores the need for comprehensive sequencing of local populations in order to identify the causal variants of human traits.
Affiliation(s)
- Ghazal Zaboli
- Department of Immunology, Genetics and Pathology, Rudbeck Laboratory, SciLifeLab Uppsala, Uppsala University, Uppsala, Sweden
|
39
|
Wei Z, Wang W, Hu P, Lyon GJ, Hakonarson H. SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data. Nucleic Acids Res 2011; 39:e132. [PMID: 21813454 PMCID: PMC3201884 DOI: 10.1093/nar/gkr599] [Citation(s) in RCA: 176] [Impact Index Per Article: 13.5] [Indexed: 11/12/2022]
Abstract
We develop a statistical tool, SNVer, for calling common and rare variants in analysis of pooled or individual next-generation sequencing (NGS) data. We formulate variant calling as a hypothesis testing problem and employ a binomial-binomial model to test the significance of observed allele frequency against sequencing error. SNVer reports one single overall P-value for evaluating the significance of a candidate locus being a variant, based on which multiplicity control can be obtained. This is particularly desirable because tens of thousands of loci are simultaneously examined in typical NGS experiments. Each user can choose the false-positive error rate threshold he or she considers appropriate, instead of just the dichotomous decisions of whether to 'accept or reject the candidates' provided by most existing methods. We use both simulated data and real data to demonstrate the superior performance of our program in comparison with existing methods. SNVer runs very fast and can complete testing 300 K loci within an hour. This excellent scalability makes it feasible for analysis of whole-exome sequencing data, or even whole-genome sequencing data using a high-performance computing cluster. SNVer is freely available at http://snver.sourceforge.net/.
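Testing an observed allele count against sequencing error can be illustrated with a plain one-sided binomial test. This is a deliberately simplified sketch: it uses a single binomial with an assumed fixed error rate rather than SNVer's binomial-binomial model, and the function name, error rate and read counts are hypothetical.

```python
from math import comb


def error_only_p_value(alt_reads, depth, error_rate):
    """P(X >= alt_reads) for X ~ Binomial(depth, error_rate).

    Small values suggest the alternate reads are unlikely to be errors alone.
    Intended for moderate depths; very large depths would need log-space sums.
    """
    return sum(
        comb(depth, i) * error_rate**i * (1.0 - error_rate) ** (depth - i)
        for i in range(alt_reads, depth + 1)
    )


# Hypothetical locus: 9 alternate reads out of 200, assumed 1% error rate.
p_value = error_only_p_value(alt_reads=9, depth=200, error_rate=0.01)
```

Reporting a P-value per locus, as the abstract describes, lets the threshold be set afterwards with a multiple-testing correction across all candidate loci instead of relying on a fixed accept/reject rule.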
Affiliation(s)
- Zhi Wei
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 08540, USA.
|
40
|
Marroni F, Pinosio S, Di Centa E, Jurman I, Boerjan W, Felice N, Cattonaro F, Morgante M. Large-scale detection of rare variants via pooled multiplexed next-generation sequencing: towards next-generation Ecotilling. The Plant Journal: for Cell and Molecular Biology 2011; 67:736-45. [PMID: 21554453 DOI: 10.1111/j.1365-313x.2011.04627.x] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 05/03/2023]
Abstract
Common variants, such as those identified by genome-wide association scans, explain only a small proportion of trait variation. Growing evidence suggests that rare functional variants, which are usually missed by genome-wide association scans, play an important role in determining the phenotype. We used pooled multiplexed next-generation sequencing and a customized analysis workflow to detect mutations in five candidate genes for lignin biosynthesis in 768 pooled Populus nigra accessions. We identified a total of 36 non-synonymous single nucleotide polymorphisms, one of which causes a premature stop codon. The most common variant was estimated to be present in 672 of the 1536 tested chromosomes, while the rarest was estimated to occur only once in 1536 chromosomes. Comparison with individual Sanger sequencing in a selected sub-sample confirmed that variants are identified with high sensitivity and specificity, and that the variant frequency was estimated accurately. This proposed method for identification of rare polymorphisms allows accurate detection of variation in many individuals, and is cost-effective compared to individual sequencing.
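The chromosome counts quoted in the abstract follow from treating each of the 768 accessions as diploid. A short worked example of that arithmetic is given below; the constant and function names are illustrative.

```python
N_ACCESSIONS = 768
PLOIDY = 2  # two chromosome copies per accession, matching the 1536 in the abstract

n_chromosomes = N_ACCESSIONS * PLOIDY


def allele_frequency(copies, total_chromosomes):
    """Fraction of chromosomes carrying the variant allele."""
    return copies / total_chromosomes


most_common = allele_frequency(672, n_chromosomes)  # 672 of 1536 chromosomes
rarest = allele_frequency(1, n_chromosomes)         # a single copy in the whole pool

print(f"chromosomes={n_chromosomes}  most common={most_common:.4f}  rarest={rarest:.6f}")
```

The rarest variant therefore sits at a frequency of roughly 0.065%, which is the scale at which sequencing errors and true variants become hard to tell apart.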
|
41
|
Bansal V, Tewhey R, LeProust EM, Schork NJ. Efficient and cost effective population resequencing by pooling and in-solution hybridization. PLoS One 2011; 6:e18353. [PMID: 21479135 PMCID: PMC3068187 DOI: 10.1371/journal.pone.0018353] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 01/07/2011] [Accepted: 02/26/2011] [Indexed: 12/24/2022]
Abstract
High-throughput sequencing of targeted genomic loci in large populations is an effective approach for evaluating the contribution of rare variants to disease risk. We evaluated the feasibility of using in-solution hybridization-based target capture on pooled DNA samples to enable cost-efficient population sequencing studies. For this, we performed pooled sequencing of 100 HapMap samples across ~600 kb of DNA sequence using the Illumina GAIIx. Using our accurate variant calling method for pooled sequence data, we were able to not only identify single nucleotide variants with a low false discovery rate (<1%) but also accurately detect short insertion/deletion variants. In addition, with sufficient coverage per individual in each pool (30-fold) we detected 97.2% of the total variants and 93.6% of variants below 5% in frequency. Finally, allele frequencies for single nucleotide variants (SNVs) estimated from the pooled data and the HapMap genotype data were tightly correlated (correlation coefficient ≥ 0.995).
Affiliation(s)
- Vikas Bansal
- Scripps Genomic Medicine, Scripps Translational Science Institute, La Jolla, California, United States of America
- Ryan Tewhey
- Scripps Genomic Medicine, Scripps Translational Science Institute, La Jolla, California, United States of America
- Division of Biological Sciences, University of California San Diego, La Jolla, California, United States of America
- Emily M. LeProust
- Genomics, Agilent Technologies, LLSU, Santa Clara, California, United States of America
- Nicholas J. Schork
- Scripps Genomic Medicine, Scripps Translational Science Institute, La Jolla, California, United States of America
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, United States of America
|
42
|
Lupton MK, Proitsi P, Danillidou M, Tsolaki M, Hamilton G, Wroe R, Pritchard M, Lord K, Martin BM, Kloszewska I, Soininen H, Mecocci P, Vellas B, Harold D, Hollingworth P, Lovestone S, Powell JF. Deep sequencing of the Nicastrin gene in pooled DNA, the identification of genetic variants that affect risk of Alzheimer's disease. PLoS One 2011; 6:e17298. [PMID: 21364883 PMCID: PMC3045431 DOI: 10.1371/journal.pone.0017298] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 11/02/2010] [Accepted: 01/27/2011] [Indexed: 11/18/2022]
Abstract
Nicastrin is an obligatory component of γ-secretase, the enzyme complex that leads to the production of Aβ fragments central to the pathogenesis of Alzheimer's disease (AD). Analyses of the effects of common variation in this gene on risk for late onset AD have been inconclusive. We investigated the effect of rare variation in the coding regions of the Nicastrin gene in a cohort of AD patients and matched controls using an innovative pooling approach and next generation sequencing. Five SNPs were identified and validated by individual genotyping from 311 cases and 360 controls. Association analysis identified a non-synonymous rare SNP (N417Y) with a statistically higher frequency in cases compared to controls in the Greek population (OR 3.994, CI 1.105–14.439, p = 0.035). This finding warrants further investigation in a larger cohort and adds weight to the hypothesis that rare variation explains some of the genetic heritability still to be identified in Alzheimer's disease.
Affiliation(s)
- Michelle K. Lupton
- MRC Centre for Neurodegeneration Research, Institute of Psychiatry, King's College London, London, United Kingdom
- Petroula Proitsi
- MRC Centre for Neurodegeneration Research, Institute of Psychiatry, King's College London, London, United Kingdom
- Makrina Danillidou
- 3rd Department of Neurology, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Magda Tsolaki
- 3rd Department of Neurology, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Gillian Hamilton
- Medical Genetics, Molecular Medicine Centre, Western General Hospital, University of Edinburgh, Edinburgh, United Kingdom
- Richard Wroe
- MRC Centre for Neurodegeneration Research, Institute of Psychiatry, King's College London, London, United Kingdom
- Megan Pritchard
- MRC Centre for Neurodegeneration Research, Institute of Psychiatry, King's College London, London, United Kingdom
- Kathryn Lord
- MRC Centre for Neurodegeneration Research, Institute of Psychiatry, King's College London, London, United Kingdom
- Belinda M. Martin
- MRC Centre for Neurodegeneration Research, Institute of Psychiatry, King's College London, London, United Kingdom
- Iwona Kloszewska
- Department of Old Age Psychiatry and Psychotic Disorders, Medical University of Lodz, Lodz, Poland
- Hilkka Soininen
- Department of Neurology, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland
- Patrizia Mecocci
- Section of Gerontology and Geriatrics, Department of Clinical and Experimental Medicine, University of Perugia, Perugia, Italy
- Bruno Vellas
- Department of Internal and Geriatrics Medicine, Hôpitaux de Toulouse, Toulouse, France
- Denise Harold
- Department of Psychological Medicine and Neurology, MRC Centre for Neuropsychiatric Genetics and Genomics, School of Medicine, Cardiff University, Cardiff, United Kingdom
- Paul Hollingworth
- Department of Psychological Medicine and Neurology, MRC Centre for Neuropsychiatric Genetics and Genomics, School of Medicine, Cardiff University, Cardiff, United Kingdom
- Simon Lovestone
- MRC Centre for Neurodegeneration Research, Institute of Psychiatry, King's College London, London, United Kingdom
- John F. Powell
- MRC Centre for Neurodegeneration Research, Institute of Psychiatry, King's College London, London, United Kingdom
|
43
|
Lee JS, Choi M, Yan X, Lifton RP, Zhao H. On optimal pooling designs to identify rare variants through massive resequencing. Genet Epidemiol 2011; 35:139-47. [PMID: 21254222 PMCID: PMC3176340 DOI: 10.1002/gepi.20561] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2010] [Revised: 09/17/2010] [Accepted: 12/09/2010] [Indexed: 11/18/2022]
Abstract
The advent of next-generation sequencing technologies has facilitated the detection of rare variants. Despite significant cost reductions, sequencing remains expensive for large-scale studies. In this article, we examine DNA pooling as a cost-effective strategy for rare variant detection. We consider the optimal number of individuals in a DNA pool to detect an allele with a specific minor allele frequency (MAF) under a given coverage depth and detection threshold. We found that the optimal number of individuals in a pool does not depend on the MAF at the same coverage depth and detection threshold. In addition, when the individual contributions to each pool are equal, the total number of individuals across different pools required in an optimal design to detect a variant with a desired power is similar at different coverage depths. When the contributions are more variable, more individuals tend to be needed for higher coverage depths. Our study provides general guidelines on using DNA pooling for more cost-effective identification of rare variants. Genet. Epidemiol. 35:139-147, 2011. © 2011 Wiley-Liss, Inc.
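The trade-off between pool size, coverage depth, and detection threshold described in this abstract can be illustrated with a small calculation. The sketch below is not the authors' model: it assumes diploid individuals contributing equally to the pool, no sequencing error, reads drawn independently, and a variant called when at least `threshold` reads carry it. The function and parameter names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail(t, n, p):
    """P(X >= t) for X ~ Binomial(n, p); returns 0.0 when t > n."""
    return sum(binom_pmf(k, n, p) for k in range(t, n + 1))


def pool_detection_probability(n_individuals, maf, depth, threshold):
    """Probability that a variant of frequency `maf` is seen in at least
    `threshold` of `depth` reads from a pool of diploid individuals.

    Assumptions (illustrative only): equal contributions per individual,
    independent reads, and no sequencing error.
    """
    chromosomes = 2 * n_individuals
    total = 0.0
    # Average over the number of variant chromosomes present in the pool.
    for carriers in range(1, chromosomes + 1):
        p_carriers = binom_pmf(carriers, chromosomes, maf)
        p_detect = binom_tail(threshold, depth, carriers / chromosomes)
        total += p_carriers * p_detect
    return total
```

Under these assumptions, a call such as `pool_detection_probability(10, 0.01, 100, 2)` gives the chance of detecting a 1% allele in a pool of 10 individuals at 100x depth; sweeping `n_individuals` at a fixed `depth` and `threshold` shows how the pool size affects detection.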
Affiliation(s)
- Joon Sang Lee
- Department of Epidemiology and Public Health, Yale University, New Haven, Connecticut, USA.
|
44
|
PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS One 2011; 6:e15925. [PMID: 21253599 PMCID: PMC3017084 DOI: 10.1371/journal.pone.0015925] [Citation(s) in RCA: 395] [Impact Index Per Article: 30.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2010] [Accepted: 11/30/2010] [Indexed: 11/19/2022] Open
Abstract
Recent statistical analyses suggest that sequencing of pooled samples provides a cost-effective approach to determining genome-wide population genetic parameters. Here we introduce PoPoolation, a toolbox specifically designed for the population genetic analysis of sequence data from pooled individuals. PoPoolation calculates estimates of θWatterson, θπ, and Tajima's D that account for the bias introduced by pooling and sequencing errors, as well as divergence between species. Results of genome-wide analyses can be displayed graphically in a sliding-window plot. PoPoolation is written in Perl and R, and it builds on commonly used data formats. Its source code can be downloaded from http://code.google.com/p/popoolation/. Furthermore, we evaluate the influence of mapping algorithms, sequencing errors, and read coverage on the accuracy of population genetic parameter estimates from pooled data.
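For orientation, the three statistics named in this abstract can be computed in their classical form as sketched below. This is not PoPoolation's code and does not include its corrections for pooling or sequencing error; it assumes individually sampled chromosomes, with each segregating site summarized by the count of one allele among `n` sampled chromosomes. All function names are hypothetical.

```python
from math import sqrt


def theta_watterson(n, n_sites):
    """Watterson's estimator: S / a1, with a1 = sum_{i=1}^{n-1} 1/i."""
    a1 = sum(1.0 / i for i in range(1, n))
    return n_sites / a1


def theta_pi(n, allele_counts):
    """Mean pairwise differences, summed over segregating sites."""
    return sum(2.0 * j * (n - j) / (n * (n - 1)) for j in allele_counts)


def tajimas_d(n, allele_counts):
    """Classical Tajima's D for n sampled chromosomes (no pool correction)."""
    s = len(allele_counts)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    return (theta_pi(n, allele_counts) - s / a1) / sqrt(variance)
```

With `n = 4` chromosomes and allele counts `[1, 2]` at two segregating sites, `theta_pi` exceeds `theta_watterson`, so `tajimas_d` is positive; a sample with only singletons gives a negative value.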
|
45
|
|
46
|
Uil TG, Vellinga J, de Vrij J, van den Hengel SK, Rabelink MJWE, Cramer SJ, Eekels JJM, Ariyurek Y, van Galen M, Hoeben RC. Directed adenovirus evolution using engineered mutator viral polymerases. Nucleic Acids Res 2010; 39:e30. [PMID: 21138963 PMCID: PMC3061072 DOI: 10.1093/nar/gkq1258] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023] Open
Abstract
Adenoviruses (Ads) are the most frequently used viruses for oncolytic and gene therapy purposes. Most Ad-based vectors have been generated through rational design. Although this has led to significant vector improvements, it is often hampered by an insufficient understanding of Ad’s intricate functions and interactions. Here, to circumvent this issue, we adopted a novel, mutator Ad polymerase-based, ‘accelerated-evolution’ approach that can serve as a general method to generate or optimize adenoviral vectors. First, we site-specifically substituted Ad polymerase residues located in either the nucleotide binding pocket or the exonuclease domain. This yielded several polymerase mutants that, while fully supportive of viral replication, increased Ad’s intrinsic mutation rate. Mutator activities of these mutants were revealed by performing deep sequencing on pools of replicated viruses. The strongest identified mutators carried replacements of residues implicated in ssDNA binding at the exonuclease active site. Next, we exploited these mutators to generate the genetic diversity required for directed Ad evolution. Using this new forward genetics approach, we isolated viral mutants with improved cytolytic activity. These mutants revealed a common mutation in a splice acceptor site preceding the gene for the adenovirus death protein (ADP). Accordingly, the isolated viruses showed high and untimely expression of ADP, correlating with a severe deregulation of E3 transcript splicing.
Affiliation(s)
- Taco G Uil
- Department of Molecular Cell Biology, Leiden University Medical Center, Leiden, 2300 RC, The Netherlands
|
47
|
Bansal V. A statistical method for the detection of variants from next-generation resequencing of DNA pools. Bioinformatics 2010; 26:i318-24. [PMID: 20529923 PMCID: PMC2881398 DOI: 10.1093/bioinformatics/btq214] [Citation(s) in RCA: 128] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023] Open
Abstract
Motivation: Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing. Results: We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80–85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3–5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP. 
Availability: An implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/ Contact: vbansal@scripps.edu
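The two kinds of evidence described in the Results, a contingency-table comparison of allele counts across pools and the probability of seeing several non-reference calls from sequencing error alone, can be illustrated with a minimal sketch. This is not CRISP's implementation: it assumes a single fixed per-base error rate, uses a plain Pearson chi-square statistic without a p-value, and ignores strand and pool-size information. The function names are hypothetical.

```python
from math import comb


def error_only_pvalue(alt_reads, depth, error_rate):
    """P(X >= alt_reads) for X ~ Binomial(depth, error_rate), i.e. the chance
    of at least this many non-reference calls if all of them are errors."""
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(alt_reads, depth + 1)
    )


def chi_square_statistic(table):
    """Pearson chi-square statistic for a pools x 2 table whose rows are
    (reference_reads, alternate_reads), one row per pool."""
    row_totals = [ref + alt for ref, alt in table]
    col_totals = [sum(row[0] for row in table), sum(row[1] for row in table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for (ref, alt), row_total in zip(table, row_totals):
        for observed, col_total in ((ref, col_totals[0]), (alt, col_totals[1])):
            expected = row_total * col_total / grand_total
            if expected > 0:  # skip empty columns to avoid dividing by zero
                statistic += (observed - expected) ** 2 / expected
    return statistic
```

A large statistic from `chi_square_statistic` suggests that the alternate allele is unevenly distributed across pools, as expected for a real rare variant carried in only some of them, while a small `error_only_pvalue` suggests that the non-reference reads in a single pool are unlikely to be errors.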
Affiliation(s)
- Vikas Bansal
- Scripps Genomic Medicine, Scripps Translational Science Institute, La Jolla, CA 92037, USA.
|
48
|
Shental N, Amir A, Zuk O. Identification of rare alleles and their carriers using compressed se(que)nsing. Nucleic Acids Res 2010; 38:e179. [PMID: 20699269 PMCID: PMC2965256 DOI: 10.1093/nar/gkq675] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2010] [Revised: 06/20/2010] [Accepted: 07/19/2010] [Indexed: 11/29/2022] Open
Abstract
Identification of rare variants by resequencing is important both for detecting novel variations and for screening individuals for known disease alleles. New technologies enable low-cost resequencing of target regions, although it is still prohibitive to test more than a few individuals. We propose a novel pooling design that enables the recovery of novel or known rare alleles and their carriers in groups of individuals. The method is based on a Compressed Sensing (CS) approach, which is general, simple and efficient. CS allows the use of generic algorithmic tools for simultaneous identification of multiple variants and their carriers. We model the experimental procedure and show via computer simulations that it enables the recovery of rare alleles and their carriers in larger groups than were possible before. Our approach can also be combined with barcoding techniques to provide a feasible solution based on current resequencing costs. For example, when targeting a small enough genomic region (∼100 bp) and using only ∼10 sequencing lanes and ∼10 distinct barcodes per lane, one recovers the identity of 4 rare allele carriers out of a population of over 4000 individuals. We demonstrate the performance of our approach over several publicly available experimental data sets.
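The idea of recovering a few carriers from far fewer pools than individuals can be illustrated with a much simpler decoder than the compressed-sensing reconstruction the abstract describes. The sketch below swaps in naive group-testing elimination: it assumes noise-free, presence/absence pool outcomes rather than read fractions, and a random design in which each individual joins each pool with a fixed probability. All names and parameters are hypothetical.

```python
import random


def random_pooling_design(n_individuals, n_pools, pool_fraction, seed=0):
    """Return a list of pools; each pool is a set of individual indices."""
    rng = random.Random(seed)
    return [
        {i for i in range(n_individuals) if rng.random() < pool_fraction}
        for _ in range(n_pools)
    ]


def pool_outcomes(design, carriers):
    """A pool is positive when it contains at least one carrier (no noise)."""
    return [bool(pool & carriers) for pool in design]


def decode_by_elimination(design, outcomes, n_individuals):
    """Anyone who appears in a negative pool cannot be a carrier; whoever
    remains is a candidate carrier. This never drops a true carrier when
    outcomes are noise-free, but it may keep some non-carriers."""
    candidates = set(range(n_individuals))
    for pool, positive in zip(design, outcomes):
        if not positive:
            candidates -= pool
    return candidates
```

For example, with `design = random_pooling_design(200, 40, 0.1)` and `carriers = {3, 57}`, `decode_by_elimination(design, pool_outcomes(design, carriers), 200)` returns a candidate set that always contains both carriers; how many non-carriers it also retains depends on the design.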
Affiliation(s)
- Noam Shental
- Department of Computer Science, The Open University of Israel, Raanana 43107, Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel and Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Amnon Amir
- Department of Computer Science, The Open University of Israel, Raanana 43107, Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel and Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Or Zuk
- Department of Computer Science, The Open University of Israel, Raanana 43107, Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel and Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
|
49
|
Collins SC, Bray SM, Suhl JA, Cutler DJ, Coffee B, Zwick ME, Warren ST. Identification of novel FMR1 variants by massively parallel sequencing in developmentally delayed males. Am J Med Genet A 2010; 152A:2512-20. [PMID: 20799337 PMCID: PMC2946449 DOI: 10.1002/ajmg.a.33626] [Citation(s) in RCA: 93] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023]
Abstract
Fragile X syndrome (FXS), the most common inherited form of developmental delay, is typically caused by CGG-repeat expansion in FMR1. However, little attention has been paid to sequence variants in FMR1. Through the use of pooled-template massively parallel sequencing, we identified 130 novel FMR1 sequence variants in a population of 963 developmentally delayed males without CGG-repeat expansion mutations. Among these, we identified a novel missense change, p.R138Q, which alters a conserved residue in the nuclear localization signal of FMRP. We have also identified three promoter mutations in this population, all of which significantly reduce in vitro levels of FMR1 transcription. Additionally, we identified 10 noncoding variants of possible functional significance in the introns and 3'-untranslated region of FMR1, including two predicted splice site mutations. These findings greatly expand the catalog of known FMR1 sequence variants and suggest that FMR1 sequence variants may represent an important cause of developmental delay.
Affiliation(s)
- Stephen C. Collins
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Steven M. Bray
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Joshua A. Suhl
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- David J. Cutler
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Bradford Coffee
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Michael E. Zwick
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Stephen T. Warren
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Departments of Biochemistry and Pediatrics, Emory University School of Medicine, Atlanta, GA 30322, USA
|
50
|
Benaglio P, Rivolta C. Ultra high throughput sequencing in human DNA variation detection: a comparative study on the NDUFA3-PRPF31 region. PLoS One 2010; 5. [PMID: 20927379 PMCID: PMC2947511 DOI: 10.1371/journal.pone.0013071] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2010] [Accepted: 09/02/2010] [Indexed: 12/03/2022] Open
Abstract
Background: Ultra high throughput sequencing (UHTS) technologies find an important application in targeted resequencing of candidate genes or of genomic intervals from genetic association studies. Despite the extraordinary power of these new methods, they are still rarely used in routine analysis of human genomic variants, in part because of the absence of specific standard procedures. The aim of this work is to provide human molecular geneticists with a tool to evaluate the best UHTS methodology for efficiently detecting DNA changes, from common SNPs to rare mutations. Methodology/Principal Findings: We tested the three most widespread UHTS platforms (Roche/454 GS FLX Titanium, Illumina/Solexa Genome Analyzer II and Applied Biosystems/SOLiD System 3) on a well-studied region of the human genome containing many polymorphisms and a very rare heterozygous mutation located within an intronic repetitive DNA element. We identify the strengths and limitations of each platform and describe some peculiarities of UHTS in resequencing projects. Conclusions/Significance: When appropriate filtering and mapping procedures are applied, UHTS technology can be safely and efficiently used as a tool for targeted detection of human DNA variation. Unless particular, platform-dependent characteristics are needed for specific projects, the most relevant parameter to consider in mainstream human genome resequencing procedures is the cost per sequenced base pair associated with each machine.
Affiliation(s)
- Paola Benaglio
- Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland
- Carlo Rivolta
- Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland
|