1
|
Abstract
Traditionally, microbial genome sequencing has been restricted to the small number of species that can be grown in pure culture. The progressive development of culture-independent methods over the last 15 years now allows researchers to sequence microbial communities directly from environmental samples. This approach is commonly referred to as "metagenomics" or "community genomics". However, the term metagenomics is applied liberally in the literature to describe any culture-independent analysis of microbial communities. Here, we define metagenomics as shotgun ("random") sequencing of the genomic DNA of a sample taken directly from the environment. The metagenome can be thought of as a sampling of the collective genome of the microbial community. We outline the considerations and analyses that should be undertaken to ensure the success of a metagenomic sequencing project, including the choice of sequencing platform and methods for assembly, binning, annotation, and comparative analysis.
Affiliation(s)
- Lauren Bragg
- Advanced Water Management Centre, The University of Queensland, St. Lucia, QLD, Australia
|
2
|
Zavodna M, Grueber CE, Gemmell NJ. Parallel tagged next-generation sequencing on pooled samples - a new approach for population genetics in ecology and conservation. PLoS One 2013; 8:e61471. [PMID: 23637841 PMCID: PMC3630221 DOI: 10.1371/journal.pone.0061471] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 10/31/2012] [Accepted: 03/08/2013] [Indexed: 12/02/2022] Open
Abstract
Next-generation sequencing (NGS) on pooled samples has already been broadly applied in human medical diagnostics and plant and animal breeding. However, thus far it has been only sparingly employed in ecology and conservation, where it may serve as a useful diagnostic tool for rapid assessment of species genetic diversity and structure at the population level. Here we undertake a comprehensive evaluation of the accuracy, practicality and limitations of parallel tagged amplicon NGS on pooled population samples for estimating species population diversity and structure. We obtained 16S and Cyt b data from 20 populations of Leiopelma hochstetteri, a frog species of conservation concern in New Zealand, using two approaches - parallel tagged NGS on pooled population samples and individual Sanger sequenced samples. Data from each approach were then used to estimate two standard population genetic parameters, nucleotide diversity (π) and population differentiation (FST), that enable population genetic inference in a species conservation context. We found a positive correlation between our two approaches for population genetic estimates, showing that the pooled population NGS approach is a reliable, rapid and appropriate method for population genetic inference in an ecological and conservation context. Our experimental design also allowed us to identify both the strengths and weaknesses of the pooled population NGS approach and outline some guidelines and suggestions that might be considered when planning future projects.
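As a rough illustration of one of the two parameters named in this abstract, nucleotide diversity (π) can be computed from individually sequenced, aligned haplotypes as the mean number of pairwise differences per site. The sketch below assumes equal-length, pre-aligned sequences with no gaps or ambiguity codes; the function name is hypothetical, and this is not the pooled-read estimator used in the paper.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site among aligned, equal-length sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)
```

For example, `nucleotide_diversity(["AAAA", "AAAT", "AATT"])` averages 4 differences over 3 pairs and 4 sites.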
Affiliation(s)
- Monika Zavodna
- Centre for Reproduction and Genomics, Department of Anatomy, University of Otago, Dunedin, New Zealand
- Catherine E. Grueber
- Centre for Reproduction and Genomics, Department of Anatomy, University of Otago, Dunedin, New Zealand
- Department of Zoology, University of Otago, Dunedin, New Zealand
- Allan Wilson Centre for Molecular Ecology and Evolution, University of Otago, Dunedin, New Zealand
- Neil J. Gemmell
- Centre for Reproduction and Genomics, Department of Anatomy, University of Otago, Dunedin, New Zealand
- Allan Wilson Centre for Molecular Ecology and Evolution, University of Otago, Dunedin, New Zealand
|
3
|
Roncarati R, Latronico MVG, Musumeci B, Aurino S, Torella A, Bang ML, Jotti GS, Puca AA, Volpe M, Nigro V, Autore C, Condorelli G. Unexpectedly low mutation rates in beta-myosin heavy chain and cardiac myosin binding protein genes in Italian patients with hypertrophic cardiomyopathy. J Cell Physiol 2011; 226:2894-900. [PMID: 21302287 PMCID: PMC3229838 DOI: 10.1002/jcp.22636] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 01/13/2023]
Abstract
Hypertrophic cardiomyopathy (HCM) is the most common genetic cardiac disease. Fourteen sarcomeric and sarcomere-related genes have been implicated in HCM etiology, with those encoding β-myosin heavy chain (MYH7) and cardiac myosin binding protein C (MYBPC3) reported as the most frequently mutated: in fact, these account for around 50% of all cases related to sarcomeric gene mutations, which are collectively responsible for approximately 70% of all HCM cases. Here, we used denaturing high-performance liquid chromatography followed by bidirectional sequencing to screen the coding regions of MYH7 and MYBPC3 in a cohort (n = 125) of Italian patients presenting with HCM. We found 6 MYH7 mutations in 9/125 patients and 18 MYBPC3 mutations in 19/125 patients. Of the three novel MYH7 mutations found, two were missense, and one was a silent mutation; of the eight novel MYBPC3 mutations, one was a substitution, three were stop codons, and four were missense mutations. Thus, our cohort of Italian HCM patients did not harbor the high frequency of mutations usually found in MYH7 and MYBPC3. This finding, coupled with the clinical diversity of our cohort, emphasizes the complexity of HCM and the need for more inclusive investigative approaches in order to fully understand the pathogenesis of this disease.
Affiliation(s)
- Roberta Roncarati
- Istituto di Tecnologie Biomediche, Consiglio Nazionale delle Ricerche, Milan, Italy
|
4
|
Jiang Q, Turner T, Sosa MX, Rakha A, Arnold S, Chakravarti A. Rapid and efficient human mutation detection using a bench-top next-generation DNA sequencer. Hum Mutat 2011; 33:281-9. [PMID: 21898659 DOI: 10.1002/humu.21602] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 05/17/2011] [Accepted: 08/19/2011] [Indexed: 12/20/2022]
Abstract
Next-generation sequencing (NGS) technologies can be a boon to human mutation detection given their high throughput: consequently, many genes and samples may be simultaneously studied with high coverage for accurate detection of heterozygotes. In circumstances requiring the intensive study of a few genes, particularly in clinical applications, a rapid turn around is another desirable goal. To this end, we assessed the performance of the bench-top 454 GS Junior platform as an optimized solution for mutation detection by amplicon sequencing of three type 3 semaphorin genes SEMA3A, SEMA3C, and SEMA3D implicated in Hirschsprung disease (HSCR). We performed mutation detection on 39 PCR amplicons totaling 14,014 bp in 47 samples studied in pools of 12 samples. Each 10-hr run was able to generate ∼75,000 reads and ∼28 million high-quality bases at an average read length of 371 bp. The overall sequencing error was 0.26 changes per kb at a coverage depth of ≥20 reads. Altogether, 37 sequence variants were found in this study of which 10 were unique to HSCR patients. We identified five missense mutations in these three genes that may potentially be involved in the pathogenesis of HSCR and need to be studied in larger patient samples.
Affiliation(s)
- Qian Jiang
- Center for Complex Disease Genomics, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
|
5
|
Schlipf NA, Schüle R, Klimpe S, Karle KN, Synofzik M, Schicks J, Riess O, Schöls L, Bauer P. Amplicon-based high-throughput pooled sequencing identifies mutations in CYP7B1 and SPG7 in sporadic spastic paraplegia patients. Clin Genet 2011; 80:148-60. [DOI: 10.1111/j.1399-0004.2011.01715.x] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Indexed: 11/29/2022]
|
6
|
Fridjonsson O, Olafsson K, Tompsett S, Bjornsdottir S, Consuegra S, Knox D, de Leaniz CG, Magnusdottir S, Olafsdottir G, Verspoor E, Hjorleifsdottir S. Detection and mapping of mtDNA SNPs in Atlantic salmon using high throughput DNA sequencing. BMC Genomics 2011; 12:179. [PMID: 21473771 PMCID: PMC3079667 DOI: 10.1186/1471-2164-12-179] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 10/21/2010] [Accepted: 04/07/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Approximately half of the mitochondrial genome inherent within 546 individual Atlantic salmon (Salmo salar) derived from across the species' North Atlantic range, was selectively amplified with a novel combination of standard PCR and pyro-sequencing in a single run using 454 Titanium FLX technology (Roche, 454 Life Sciences). A unique combination of barcoded primers and a partitioned sequencing plate was employed to designate each sequence read to its original sample. The sequence reads were aligned according to the S. salar mitochondrial reference sequence (NC_001960.1), with the objective of identifying single nucleotide polymorphisms (SNPs). They were validated if they met with the following three stringent criteria: (i) sequence reads were produced from both DNA strands; (ii) SNPs were confirmed in a minimum of 90% of replicate sequence reads; and (iii) SNPs occurred in more than one individual. RESULTS Pyrosequencing generated a total of 179,826,884 bp of data, and 10,765 of the total 10,920 S. salar sequences (98.6%) were assigned back to their original samples. The approach taken resulted in a total of 216 SNPs and 2 indels, which were validated and mapped onto the S. salar mitochondrial genome, including 107 SNPs and one indel not previously reported. An average of 27.3 sequence reads with a standard deviation of 11.7 supported each SNP per individual. CONCLUSION The study generated a mitochondrial SNP panel from a large sample group across a broad geographical area, reducing the potential for ascertainment bias, which has hampered previous studies. The SNPs identified here validate those identified in previous studies, and also contribute additional potentially informative loci for the future study of phylogeography and evolution in the Atlantic salmon. The overall success experienced with this novel application of HT sequencing of targeted regions suggests that the same approach could be successfully applied for SNP mining in other species.
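The three validation criteria listed in this abstract can be sketched as a simple filter. This is only an illustration under stated assumptions: criterion (i) is read here as requiring variant-supporting reads on both strands, criterion (ii) as a minimum supporting fraction of reads covering the site within an individual, and the `SnpCall` record and function names are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    position: int        # coordinate on the mitochondrial reference
    individual: str      # sample identifier (hypothetical field)
    forward_reads: int   # variant-supporting reads on the forward strand
    reverse_reads: int   # variant-supporting reads on the reverse strand
    total_reads: int     # all reads covering the site in this individual


def passes_read_criteria(call, min_fraction=0.90):
    """Criteria (i) and (ii): both strands present, and >= 90% read support."""
    if call.total_reads == 0:
        return False
    both_strands = call.forward_reads > 0 and call.reverse_reads > 0
    support = (call.forward_reads + call.reverse_reads) / call.total_reads
    return both_strands and support >= min_fraction


def validate_snps(calls, min_fraction=0.90):
    """Criterion (iii): keep positions confirmed in more than one individual."""
    carriers = {}
    for call in calls:
        if passes_read_criteria(call, min_fraction):
            carriers.setdefault(call.position, set()).add(call.individual)
    return sorted(pos for pos, inds in carriers.items() if len(inds) > 1)
```

A position supported on only one strand, or seen in a single individual, is dropped even if its read support is otherwise high.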
|
7
|
Camilli R, Bonnal RJP, Del Grosso M, Iacono M, Corti G, Rizzi E, Marchetti M, Mulas L, Iannelli F, Superti F, Oggioni MR, De Bellis G, Pantosti A. Complete genome sequence of a serotype 11A, ST62 Streptococcus pneumoniae invasive isolate. BMC Microbiol 2011; 11:25. [PMID: 21284853 PMCID: PMC3055811 DOI: 10.1186/1471-2180-11-25] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 10/15/2010] [Accepted: 02/01/2011] [Indexed: 11/13/2022] Open
Abstract
Background Streptococcus pneumoniae is an important human pathogen representing a major cause of morbidity and mortality worldwide. We sequenced the genome of a serotype 11A, ST62 S. pneumoniae invasive isolate (AP200), that was erythromycin-resistant due to the presence of the erm(TR) determinant, and carried out analysis of the genome organization and comparison with other pneumococcal genomes. Results The genome sequence of S. pneumoniae AP200 is 2,130,580 base pair in length. The genome carries 2216 coding sequences (CDS), 56 tRNA, and 12 rRNA genes. Of the CDSs, 72.9% have a predicted biological known function. AP200 contains the pilus islet 2 and, although its phenotype corresponds to serotype 11A, it contains an 11D capsular locus. Chromosomal rearrangements resulting from a large inversion across the replication axis, and horizontal gene transfer events were observed. The chromosomal inversion is likely implicated in the rebalance of the chromosomal architecture affected by the insertions of two large exogenous elements, the erm(TR)-carrying Tn1806 and a functional prophage designated ϕSpn_200. Tn1806 is 52,457 bp in size and comprises 49 ORFs. Comparative analysis of Tn1806 revealed the presence of a similar genetic element or part of it in related species such as Streptococcus pyogenes and also in the anaerobic species Finegoldia magna, Anaerococcus prevotii and Clostridium difficile. The genome of ϕSpn_200 is 35,989 bp in size and is organized in 47 ORFs grouped into five functional modules. Prophages similar to ϕSpn_200 were found in pneumococci and in other streptococcal species, showing a high degree of exchange of functional modules. ϕSpn_200 viral particles have morphologic characteristics typical of the Siphoviridae family and are capable of infecting a pneumococcal recipient strain. Conclusions The sequence of S. pneumoniae AP200 chromosome revealed a dynamic genome, characterized by chromosomal rearrangements and horizontal gene transfers. 
The overall diversity of AP200 is driven mainly by the presence of the exogenous elements Tn1806 and ϕSpn_200 that show large gene exchanges with other genetic elements of different bacterial species. These genetic elements likely provide AP200 with additional genes, such as those conferring antibiotic-resistance, promoting its adaptation to the environment.
Affiliation(s)
- Romina Camilli
- Department of Infectious, Parasitic and Immune-mediated Diseases, Istituto Superiore di Sanità, Rome, Italy
|
8
|
Bartels MD, Hansen LH, Boye K, Sørensen SJ, Westh H. An unexpected location of the arginine catabolic mobile element (ACME) in a USA300-related MRSA strain. PLoS One 2011; 6:e16193. [PMID: 21283578 PMCID: PMC3026799 DOI: 10.1371/journal.pone.0016193] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 10/14/2010] [Accepted: 12/16/2010] [Indexed: 11/28/2022] Open
Abstract
In methicillin resistant Staphylococcus aureus (MRSA), the arginine catabolic mobile element (ACME) was initially described in USA300 (t008-ST8) where it is located downstream of the staphylococcal cassette chromosome mec (SCCmec). A common health-care associated MRSA in Copenhagen, Denmark (t024-ST8) is clonally related to USA300 and is frequently PCR positive for the ACME specific arcA-gene. This study is the first to describe an ACME element upstream of the SCCmec in MRSA. By traditional SCCmec typing schemes, the SCCmec of t024-ST8 strain M1 carries SCCmec IVa, but full sequencing of the cassette revealed that the entire J3 region had no homology to published SCCmec IVa. Within the J3 region of M1 was a 1705 bp sequence only similar to a sequence in S. haemolyticus strain JCSC1435 and 2941 bps with no homology found in GenBank. In addition to the usual direct repeats (DR) at each extremity of SCCmec, M1 had two new DR between the orfX gene and the J3 region of the SCCmec. The region between the orfX DR (DR1) and DR2 contained the ccrAB4 genes. An ACME II-like element was located between DR2 and DR3. The entire 26,468 bp sequence between DR1 and DR3 was highly similar to parts of the ACME composite island of S. epidermidis strain ATCC12228. Sequencing of an ACME negative t024-ST8 strain (M299) showed that DR1 and the sequence between DR1 and DR3 was missing. The finding of a mobile ACME II-like element inserted downstream of orfX and upstream of SCCmec indicates a novel recombination between staphylococcal species.
|
9
|
Mitochondrial DNA variant discovery and evaluation in human Cardiomyopathies through next-generation sequencing. PLoS One 2010; 5:e12295. [PMID: 20808834 PMCID: PMC2924892 DOI: 10.1371/journal.pone.0012295] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.4] [Received: 06/03/2010] [Accepted: 07/26/2010] [Indexed: 11/19/2022] Open
Abstract
Mutations in mitochondrial DNA (mtDNA) may cause maternally-inherited cardiomyopathy and heart failure. In homoplasmy all mtDNA copies contain the mutation. In heteroplasmy there is a mixture of normal and mutant copies of mtDNA. The clinical phenotype of an affected individual depends on the type of genetic defect and the ratios of mutant and normal mtDNA in affected tissues. We aimed at determining the sensitivity of next-generation sequencing compared to Sanger sequencing for mutation detection in patients with mitochondrial cardiomyopathy. We studied 18 patients with mitochondrial cardiomyopathy and two with suspected mitochondrial disease. We “shotgun” sequenced PCR-amplified mtDNA and multiplexed using a single run on Roche's 454 Genome Sequencer. By mapping to the reference sequence, we obtained 1,300× average coverage per case and identified high-confidence variants. By comparing these to >400 mtDNA substitution variants detected by Sanger, we found 98% concordance in variant detection. Simulation studies showed that >95% of the homoplasmic variants were detected at a minimum sequence coverage of 20× while heteroplasmic variants required >200× coverage. Several Sanger “misses” were detected by 454 sequencing. These included the novel heteroplasmic 7501T>C in tRNA serine 1 in a patient with sudden cardiac death. These results support a potential role of next-generation sequencing in the discovery of novel mtDNA variants with heteroplasmy below the level reliably detected with Sanger sequencing. We hope that this will assist in the identification of mtDNA mutations and key genetic determinants for cardiomyopathy and mitochondrial disease.
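To see why low-level heteroplasmy needs far deeper coverage than homoplasmy, one can model the number of variant-carrying reads at a site as a binomial draw. The sketch below is a simplification, not the simulation used in the study: it ignores sequencing error and mapping bias, and the function name and the minimum-read threshold are assumptions made for illustration.

```python
from math import comb


def detection_probability(coverage, variant_fraction, min_variant_reads):
    """P(at least min_variant_reads variant reads) under binomial sampling."""
    if not 0.0 <= variant_fraction <= 1.0:
        raise ValueError("variant_fraction must be between 0 and 1")
    # Probability of observing fewer variant reads than the calling threshold.
    p_fewer = sum(
        comb(coverage, k)
        * variant_fraction ** k
        * (1.0 - variant_fraction) ** (coverage - k)
        for k in range(min_variant_reads)
    )
    return 1.0 - p_fewer
```

Under these assumptions a homoplasmic variant (fraction 1.0) is always seen once coverage reaches the threshold, whereas a variant present in a few percent of mtDNA copies is only detected reliably as coverage rises into the hundreds.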
|
10
|
Reliable resequencing of the human dystrophin locus by universal long polymerase chain reaction and massive pyrosequencing. Anal Biochem 2010; 406:176-84. [PMID: 20670611 DOI: 10.1016/j.ab.2010.07.022] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 05/19/2010] [Revised: 07/19/2010] [Accepted: 07/20/2010] [Indexed: 01/07/2023]
Abstract
The X-linked dystrophin gene is well known for its involvement in Duchenne/Becker muscular dystrophies and for its exceptional megabase size. This locus at Xp21 is prone to frequent random molecular changes, including large deletions and duplications, but also smaller variations. To cope with such huge sequence analysis requirements in forthcoming diagnostic applications, we applied the parallel 454 GS-FLX pyrosequencer to the dystrophin locus. We enriched the genomic region of interest by the robust amplification of 62 fragments under universal conditions by the long-PCR protocol, yielding 244,707 bp of sequence. Pooled PCR products were fragmented and used for library preparation and DNA sequencing. To evaluate the entire procedure, we analyzed four male DNA samples for sequence coverage and accuracy in DNA sequence variation and for any potential bias. We identified 562 known variations and 55 additional variants not yet reported, among which we detected a causative Arg1844Stop mutation in one sample. Sanger sequencing confirmed all changes. Unexpectedly, only 3× coverage was sufficient for 99.9993% accuracy. Our results show that long PCR combined with massive pyrosequencing is very reliable for the analysis of the biggest gene of the human genome and open the door to other demanding applications in molecular diagnostics.
|
11
|
Galan M, Guivier E, Caraux G, Charbonnel N, Cosson JF. A 454 multiplex sequencing method for rapid and reliable genotyping of highly polymorphic genes in large-scale studies. BMC Genomics 2010; 11:296. [PMID: 20459828 PMCID: PMC2876125 DOI: 10.1186/1471-2164-11-296] [Citation(s) in RCA: 161] [Impact Index Per Article: 11.5] [Received: 09/28/2009] [Accepted: 05/11/2010] [Indexed: 11/10/2022] Open
Abstract
Background High-throughput sequencing technologies offer new perspectives for biomedical, agronomical and evolutionary research. Promising progress now concerns the application of these technologies to large-scale studies of genetic variation. Such studies require the genotyping of high numbers of samples. This is theoretically possible using 454 pyrosequencing, which generates billions of base pairs of sequence data. However, several challenges arise: first, in the attribution of each read produced to its original sample, and second, in bioinformatic analyses to distinguish true from artifactual sequence variation. This pilot study proposes a new application for the 454 GS FLX platform, allowing the individual genotyping of thousands of samples in one run. A probabilistic model has been developed to demonstrate the reliability of this method. Results DNA amplicons from 1,710 rodent samples were individually barcoded using a combination of tags located in forward and reverse primers. Amplicons consisted of 222 bp fragments corresponding to DRB exon 2, a highly polymorphic gene in mammals. A total of 221,789 reads were obtained, of which 153,349 were finally assigned to original samples. Rules based on a probabilistic model and a four-step procedure were developed to validate sequences and provide a confidence level for each genotype. The method gave promising results, with the genotyping of DRB exon 2 sequences for 1,407 samples from 24 different rodent species and the sequencing of 392 variants in one half of a 454 run. Using replicates, we estimated that the reproducibility of genotyping reached 95%. Conclusions This new approach is a promising alternative to classical methods involving electrophoresis-based techniques for variant separation and cloning-sequencing for sequence determination. The 454 system is less costly and time consuming and may enhance the reliability of genotypes obtained when high numbers of samples are studied.
It opens up new perspectives for the study of evolutionary and functional genetics of highly polymorphic genes like major histocompatibility complex genes in vertebrates or loci regulating self-compatibility in plants. Important applications in biomedical research will include the detection of individual variation in disease susceptibility. Similarly, agronomy will benefit from this approach, through the study of genes implicated in productivity or disease susceptibility traits.
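The read-to-sample attribution described in this abstract, where each sample is identified by a combination of forward and reverse primer tags, can be sketched roughly as follows. This is a simplified illustration under assumptions: tags are fixed-length and matched exactly, the reverse tag is taken as written at the 3' end of the read (real pipelines would handle reverse complements and sequencing errors), and all names are hypothetical.

```python
def demultiplex(reads, tag_pairs, tag_len):
    """Assign reads to samples by their (forward_tag, reverse_tag) combination.

    tag_pairs maps (forward_tag, reverse_tag) -> sample name.
    Returns (assigned, unassigned): a dict of sample -> trimmed inserts,
    and a list of reads whose tag combination is unknown.
    """
    if tag_len <= 0:
        raise ValueError("tag_len must be positive")
    assigned = {}
    unassigned = []
    for read in reads:
        if len(read) < 2 * tag_len:
            unassigned.append(read)
            continue
        key = (read[:tag_len], read[-tag_len:])
        sample = tag_pairs.get(key)
        if sample is None:
            unassigned.append(read)
        else:
            # Strip both tags and keep only the amplicon insert.
            assigned.setdefault(sample, []).append(read[tag_len:-tag_len])
    return assigned, unassigned
```

Because the sample is encoded by the pair rather than by a single tag, a modest number of forward and reverse tags can label a much larger number of samples.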
Affiliation(s)
- Maxime Galan
- INRA EFPA, UMR CBGP (INRA/IRD/Cirad/Montpellier SupAgro), Campus international de Baillarguet, CS 30016, F-34988 Montferrier-sur-Lez cedex, France.
|
12
|
Williams LM, Ma X, Boyko AR, Bustamante CD, Oleksiak MF. SNP identification, verification, and utility for population genetics in a non-model genus. BMC Genet 2010; 11:32. [PMID: 20433726 PMCID: PMC2874759 DOI: 10.1186/1471-2156-11-32] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Received: 02/05/2010] [Accepted: 04/30/2010] [Indexed: 02/06/2023] Open
Abstract
Background By targeting SNPs contained in both coding and non-coding areas of the genome, we are able to identify genetic differences and characterize genome-wide patterns of variation among individuals, populations and species. We investigated the utility of 454 sequencing and MassARRAY genotyping for population genetics in natural populations of the teleost, Fundulus heteroclitus as well as closely related Fundulus species (F. grandis, F. majalis and F. similis). Results We used 454 pyrosequencing and MassARRAY genotyping technology to identify and type 458 genome-wide SNPs and determine genetic differentiation within and between populations and species of Fundulus. Specifically, pyrosequencing identified 96 putative SNPs across coding and non-coding regions of the F. heteroclitus genome: 88.8% were verified as true SNPs with MassARRAY. Additionally, putative SNPs identified in F. heteroclitus EST sequences were verified in most (86.5%) F. heteroclitus individuals; fewer were genotyped in F. grandis (74.4%), F. majalis (72.9%), and F. similis (60.7%) individuals. SNPs were polymorphic and showed latitudinal clinal variation separating northern and southern populations and established isolation by distance in F. heteroclitus populations. In F. grandis, SNPs were less polymorphic but still established isolation by distance. Markers differentiated species and populations. Conclusions In total, these approaches were used to quickly determine differences within the Fundulus genome and provide markers for population genetic studies.
Affiliation(s)
- Larissa M Williams
- Rosenstiel School of Marine and Atmospheric Sciences, University of Miami, 4600 Rickenbacker Causeway, Miami, FL 33149, USA
|
13
|
Külheim C, Yeoh SH, Maintz J, Foley WJ, Moran GF. Comparative SNP diversity among four Eucalyptus species for genes from secondary metabolite biosynthetic pathways. BMC Genomics 2009; 10:452. [PMID: 19775472 PMCID: PMC2760585 DOI: 10.1186/1471-2164-10-452] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Received: 05/20/2009] [Accepted: 09/24/2009] [Indexed: 11/21/2022] Open
Abstract
Background There is little information about the DNA sequence variation within and between closely related plant species. The combination of re-sequencing technologies, large-scale DNA pools and availability of reference gene sequences allowed the extensive characterisation of single nucleotide polymorphisms (SNPs) in genes of four biosynthetic pathways leading to the formation of ecologically relevant secondary metabolites in Eucalyptus. With this approach the occurrence and patterns of SNP variation for a set of genes can be compared across different species from the same genus. Results In a single GS-FLX run, we sequenced over 103 Mbp and assembled them to approximately 50 kbp of reference sequences. An average sequencing depth of 315 reads per nucleotide site was achieved for all four eucalypt species, Eucalyptus globulus, E. nitens, E. camaldulensis and E. loxophleba. We sequenced 23 genes from 1,764 individuals and discovered 8,631 SNPs across the species, with about 1.5 times as many SNPs per kbp in the introns compared to exons. The exons of the two closely related species (E. globulus and E. nitens) had similar numbers of SNPs at synonymous and non-synonymous sites. These species also had similar levels of SNP diversity, whereas E. camaldulensis and E. loxophleba had much higher SNP diversity. Neither the pathway nor the position in the pathway influenced gene diversity. The four species share between 20 and 43% of the SNPs in these genes. Conclusion By using conservative statistical detection methods, we were confident about the validity of each SNP. With numerous individuals sampled over the geographical range of each species, we discovered one SNP in every 33 bp for E. nitens and one in every 31 bp in E. globulus. In contrast, the more distantly related species contained more SNPs: one in every 16 bp for E. camaldulensis and one in 17 bp for E. loxophleba, which is, to the best of our knowledge, the highest frequency of SNPs described in woody plant species.
Affiliation(s)
- Carsten Külheim
- Research School of Biology, Australian National University, 116 Daley Road, Canberra, Australia.
|
14
|
Rohlin A, Wernersson J, Engwall Y, Wiklund L, Björk J, Nordling M. Parallel sequencing used in detection of mosaic mutations: comparison with four diagnostic DNA screening techniques. Hum Mutat 2009; 30:1012-20. [PMID: 19347965 DOI: 10.1002/humu.20980] [Citation(s) in RCA: 130] [Impact Index Per Article: 8.7] [Indexed: 11/11/2022]
Abstract
We evaluated mutation detection techniques for their ability to detect mosaic mutations. In this study, Sanger sequencing, single-strand conformation polymorphism (SSCP)/heteroduplex analysis (HD), protein truncation test (PTT), and denaturing high-performance liquid chromatography (DHPLC) were compared with parallel sequencing. In total, DNA samples from nine patients were included in this study. Mosaic mutations were artificially constructed from seven of these samples, which were from heterozygote mutation carriers with the mutant allele present at 50%. The mutations analyzed were as follows: c.646C>T, c.2626C>T, c.2828C>A, c.1817_1818insA, c.2788dupA, c.416_419delAAGA, and c.607delC in the APC gene. The lowest degree of mutant alleles detected with SSCP/HD and DHPLC varied between 5% and 25%, and between 15% and 50% for Sanger sequencing. Three of the mutations were analyzed with PTT with considerable variations in detection levels (from 10 to 100%). Using parallel sequencing, a detection frequency down to 1% was reached, but sufficient coverage was required to achieve this high sensitivity. Two patients with natural mosaic mutations were also included in this study. These two mutations had previously been identified with Sanger sequencing (NF2 c.1026_1027delGA) and SSCP/HD (APC c.2700_2701delTC). In conclusion, all the evaluated methods are applicable for mosaic mutation screening, even though combinations of the conventional methods should be used to reach an adequate sensitivity. Sanger sequencing alone is not sensitive enough to detect low mosaic levels. Parallel sequencing seems to be the ultimate choice, but the possibility of using this technique is today limited by its complexity, economics, and availability of instruments.
Affiliation(s)
- Anna Rohlin
- Department of Clinical Genetics, Institute of Biomedicine, Sahlgrenska Academy at University of Gothenburg, Sahlgrenska University Hospital, Gothenburg, Sweden
|
15
|
Bundock PC, Eliott FG, Ablett G, Benson AD, Casu RE, Aitken KS, Henry RJ. Targeted single nucleotide polymorphism (SNP) discovery in a highly polyploid plant species using 454 sequencing. Plant Biotechnol J 2009; 7:347-54. [PMID: 19386042 DOI: 10.1111/j.1467-7652.2009.00401.x] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.0] [Indexed: 05/21/2023]
Abstract
Discovering single nucleotide polymorphisms (SNPs) in specific genes in a heterozygous polyploid plant species, such as sugarcane, is challenging because of the presence of a large number of homologues. To discover SNPs for mapping genes of interest, 454 sequencing of 307 polymerase chain reaction (PCR) amplicons (> 59 kb of sequence) was undertaken. One region of a four-gasket sequencing run, on a 454 Genome Sequencer FLX, was used for pooled PCR products amplified from each parent of a quantitative trait locus (QTL) mapping population (IJ76-514 × Q165). The sequencing yielded 96,755 (IJ76-514) and 86,241 (Q165) sequences with perfect matches to a PCR primer used in amplification, with an average sequence depth of approximately 300 and an average read length of 220 bases. Further analysis was carried out on amplicons whose sequences clustered into a single contig using an identity of 80% with the program cap3. In the more polymorphic sugarcane parent (Q165), 94% of amplicons (227/242) had evidence of a reliable SNP, an average of one every 35 bases. Significantly fewer SNPs were found in the pure Saccharum officinarum parent, with one SNP every 58 bases and SNPs in 86% (213/247) of amplicons. Using automatic SNP detection, 1632 SNPs were detected in Q165 sequences and 1013 in IJ76-514. From 225 candidate SNP sites tested, 209 (93%) were validated as polymorphic using the Sequenom MassARRAY system. Amplicon re-sequencing using the 454 system enables cost-effective SNP discovery that can be targeted to genes of interest and is able to perform in the highly challenging area of polyploid genomes.
Affiliation(s)
- Peter C Bundock
- Co-operative Research Centre for Sugar Industry Innovation through Biotechnology, Southern Cross University, Lismore, NSW 2480, Australia.
|