1
Abstract 449: A standard operating procedure for the curation of gene fusions. Cancer Res 2021. [DOI: 10.1158/1538-7445.am2021-449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Despite the well-established role of recurrent gene fusions as oncogenic drivers, current practices for characterizing and interpreting gene fusion events in clinical testing and in biomedical literature are inconsistent. From the conceptual definition of gene fusions to the salient elements that characterize these alterations, a lack of community-driven standards for the curation of gene fusions has resulted in a disparate landscape of fusion representations and supporting tools. Consequently, the evidence-based clinical evaluation of gene fusions requires extensive expert review for accurate interpretation of observed gene fusions with respect to putative evidence from biomedical literature. Furthermore, the lack of these standards inhibits the interoperability of tools, resources, and pipelines, impeding data sharing and downstream utility.
To address these challenges, a cross-consortia initiative between the Variant Interpretation for Cancer Consortium and ClinGen was formed to develop a standard operating procedure (SOP) for the curation of gene fusions. The SOP is under development by an international and diverse set of experts in the representation, detection, and clinical interpretation of gene fusions. Participating stakeholders across academic, government, and industry sectors showcased challenges and solutions, and participated in community surveys and discussions to define and develop the SOP for this diverse class of alterations.
An initial result of this effort was the precise molecular definition of genomic events and features constituting gene fusions. We distinguish these from similar but distinct classes of structural alterations through clinically relevant examples. Next, we discuss our findings on community practices around the description and evaluation of gene fusions. We provide our recommendations for characterization and representation of gene fusions from these practices, and compare these recommendations to existing variant representation standards and formats (e.g., HGVS variant nomenclature). We also discuss the concurrent application of formats for standardized human- and machine-readable representations of gene fusion events.
We conclude with discussion of the salient elements to enable rapid, scalable, and consistent evaluation of fusions curated from the biomedical literature. Recommendations are provided for the standardized capture of these elements to enable both intuitive and precise characterization of this diverse class of alterations in clinical reporting and literature. In summary, we provide a clinical-practice driven framework and nomenclature for gene fusions, including recommendations for human readability, computational precision, and data integrity within the SOP. This work is a substantial advancement towards standardized communication, investigation, and sharing of gene fusion data across clinical and research domains and specialties.
Citation Format: Alex H. Wagner, Ioannis S. Vlachos, Dmitriy Sonkin, Panieh Terraf, Chimene Kesserwan, Andrea Sboner, Thomas Coard, Christian Reich, Deborah I. Ritter, Peter Horak, Ying S. Zou, Anna Tanska, Aaron M. Berlin, Anna Lu, Daniel Cameron, Heather E. Williams, Wan-Hsin Lin, Gokce Toruner, Arpad Danos, Jason Saliba, Huiling Xu, Xinjie Xu, Georgina Ryland, Michele Ceccarelli, Liying Zhang, Sarah Rapisardo, Catherine Rehder, Xuelu Liu, Aparna Pallavajjala, Nicole Park, Laveniya Satgunaseelan, Kristy Lee, Jie Liu, Obi Griffith, Robert R. Freimuth, Albrecht Stenzinger, Linda B. Baughn, Michael Baudis, Jennifer Lee, Marilyn Li, Angshumoy Roy, Gordana Raca. A standard operating procedure for the curation of gene fusions [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 449.
2
Simultaneous Identification of Cell of Origin, Translocations, and Hotspot Mutations in Diffuse Large B-Cell Lymphoma Using a Single RNA-Sequencing Assay. Am J Clin Pathol 2021; 155:748-754. [PMID: 33258912 DOI: 10.1093/ajcp/aqaa185] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
OBJECTIVES Diffuse large B-cell lymphoma (DLBCL) is an aggressive non-Hodgkin lymphoma with a heterogeneous genetic landscape that can require multiple assays to characterize. We reviewed a 1-step RNA-based assay that determines cell of origin (COO), detects translocations, and identifies mutations, and we assessed the role of the assay in diagnosis. METHODS Using a single custom Archer FusionPlex Lymphoma panel, we performed anchored multiplex polymerase chain reaction-based RNA sequencing on 41 cases of de novo DLBCL. Each case was subclassified by COO, and gene fusions and hotspot mutations were identified. The findings were then compared with COO classification by the Hans immunohistochemical algorithm and by NanoString technology, and with cytogenetics and fluorescence in situ hybridization results. RESULTS Concordant COO classification by the FusionPlex panel and NanoString was observed in 35 of 41 cases (85.3%), with NanoString and Hans concordant in 33 of 41 cases (80.5%) and FusionPlex and Hans concordant in 33 of 41 cases (80.5%). The FusionPlex assay also detected 6 of 11 BCL6 translocations (4 cryptic), 2 of 3 BCL2 translocations, and 2 of 4 MYC translocations. Mutations were detected in lymphoma-related genes in 24 of 41 cases. CONCLUSION This FusionPlex assay offers a single method for COO classification, mutation detection, and identification of important translocations in DLBCL. Although not replacing traditional testing, it could offer useful data when limited tissue is available.
3
The application of next-generation sequencing-based molecular diagnostics in endometrial stromal sarcoma. Histopathology 2016; 69:551-9. [PMID: 26990025 DOI: 10.1111/his.12966] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 02/12/2016] [Accepted: 03/12/2016] [Indexed: 12/31/2022]
Abstract
AIMS Endometrial stromal sarcomas (ESSs) are divided into low-grade and high-grade subtypes, with the latter showing more aggressive clinical behaviour. Although histology and immunophenotype can aid in the diagnosis of these tumours, genetic studies can provide additional diagnostic insights, as low-grade ESSs frequently harbour fusions involving JAZF1/SUZ12 and/or JAZF1/PHF1, whereas high-grade ESSs are defined by YWHAE-NUTM2A/B fusions. The aim of this study was to evaluate the utility of a next-generation sequencing (NGS)-based assay in identifying ESS fusions in archival formalin-fixed paraffin-embedded tumour samples. METHODS AND RESULTS We applied an NGS-based fusion transcript detection assay (Archer FusionPlex Sarcoma Panel) that targets YWHAE and JAZF1 fusions in a series of low-grade ESSs (n = 11) and high-grade ESSs (n = 5) that were previously confirmed to harbour genetic rearrangements by fluorescence in-situ hybridization (FISH) and/or reverse transcription polymerase chain reaction (RT-PCR) analyses. The fusion assay identified junctional fusion transcript sequences that corresponded to the known FISH/RT-PCR results in all cases. Four low-grade ESSs harboured JAZF1-PHF1 fusions with different junctional sequences, and all were correctly identified because of the open-ended nature of the assay design, using anchored multiplex polymerase chain reaction. Seven non-ESS sarcomas were also included as negative controls, and no strong ESS fusion candidates were identified in these cases. CONCLUSIONS Our findings demonstrate good sensitivity and specificity of an NGS-based gene fusion assay in the detection of ESS fusion transcripts.
4
Erratum: Corrigendum: The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons. Nat Genet 2016; 48:700. [DOI: 10.1038/ng0616-700c] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/09/2022]
5
Clinical Sequencing Uncovers Origins and Evolution of Lassa Virus. Cell 2015; 162:738-50. [PMID: 26276630 DOI: 10.1016/j.cell.2015.07.020] [Citation(s) in RCA: 188] [Impact Index Per Article: 23.5] [Received: 02/12/2015] [Revised: 04/26/2015] [Accepted: 06/12/2015] [Indexed: 12/25/2022]
Abstract
The 2013-2015 West African epidemic of Ebola virus disease (EVD) reminds us of how little is known about biosafety level 4 viruses. Like Ebola virus, Lassa virus (LASV) can cause hemorrhagic fever with high case fatality rates. We generated a genomic catalog of almost 200 LASV sequences from clinical and rodent reservoir samples. We show that whereas the 2013-2015 EVD epidemic is fueled by human-to-human transmissions, LASV infections mainly result from reservoir-to-human infections. We elucidated the spread of LASV across West Africa and show that this migration was accompanied by changes in LASV genome abundance, fatality rates, codon adaptation, and translational efficiency. By investigating intrahost evolution, we found that mutations accumulate in epitopes of viral surface proteins, suggesting selection for immune escape. This catalog will serve as a foundation for the development of vaccines and diagnostics.
6
Abstract
Motivation: The naked mole rat (Heterocephalus glaber) is an exceptionally long-lived and cancer-resistant rodent native to East Africa. Although its genome was previously sequenced, here we report a new assembly that we sequenced, with substantially higher N50 values for scaffolds and contigs. Results: We analyzed the annotation of this new improved assembly and identified candidate genomic adaptations that may have contributed to the evolution of the naked mole rat’s extraordinary traits, including in regions of p53 and in the hyaluronan receptors CD44 and HMMR (RHAMM). Furthermore, we developed a freely available web portal, the Naked Mole Rat Genome Resource (http://www.naked-mole-rat.org), featuring the data and results of our analysis, to assist researchers interested in the genome and genes of the naked mole rat, and also to facilitate further studies on this fascinating species. Availability and implementation: The Naked Mole Rat Genome Resource is freely available online at http://www.naked-mole-rat.org. This resource is open source, and the source code is available at https://github.com/maglab/naked-mole-rat-portal. Contact: jp@senescence.info
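Assembly contiguity in this abstract (and in the Fosill abstract, entry 12) is reported as N50. As a minimal sketch of the standard definition, not the authors' pipeline, the N50 is the length L such that sequences of length L or longer together contain at least half of all assembled bases. The helper `n50` and the input lengths are hypothetical toy values, not data from either assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    sorted_lengths = sorted((int(x) for x in lengths), reverse=True)
    if not sorted_lengths:
        raise ValueError("need at least one sequence length")
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        # Stop at the first (longest-first) sequence where the cumulative
        # sum reaches half of the total; comparing 2*running avoids floats.
        if 2 * running >= total:
            return length
    return sorted_lengths[-1]  # not reached when all lengths are non-negative


if __name__ == "__main__":
    # Hypothetical toy lengths, not values from the naked mole rat assembly.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

For the toy lengths the function returns 8: the 54 total bases give a halfway point of 27, which is first reached by 10 + 9 + 8.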
7
Comparative analysis of RNA sequencing methods for degraded or low-input samples. Nat Methods 2013. [PMID: 23685885 DOI: 10.1038/nmeth.248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2023]
Abstract
RNA-seq is an effective method for studying the transcriptome, but it can be difficult to apply to scarce or degraded RNA from fixed clinical samples, rare cell populations or cadavers. Recent studies have proposed several methods for RNA-seq of low-quality and/or low-quantity samples, but the relative merits of these methods have not been systematically analyzed. Here we compare five such methods using metrics relevant to transcriptome annotation, transcript discovery and gene expression. Using a single human RNA sample, we constructed and sequenced ten libraries with these methods and compared them against two control libraries. We found that the RNase H method performed best for chemically fragmented, low-quality RNA, and we confirmed this through analysis of actual degraded samples. RNase H can even effectively replace oligo(dT)-based methods for standard RNA-seq. SMART and NuGEN had distinct strengths for measuring low-quantity RNA. Our analysis allows biologists to select the most suitable methods and provides a benchmark for future method development.
8
The African coelacanth genome provides insights into tetrapod evolution. Nature 2013; 496:311-6. [PMID: 23598338 PMCID: PMC3633110 DOI: 10.1038/nature12027] [Citation(s) in RCA: 464] [Impact Index Per Article: 42.2] [Received: 09/05/2012] [Accepted: 02/20/2013] [Indexed: 01/28/2023]
Abstract
It was a zoological sensation when a living specimen of the coelacanth was first discovered in 1938, as this lineage of lobe-finned fish was thought to have gone extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain, and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues demonstrate the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.
9
Comparative genomics of a plant-pathogenic fungus, Pyrenophora tritici-repentis, reveals transduplication and the impact of repeat elements on pathogenicity and population divergence. G3 (Bethesda) 2013; 3:41-63. [PMID: 23316438 PMCID: PMC3538342 DOI: 10.1534/g3.112.004044] [Citation(s) in RCA: 129] [Impact Index Per Article: 11.7] [Received: 08/16/2012] [Accepted: 11/02/2012] [Indexed: 12/31/2022]
Abstract
Pyrenophora tritici-repentis is a necrotrophic fungus that causes tan spot of wheat, a disease whose contribution to crop loss has increased significantly during the last few decades. Pathogenicity of this fungus is attributed to the production of host-selective toxins (HST), which are recognized by their host in a genotype-specific manner. To better understand the mechanisms that have led to the increase in disease incidence related to this pathogen, we sequenced the genomes of three P. tritici-repentis isolates. A pathogenic isolate that produces two known HSTs was used to assemble a reference nuclear genome of approximately 40 Mb composed of 11 chromosomes that encode 12,141 predicted genes. Comparison of the reference genome with those of a pathogenic isolate that produces a third HST, and a nonpathogenic isolate, showed the nonpathogen genome to be more diverged than those of the two pathogens. Examination of gene-coding regions has provided candidate pathogen-specific proteins and revealed gene families that may play a role in a necrotrophic lifestyle. Analysis of transposable elements suggests that their presence in the genome of pathogenic isolates contributes to the creation of novel genes, effector diversification, possible horizontal gene transfer events, identified copy number variation, and the first example of transduplication by DNA transposable elements in fungi. Overall, comparative analysis of these genomes provides evidence that pathogenicity in this species arose through an influx of transposable elements, which created a genetically flexible landscape that can easily respond to environmental changes.
10
Complete viral RNA genome sequencing of ultra-low copy samples by sequence-independent amplification. Nucleic Acids Res 2012; 41:e13. [PMID: 22962364 PMCID: PMC3592391 DOI: 10.1093/nar/gks794] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Indexed: 11/14/2022]
Abstract
RNA viruses are the causative agents of AIDS, influenza, SARS, and other serious health threats. Development of rapid and broadly applicable methods for complete viral genome sequencing is highly desirable to fully understand all aspects of these infectious agents as well as for surveillance of viral pandemic threats and emerging pathogens. However, traditional viral detection methods rely on prior sequence or antigen knowledge. In this study, we describe sequence-independent amplification for samples containing ultra-low amounts of viral RNA coupled with Illumina sequencing and de novo assembly optimized for viral genomes. With 5 million reads, we capture 96 to 100% of the viral protein coding region of HIV, respiratory syncytial and West Nile viral samples from as little as 100 copies of viral RNA. The methods presented here are scalable to large numbers of samples and capable of generating full or near full length viral genomes from cloned and clinical samples with low amounts of viral RNA, without prior sequence information and in the presence of substantial host contamination.
11
Abstract
Exceptionally accurate genome reference sequences have proven to be of great value to microbial researchers. Thus, to date, about 1800 bacterial genome assemblies have been “finished” at great expense with the aid of manual laboratory and computational processes that typically iterate over a period of months or even years. By applying a new laboratory design and new assembly algorithm to 16 samples, we demonstrate that assemblies exceeding finished quality can be obtained from whole-genome shotgun data and automated computation. Cost and time requirements are thus dramatically reduced.
12
Abstract
Eliminating the bacterial cloning step has been a major factor in the vastly improved efficiency of massively parallel sequencing approaches. However, this also has made it a technical challenge to produce the modern equivalent of the Fosmid- or BAC-end sequences that were crucial for assembling and analyzing complex genomes during the Sanger-based sequencing era. To close this technology gap, we developed Fosill, a method for converting Fosmids to Illumina-compatible jumping libraries. We constructed Fosmid libraries in vectors with Illumina primer sequences and specific nicking sites flanking the cloning site. Our family of pFosill vectors allows multiplex Fosmid cloning of end-tagged genomic fragments without physical size selection and is compatible with standard and multiplex paired-end Illumina sequencing. To excise the bulk of each cloned insert, we introduced two nicks in the vector, translated them into the inserts, and cleaved them. Recircularization of the vector via coligation of insert termini followed by inverse PCR generates a jumping library for paired-end sequencing with 101-base reads. The yield of unique Fosmid-sized jumps is sufficiently high, and the background of short, incorrectly spaced and chimeric artifacts sufficiently low, to enable applications such as mapping of structural variation and scaffolding of de novo assemblies. We demonstrate the power of Fosill to map genome rearrangements in a cancer cell line and identify three fusion genes that were corroborated by RNA-seq data. Our Fosill-powered assembly of the mouse genome has an N50 scaffold length of 17.0 Mb, rivaling the connectivity (16.9 Mb) of the Sanger-sequencing based draft assembly.
13
Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection. PLoS Pathog 2012; 8:e1002529. [PMID: 22412369 PMCID: PMC3297584 DOI: 10.1371/journal.ppat.1002529] [Citation(s) in RCA: 287] [Impact Index Per Article: 23.9] [Received: 10/02/2011] [Accepted: 12/27/2011] [Indexed: 12/20/2022]
Abstract
Deep sequencing technologies have the potential to transform the study of highly variable viral pathogens by providing a rapid and cost-effective approach to sensitively characterize rapidly evolving viral quasispecies. Here, we report on a high-throughput whole HIV-1 genome deep sequencing platform that combines 454 pyrosequencing with novel assembly and variant detection algorithms. In one subject we combined these genetic data with detailed immunological analyses to comprehensively evaluate viral evolution and immune escape during the acute phase of HIV-1 infection. The majority of early, low frequency mutations represented viral adaptation to host CD8+ T cell responses, evidence of strong immune selection pressure occurring during the early decline from peak viremia. CD8+ T cell responses capable of recognizing these low frequency escape variants coincided with the selection and evolution of more effective secondary HLA-anchor escape mutations. Frequent, and in some cases rapid, reversion of transmitted mutations was also observed across the viral genome. When located within restricted CD8 epitopes these low frequency reverting mutations were sufficient to prime de novo responses to these epitopes, again illustrating the capacity of the immune response to recognize and respond to low frequency variants. More importantly, rapid viral escape from the most immunodominant CD8+ T cell responses coincided with plateauing of the initial viral load decline in this subject, suggestive of a potential link between maintenance of effective, dominant CD8 responses and the degree of early viremia reduction. We conclude that the early control of HIV-1 replication by immunodominant CD8+ T cell responses may be substantially influenced by rapid, low frequency viral adaptations not detected by conventional sequencing approaches, which warrants further investigation. 
These data support the critical need for vaccine-induced CD8+ T cell responses to target more highly constrained regions of the virus in order to ensure the maintenance of immunodominant CD8 responses and the sustained decline of early viremia. The ability of HIV-1 and other highly variable pathogens to rapidly mutate to escape vaccine-induced immune responses represents a major hurdle to the development of effective vaccines against these highly persistent pathogens. Application of next-generation or deep sequencing technologies to the study of host pathogens could significantly improve our understanding of the mechanisms by which these pathogens subvert host immunity, and aid in the development of novel vaccines and therapeutics. Here, we developed a 454 deep sequencing approach to enable the sensitive detection of low-frequency viral variants across the entire HIV-1 genome. When applied to the acute phase of HIV-1 infection, we observed that the majority of early, low frequency mutations represented viral adaptations to host cellular immune responses, evidence of strong host immunity developing during the early decline of peak viral load. Rapid viral escape from the most dominant immune responses, however, correlated with loss of this initial viral control, suggestive of the importance of mounting immune responses against more conserved regions of the virus. These data provide a greater understanding of the early evolutionary events subverting the ability of host immune responses to control early HIV-1 replication, yielding important insight into the design of more effective vaccine strategies.
14
Genomic analysis of oceanic cyanobacterial myoviruses compared with T4-like myoviruses from diverse hosts and environments. Environ Microbiol 2010; 12:3035-56. [PMID: 20662890 PMCID: PMC3037559 DOI: 10.1111/j.1462-2920.2010.02280.x] [Citation(s) in RCA: 245] [Impact Index Per Article: 18.8] [Indexed: 11/30/2022]
Abstract
T4-like myoviruses are ubiquitous, and their genes are among the most abundant documented in ocean systems. Here we compare 26 T4-like genomes, including 10 from non-cyanobacterial myoviruses, and 16 from marine cyanobacterial myoviruses (cyanophages) isolated on diverse Prochlorococcus or Synechococcus hosts. A core genome of 38 virion construction and DNA replication genes was observed in all 26 genomes, with 32 and 25 additional genes shared among the non-cyanophage and cyanophage subsets, respectively. These hierarchical cores are highly syntenic across the genomes and have been sampled to saturation. The 25 cyanophage core genes include six previously described genes with putative functions (psbA, mazG, phoH, hsp20, hli03, cobS), a hypothetical protein with a potential phytanoyl-CoA dioxygenase domain, two virion structural genes, and 16 hypothetical genes. Beyond previously described cyanophage-encoded photosynthesis and phosphate stress genes, we observed core genes that may play a role in nitrogen metabolism during infection through modulation of 2-oxoglutarate. Patterns among non-core genes that may drive niche diversification revealed that phosphorus-related gene content reflects source waters rather than host strain used for isolation, and that carbon metabolism genes appear associated with putative mobile elements. In addition, phages isolated on Synechococcus had higher genome-wide %G+C and often contained different gene subsets (e.g. petE, zwf, gnd, prnA, cpeT) than those isolated on Prochlorococcus. However, no clear diagnostic genes emerged to distinguish these phage groups, suggesting blurred boundaries possibly due to cross-infection. Finally, genome-wide comparisons of both diverse and closely related, co-isolated genomes provide a locus-to-locus variability metric that will prove valuable for interpreting metagenomic data sets.
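The hierarchical cores described above (genes found in all 26 genomes, plus genes shared only within the cyanophage or non-cyanophage subsets) amount to set intersections over per-genome gene contents. The sketch below is a simplified illustration under the assumption that genes have already been clustered into shared ortholog labels; the helpers `core_genes` and `subset_core` and the toy genomes are hypothetical, and the sketch does not reproduce the authors' clustering method.

```python
from typing import Mapping


def core_genes(genomes: Mapping[str, set[str]]) -> set[str]:
    """Return the genes present in every genome (intersection of all gene sets)."""
    if not genomes:
        return set()
    # set.intersection returns a new set, so the input sets are never modified.
    return set.intersection(*genomes.values())


def subset_core(genomes: Mapping[str, set[str]], members: set[str]) -> set[str]:
    """Return genes shared by every genome in `members` beyond the overall core."""
    subset = {name: genes for name, genes in genomes.items() if name in members}
    return core_genes(subset) - core_genes(genomes)


if __name__ == "__main__":
    # Hypothetical toy genomes; labels assume orthologs are already clustered.
    toy = {
        "cyanophage_A": {"gp23", "gp43", "psbA", "phoH", "mazG"},
        "cyanophage_B": {"gp23", "gp43", "psbA", "phoH"},
        "t4_like_C": {"gp23", "gp43", "rIIA"},
    }
    print(sorted(core_genes(toy)))
    print(sorted(subset_core(toy, {"cyanophage_A", "cyanophage_B"})))
```

With these toy sets the script prints `['gp23', 'gp43']` as the overall core and `['phoH', 'psbA']` as the additional cyanophage-subset core.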
15
Abstract
The fission yeast clade, comprising Schizosaccharomyces pombe, S. octosporus, S. cryophilus, and S. japonicus, occupies the basal branch of Ascomycete fungi and is an important model of eukaryote biology. A comparative annotation of these genomes identified a near extinction of transposons and the associated innovation of transposon-free centromeres. Expression analysis established that meiotic genes are subject to antisense transcription during vegetative growth, which suggests a mechanism for their tight regulation. In addition, trans-acting regulators control new genes within the context of expanded functional modules for meiosis and stress response. Differences in gene content and regulation also explain why, unlike the budding yeast of Saccharomycotina, fission yeasts cannot use ethanol as a primary carbon source. These analyses elucidate the genome structure and gene regulation of fission yeast and provide tools for investigation across the Schizosaccharomyces clade.
16
A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries. Genome Biol 2011; 12:R1. [PMID: 21205303 PMCID: PMC3091298 DOI: 10.1186/gb-2011-12-1-r1] [Citation(s) in RCA: 331] [Impact Index Per Article: 25.5] [Received: 08/05/2010] [Revised: 09/25/2010] [Accepted: 01/04/2011] [Indexed: 11/24/2022]
Abstract
Genome targeting methods enable cost-effective capture of specific subsets of the genome for sequencing. We present here an automated, highly scalable method for carrying out the Solution Hybrid Selection capture approach that provides a dramatic increase in scale and throughput of sequence-ready libraries produced. Significant process improvements and a series of in-process quality control checkpoints are also added. These process improvements can also be used in a manual version of the protocol.
17
Abstract
The human microbiome refers to the community of microorganisms, including prokaryotes, viruses, and microbial eukaryotes, that populate the human body. The National Institutes of Health launched an initiative that focuses on describing the diversity of microbial species that are associated with health and disease. The first phase of this initiative includes the sequencing of hundreds of microbial reference genomes, coupled to metagenomic sequencing from multiple body sites. Here we present results from an initial reference genome sequencing of 178 microbial genomes. From 547,968 predicted polypeptides that correspond to the gene complement of these strains, we defined previously unidentified ("novel") polypeptides as those with both an unmasked sequence length greater than 100 amino acids and no BLASTP match to any nonreference entry in the nonredundant subset. This analysis resulted in a set of 30,867 polypeptides, of which 29,987 (approximately 97%) were unique. In addition, this set of microbial genomes allows approximately 40% of random sequences from the microbiome of the gastrointestinal tract to be associated with organisms based on the match criteria used. Insights into pan-genome analysis suggest that we are still far from saturating microbial species genetic data sets. The associated metrics and standards used by our group for quality assurance are also presented.
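The definition of a "novel" polypeptide above is a two-part filter: an unmasked length above 100 amino acids and no BLASTP hit to a nonreference entry. The sketch below expresses that filter under the assumption that BLASTP results have already been parsed into a set of hit identifiers per protein; the `Polypeptide` record, its field names, and the helper functions are hypothetical, not the study's schema. The final line checks the reported "approximately 97%" from the 29,987 of 30,867 figure.

```python
from dataclasses import dataclass, field


@dataclass
class Polypeptide:
    # Hypothetical record; the field names are illustrative, not the study's schema.
    pid: str
    unmasked_length: int  # residues remaining after low-complexity masking
    nonreference_hits: set[str] = field(default_factory=set)  # parsed BLASTP hits


def is_novel(p: Polypeptide, min_length: int = 100) -> bool:
    """Apply both criteria: length > min_length and no nonreference hit."""
    return p.unmasked_length > min_length and not p.nonreference_hits


def novel_polypeptides(candidates: list[Polypeptide]) -> list[Polypeptide]:
    return [p for p in candidates if is_novel(p)]


if __name__ == "__main__":
    candidates = [
        Polypeptide("p1", 250),                # long, no hits: novel
        Polypeptide("p2", 90),                 # too short
        Polypeptide("p3", 300, {"nr_hit_1"}),  # matches a nonreference entry
    ]
    print([p.pid for p in novel_polypeptides(candidates)])
    # Share of unique sequences among the 30,867 polypeptides reported above.
    print(f"{29_987 / 30_867:.1%}")
```

The script prints `['p1']` and then `97.1%`. Note that the threshold is strict: a protein of exactly 100 unmasked residues is not counted as novel.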
18
Analysis of high-throughput sequencing and annotation strategies for phage genomes. PLoS One 2010; 5:e9083. [PMID: 20140207 PMCID: PMC2816706 DOI: 10.1371/journal.pone.0009083] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Received: 11/05/2009] [Accepted: 01/18/2010] [Indexed: 11/18/2022]
Abstract
Background Bacterial viruses (phages) play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage. Methodology/Principal Findings To advance the study of phage biology through comparative genomic approaches we used marine cyanophage as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl purified phage particles), and sequencing strategies that utilize either Sanger sequencing of a linker amplification shotgun library (LASL) or of a whole genome shotgun library (WGSL), or 454 pyrosequencing methods. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling. Conclusions/Significance These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.
19
Abstract
Background: Single-cell genome sequencing has the potential to allow in-depth exploration of the vast genetic diversity found in uncultured microbes. We used the marine cyanobacterium Prochlorococcus as a model system to address key challenges facing high-throughput whole genome amplification (WGA) and complete genome sequencing of individual cells.

Methodology/Principal Findings: We describe a pipeline that enables single-cell WGA on hundreds of cells at a time while virtually eliminating non-target DNA from the reactions. We further developed a post-amplification normalization procedure that mitigates the extreme variation in sequencing coverage associated with multiple displacement amplification (MDA), and showed that this procedure increased sequencing efficiency and facilitated genome assembly. Starting from a single cell, we report genome recovery as high as 99.6% with reference-guided assembly and 95% with de novo assembly. We also analyzed the impact of chimera formation during MDA on de novo assembly, and discuss strategies to minimize incorrectly joined regions in contigs.

Conclusions/Significance: The methods described in this paper will be useful for sequencing the genomes of individual cells from a variety of samples.
20
Naturally occurring dominant resistance mutations to hepatitis C virus protease and polymerase inhibitors in treatment-naïve patients. Hepatology 2008; 48:1769-78. [PMID: 19026009 PMCID: PMC2645896 DOI: 10.1002/hep.22549] [Citation(s) in RCA: 311] [Impact Index Per Article: 19.4] [Indexed: 12/17/2022]
Abstract
Resistance mutations to hepatitis C virus (HCV) nonstructural protein 3 (NS3) protease inhibitors that are present in <1% of the viral quasispecies may still allow >1000-fold viral load reductions upon treatment, consistent with their reported reduced replicative fitness in vitro. Recently, however, an R155K protease mutation was reported as the dominant quasispecies in a treatment-naïve individual, raising concerns about possible full drug resistance. To investigate the prevalence of dominant resistance mutations against specifically targeted antiviral therapy for HCV (STAT-C) in the population, we analyzed HCV genome sequences from 507 treatment-naïve patients infected with HCV genotype 1 from the United States, Germany, and Switzerland. Phylogenetic sequence analysis and viral load data were used to identify the possible spread of replication-competent, drug-resistant viral strains in the population and to infer the consequences of these mutations for viral replication in vivo. Mutations described to confer resistance to the protease inhibitors telaprevir, BILN2061, ITMN-191, SCH6 and boceprevir, to the NS5B polymerase inhibitor AG-021541, and to the NS4A antagonist ACH-806 were observed mostly as sporadic, unrelated cases, at frequencies between 0.3% and 2.8% in the population, including two patients with possible multidrug resistance. Collectively, however, 8.6% of patients infected with genotype 1a and 1.4% of those infected with genotype 1b carried at least one dominant resistance mutation. Viral loads were high in the majority of these patients, suggesting that drug-resistant viral strains might achieve replication levels comparable to those of nonresistant viruses in vivo.

CONCLUSION: Naturally occurring dominant STAT-C resistance mutations are common in treatment-naïve patients infected with HCV genotype 1. Their influence on treatment outcome should be further characterized to evaluate the possible benefits of drug resistance testing for individually tailoring drug combinations when treatment options are limited by previous nonresponse to peginterferon and ribavirin.
21
22