1. Single-cell m6A mapping in vivo using picoMeRIP-seq. Nat Biotechnol 2024; 42:591-596. [PMID: 37349523 PMCID: PMC10739642 DOI: 10.1038/s41587-023-01831-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Accepted: 05/17/2023] [Indexed: 06/24/2023]
Abstract
Current N6-methyladenosine (m6A) mapping methods need large amounts of RNA or are limited to cultured cells. Through optimized sample recovery and signal-to-noise ratio, we developed picogram-scale m6A RNA immunoprecipitation and sequencing (picoMeRIP-seq) for studying m6A in vivo in single cells and scarce cell types using standard laboratory equipment. We benchmark m6A mapping on titrations of poly(A) RNA and embryonic stem cells and in single zebrafish zygotes, mouse oocytes and embryos.
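The abstract credits picoMeRIP-seq's gains partly to a better signal-to-noise ratio. In MeRIP-style data, that ratio is usually read off as the enrichment of immunoprecipitated (IP) reads over input reads. The Python sketch below computes a per-bin log2 IP/input enrichment after library-size normalization. The binning, the pseudocount and the enrichment cutoff are illustrative assumptions, not the published picoMeRIP-seq analysis pipeline.

```python
import math

def log2_enrichment(ip_counts, input_counts, pseudocount=1.0):
    """Per-bin log2(IP / input) after counts-per-million normalization."""
    if len(ip_counts) != len(input_counts):
        raise ValueError("IP and input must cover the same bins")
    ip_total, input_total = sum(ip_counts), sum(input_counts)
    scores = []
    for ip, inp in zip(ip_counts, input_counts):
        ip_cpm = ip / ip_total * 1e6
        input_cpm = inp / input_total * 1e6
        # The pseudocount keeps bins with zero coverage finite.
        scores.append(math.log2((ip_cpm + pseudocount) / (input_cpm + pseudocount)))
    return scores

def enriched_bins(scores, min_log2=1.0):
    """Indices of bins whose enrichment reaches a (hypothetical) cutoff."""
    return [i for i, s in enumerate(scores) if s >= min_log2]
```

For example, a bin that holds 80% of the IP reads but only 20% of the input reads scores about log2(4) = 2.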
2. Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. bioRxiv: The Preprint Server for Biology 2023:2023.07.25.550582. [PMID: 37546854 PMCID: PMC10402094 DOI: 10.1101/2023.07.25.550582] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/08/2023]
Abstract
The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. Developers used these data to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.
3. The RNA m6A landscape of mouse oocytes and preimplantation embryos. Nat Struct Mol Biol 2023; 30:703-709. [PMID: 37081317 DOI: 10.1038/s41594-023-00969-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2022] [Accepted: 03/16/2023] [Indexed: 04/22/2023]
Abstract
Despite the significance of N6-methyladenosine (m6A) in gene regulation, the requirement for large amounts of RNA has hindered m6A profiling in mammalian early embryos. Here we apply low-input methylated RNA immunoprecipitation and sequencing to map m6A in mouse oocytes and preimplantation embryos. We define the landscape of m6A during the maternal-to-zygotic transition, including in stage-specifically expressed transcription factors essential for cell fate determination. Both the maternally inherited transcripts to be degraded after fertilization and the genes zygotically activated during zygotic genome activation are widely marked by m6A. In contrast to m6A-marked zygotically activated genes, m6A-marked maternally inherited transcripts have a higher tendency to be targeted by microRNAs. Moreover, RNAs derived from retrotransposons, such as MTA, which is maternally expressed, and MERVL, which is transcriptionally activated at the two-cell stage, are largely marked by m6A. Our results provide a foundation for future studies exploring the regulatory roles of m6A in mammalian early embryonic development.
4. CEDA: integrating gene expression data with CRISPR-pooled screen data identifies essential genes with higher expression. Bioinformatics 2022; 38:5245-5252. [PMID: 36250792 PMCID: PMC9710553 DOI: 10.1093/bioinformatics/btac668] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/21/2022] [Revised: 09/26/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Clustered regularly interspaced short palindromic repeats (CRISPR)-based genetic perturbation screening is a powerful tool to probe gene function. However, experimental noise, especially for lowly expressed genes, needs to be accounted for to maintain proper control of the false positive rate. METHODS We developed a statistical method, named CRISPR screen with Expression Data Analysis (CEDA), to integrate gene expression profiles and CRISPR screen data for identifying essential genes. CEDA stratifies genes based on expression level and adopts a three-component mixture model for the log-fold change of single-guide RNAs (sgRNAs). An empirical Bayesian prior and the expectation-maximization algorithm are used for parameter estimation and false discovery rate inference. RESULTS Taking advantage of gene expression data, CEDA identifies essential genes with higher expression. Compared with existing methods, CEDA shows comparable reliability but higher sensitivity in detecting essential genes with moderate sgRNA fold change. Therefore, using the same CRISPR data, CEDA generates an additional hit gene list. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
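The abstract describes CEDA as stratifying genes by expression and fitting a three-component mixture to sgRNA log-fold changes with expectation-maximization (EM). The Python sketch below illustrates only that general idea: it fits Gaussian components by plain EM within each expression stratum and returns each sgRNA's posterior probability of belonging to the depleted component. The Gaussian component shape, the initialization, and the omission of the empirical Bayes prior and the FDR step are simplifying assumptions, so this is not CEDA's implementation.

```python
import math

def _normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def em_three_component(lfc, n_iter=200, tol=1e-8):
    """Fit a 3-component Gaussian mixture to sgRNA log-fold changes by EM.

    The components are meant to capture depleted, unchanged and enriched sgRNAs.
    """
    means = [min(lfc), 0.0, max(lfc)]  # crude initialization
    sds = [1.0, 1.0, 1.0]
    weights = [1.0 / 3] * 3
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sgRNA.
        resp, ll = [], 0.0
        for x in lfc:
            dens = [w * _normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300  # guard against underflow
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of mixing weights, means and standard deviations.
        for k in range(3):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / len(lfc)
            means[k] = sum(r[k] * x for r, x in zip(resp, lfc)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, lfc)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, sds, resp

def prob_depleted_by_stratum(lfc, strata):
    """Fit one mixture per expression stratum and return, for each sgRNA,
    the posterior probability of the lowest-mean (depleted) component."""
    out = [0.0] * len(lfc)
    for stratum in set(strata):
        idx = [i for i, s in enumerate(strata) if s == stratum]
        _, means, _, resp = em_three_component([lfc[i] for i in idx])
        low = min(range(3), key=lambda k: means[k])
        for i, r in zip(idx, resp):
            out[i] = r[low]
    return out
```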
5.
6.
Abstract
Motivation Oxford Nanopore Technologies sequencing devices support adaptive sequencing, in which undesired reads can be ejected from a pore in real time. This feature allows targeted sequencing aided by computational methods for mapping partial reads, rather than by complex library preparation protocols. However, existing mapping methods either require a computationally expensive base-calling procedure before using aligners to map partial reads or work well only on small genomes. Results In this work, we present a new streaming method that can map nanopore raw signals for real-time selective sequencing. Rather than converting read signals to bases, we propose to convert reference genomes to signals and operate fully in the signal space. Our method features a new way to index reference genomes using k-d trees, a novel seed selection strategy and a seed chaining algorithm tailored toward the current signal characteristics. We implemented the method as a tool, Sigmap. We then evaluated it on both simulated and real data and compared it with the state-of-the-art nanopore raw signal mapper Uncalled. Our results show that Sigmap yields comparable performance on simulated yeast raw signals, and better mapping accuracy on real yeast raw signals with a 4.4× speedup. Moreover, our method performed well on mapping raw signals to genomes larger than 100 Mbp and correctly mapped 11.49% more real raw signals of green algae, which led to a significantly higher F1-score (0.9354 versus 0.8660). Availability and implementation Sigmap code is accessible at https://github.com/haowenz/sigmap. Supplementary information Supplementary data are available at Bioinformatics online.
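Sigmap's key move, per the abstract, is to convert the reference into signal space and index it with k-d trees. The sketch below illustrates that flow under heavy assumptions: `toy_pore_level` is a made-up pore model (real pore models map k-mers to measured current levels), windows of consecutive expected levels serve as index points, a small pure-Python k-d tree answers radius queries, and a crude vote over alignment diagonals stands in for Sigmap's seed selection and chaining, which the abstract does not detail.

```python
from collections import Counter

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def toy_pore_level(kmer):
    """HYPOTHETICAL pore model: a distinct made-up current level per k-mer."""
    return 80.0 + 0.5 * sum(BASE_CODE[b] * 4 ** i for i, b in enumerate(kmer))

def reference_to_signal(seq, k=3):
    """Convert a reference sequence into its expected signal (one level per k-mer)."""
    return [toy_pore_level(seq[i:i + k]) for i in range(len(seq) - k + 1)]

def windows(signal, dim):
    return [tuple(signal[i:i + dim]) for i in range(len(signal) - dim + 1)]

def build_kdtree(points, depth=0):
    """points: list of (vector, reference_position). Returns a nested dict or None."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def radius_search(node, query, radius, out):
    """Append positions of all indexed vectors within Euclidean `radius` of query."""
    if node is None:
        return out
    vec, pos = node["point"]
    if sum((a - b) ** 2 for a, b in zip(vec, query)) <= radius ** 2:
        out.append(pos)
    diff = query[node["axis"]] - vec[node["axis"]]
    near, far = (node["left"], node["right"]) if diff <= 0 else (node["right"], node["left"])
    radius_search(near, query, radius, out)
    if abs(diff) <= radius:  # the search ball crosses the splitting plane
        radius_search(far, query, radius, out)
    return out

def map_read_signal(read_signal, tree, dim, radius):
    """Seed with k-d tree hits, then pick the best diagonal (stand-in for chaining)."""
    seeds = []
    for read_pos, query in enumerate(windows(read_signal, dim)):
        for ref_pos in radius_search(tree, query, radius, []):
            seeds.append((ref_pos, read_pos))
    if not seeds:
        return None
    return Counter(r - q for r, q in seeds).most_common(1)[0]  # (offset, votes)
```

Because the toy model gives every 3-mer a distinct level, a slightly noisy slice of the reference signal produces seeds on a single diagonal. Real nanopore signals would first need event segmentation and normalization before any such lookup.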
7. ALS-associated mutation FUS-R521C causes DNA damage and RNA splicing defects. J Clin Invest 2021; 131:149564. [PMID: 33792567 DOI: 10.1172/jci149564] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
8. Single-molecule long-read sequencing reveals a conserved intact long RNA profile in sperm. Nat Commun 2021; 12:1361. [PMID: 33649327 PMCID: PMC7921563 DOI: 10.1038/s41467-021-21524-6] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 10/11/2019] [Accepted: 01/22/2021] [Indexed: 01/31/2023]
Abstract
Sperm contributes diverse RNAs to the zygote. While sperm small RNAs have been shown to impact offspring phenotypes, our knowledge of the sperm transcriptome, especially the composition of long RNAs, has been limited by the lack of sensitive, high-throughput experimental techniques that can distinguish intact RNAs from fragmented RNAs, which are known to abound in sperm. Here, we integrate single-molecule long-read sequencing with short-read sequencing to detect sperm intact RNAs (spiRNAs). We identify 3440 spiRNA species in mice and 4100 in humans. The spiRNA profile consists of both mRNAs and long non-coding RNAs, is evolutionarily conserved between mice and humans, and displays an enrichment of mRNAs encoding ribosomal proteins. In sum, we characterize the landscape of intact long RNAs in sperm, paving the way for future studies on their biogenesis and functions. Our experimental and bioinformatics approaches can be applied to other tissues and organisms to detect intact transcripts.
9. Revealing tumor heterogeneity of breast cancer by utilizing the linkage between somatic and germline mutations. Brief Bioinform 2020; 20:2306-2315. [PMID: 30239581 PMCID: PMC6954402 DOI: 10.1093/bib/bby084] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/08/2018] [Revised: 06/07/2018] [Accepted: 06/26/2018] [Indexed: 12/25/2022]
Abstract
Intra-tumor heterogeneity is associated with cancer progression and therapeutic resistance, including in breast cancer. While existing methods for studying tumor heterogeneity analyze only variant allele frequency (VAF), the genotype of a variant, which can be detected from long reads or paired-end reads, is also informative for inferring subclones. We developed GenoClone to integrate VAF with variant genotype, and it showed superior performance in inferring the number of subclones, estimating subclone fractions and identifying the somatic single-nucleotide variant composition of subclones. When applied to 389 TCGA breast cancer samples, GenoClone revealed extensive intra-tumor heterogeneity. We further found that a few somatic mutations were relevant to the late stage of tumor evolution, including those in the oncogene PIK3CA and the tumor suppressor gene TP53. Moreover, 52 subclones identified from 167 samples shared high similarity of somatic mutations and were clustered into three groups of sizes 24, 14 and 14. These findings may help in understanding the development of breast cancer in certain subgroups of people and in population-level drug development. Furthermore, GenoClone also identified tumor heterogeneity among different aliquots of the same samples. The implementation of GenoClone is available at http://www.healthcare.uiowa.edu/labs/au/GenoClone/.
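GenoClone's premise is that the linkage between a somatic mutation and a nearby heterozygous germline variant, observed on the same long or paired-end read, carries information beyond VAF. The sketch below is not GenoClone's model. It only tabulates spanning reads to phase a somatic allele onto one germline haplotype and compares the overall VAF with the mutated fraction within that haplotype. The function name, the input format and the majority-vote phasing rule are assumptions.

```python
from collections import Counter

def linkage_summary(read_pairs, germ_ref, germ_alt, som_alt):
    """Summarize reads spanning a germline heterozygous site and a somatic site.

    read_pairs: one (germline_base, somatic_base) tuple per spanning read.
    Returns (carrier_haplotype, vaf, within_haplotype_fraction).
    """
    counts = Counter(read_pairs)
    total = sum(counts.values())
    # Phase the somatic allele to whichever germline allele it co-occurs with more.
    on_ref = counts[(germ_ref, som_alt)]
    on_alt = counts[(germ_alt, som_alt)]
    carrier = germ_ref if on_ref >= on_alt else germ_alt
    carrier_reads = sum(n for (g, _), n in counts.items() if g == carrier)
    mutated = counts[(carrier, som_alt)]
    within = mutated / carrier_reads if carrier_reads else 0.0
    vaf = sum(n for (_, s), n in counts.items() if s == som_alt) / total if total else 0.0
    return carrier, vaf, within
```

Under the extra assumptions of 100% tumor purity and a copy-neutral diploid locus, a heterozygous somatic SNV present in a fraction f of cells has a VAF near f/2, while the within-haplotype fraction estimates f directly. For example, 3 mutated and 3 unmutated reads on one haplotype plus 6 reads from the other give a VAF of 0.25 and a within-haplotype fraction of 0.5, both consistent with f of about 0.5.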
10.
Abstract
Introduction: Genome-wide CRISPR-Cas9-based loss-of-function screens can be used to find genes essential for the proliferation and survival of cancer cells. While recent studies have focused on establishing reference sets of essential and non-essential genes, correcting copy number effects and characterizing off-target effects, in-depth studies of the effects of gene expression level and of sgRNAs that target multiple genomic loci are lacking. To fill this gap, we present a bioinformatics workflow to reduce false positives in CRISPR-Cas9 screens.
Description: The gastric adenocarcinoma cell line AGS was infected with a CRISPR knockout library (TKOv3) at a multiplicity of infection of 0.3 to 0.4. We used cells collected right after puromycin selection as the baseline sample, and cells cultured for 14 or 20 days as the negative selection samples. The sgRNA inserts were amplified by PCR, and the resulting libraries were sequenced on a NextSeq 500 with a single-end 75 bp run, followed by analysis with MAGeCK. The read counts of sgRNAs were normalized to non-essential genes to reduce false positives. The RNA-seq and copy number data were obtained from the CCLE portal. To characterize sgRNAs targeting multiple genomic loci, Bowtie was used to align sgRNAs to the human reference genome (GRCh38) with no mismatches, and only alignments followed by an NGG PAM site were retained for downstream analysis.
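Two steps of the Description can be sketched in Python: keeping only perfect sgRNA hits that are followed by an NGG PAM, and normalizing read counts to sgRNAs that target non-essential genes. The code assumes the Bowtie hits are already available as coordinates (running Bowtie itself is not shown), and it uses a median-of-ratios size factor restricted to control sgRNAs, which is similar in spirit to, but not the code of, MAGeCK's control-based normalization. All function names are hypothetical.

```python
import math
import statistics

def has_ngg_pam(genome, start, end, strand):
    """Check for an SpCas9 NGG PAM next to a perfect 20-nt protospacer hit.

    start/end: 0-based, end-exclusive + strand coordinates of the match.
    """
    if strand == "+":
        pam = genome[end:end + 3]  # the PAM lies 3' of the protospacer
        return len(pam) == 3 and pam[1:] == "GG"
    if start < 3:
        return False
    # On the - strand, NGG appears as CCN on the + strand just upstream of the hit.
    return genome[start - 3:start - 1] == "CC"

def control_size_factors(counts, control_ids):
    """Median-of-ratios size factors computed only over control sgRNAs.

    counts: {sgRNA_id: [count in sample 0, count in sample 1, ...]}
    control_ids: sgRNAs targeting non-essential genes.
    """
    n_samples = len(next(iter(counts.values())))
    ratios = [[] for _ in range(n_samples)]
    for sg in control_ids:
        row = counts[sg]
        if min(row) == 0:
            continue  # the geometric mean is undefined with zero counts
        geo_mean = math.exp(sum(math.log(c) for c in row) / n_samples)
        for j, c in enumerate(row):
            ratios[j].append(c / geo_mean)
    return [statistics.median(r) for r in ratios]

def normalize(counts, factors):
    """Divide each sample's counts by its size factor."""
    return {sg: [c / f for c, f in zip(row, factors)] for sg, row in counts.items()}
```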
Summary: Integrating RNA-seq data with CRISPR negative screen results showed that the selection signal was noisy for lowly expressed genes. The fraction of selected essential genes (overall FDR < 0.05, absolute beta score > 1) was as low as 0.11% among genes in the bottom 10% of expression, compared with 27% among genes in the top 10%. After filtering out lowly expressed genes (< 0.06 RPKM), the selected essential genes had an FDR much closer to 0. Of the 40 essential genes selected without filtering out lowly expressed genes, none had been reported as an oncogene in the literature. To study the influence of multiple alignments of sgRNAs, we considered only perfect alignments (i.e., no mismatches), to avoid confounding with off-target effects caused by mismatch tolerance. Log fold changes in read counts were calculated for each sgRNA between a later time point (day 14 or 20) and baseline (day 0). The median log fold change decreased significantly as a function of the number of perfect alignments (p = 0.0001, Jonckheere trend test). This supports the hypothesis that an sgRNA aligned to several DNA targets introduces multiple double-stranded cuts and thus results in biased essentiality scores.
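The trend analysis in the Summary combines per-sgRNA log fold changes with a Jonckheere trend test across groups ordered by the number of perfect alignments. The following minimal sketch computes the Jonckheere-Terpstra statistic with its standard normal approximation (no tie correction) and a one-sided p-value for a decreasing trend. The pseudocount in `sgrna_lfc` is an assumption, and a production analysis would normally use an established statistics package with tie handling.

```python
import math

def sgrna_lfc(later, baseline, pseudocount=0.5):
    """log2 fold change of one sgRNA's normalized count vs. baseline."""
    return math.log2((later + pseudocount) / (baseline + pseudocount))

def jonckheere_terpstra(groups):
    """Jonckheere-Terpstra statistic for an increasing trend across ordered groups.

    groups: lists of values ordered by the hypothesized trend, e.g. LFCs of
    sgRNAs with 1, 2, 3, ... perfect genomic targets.
    Returns (J, z); z uses the normal approximation without tie correction.
    """
    j_stat = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    j_stat += 1.0 if y > x else 0.5 if y == x else 0.0
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean = (n * n - sum(m * m for m in sizes)) / 4.0
    var = (n * n * (2 * n + 3) - sum(m * m * (2 * m + 3) for m in sizes)) / 72.0
    return j_stat, (j_stat - mean) / math.sqrt(var)

def p_decreasing(z):
    """One-sided p-value for a decreasing trend, P(Z <= z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))
```

Passing LFC groups ordered as 1, 2, 3 or more perfect alignments and calling `p_decreasing(z)` tests the direction reported in the Summary.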
Conclusions: Filtering out lowly expressed genes prior to CRISPR screen data analysis can reduce false positives. In addition, multiple-target sgRNAs can lead to false positives, but the effect needs further analysis on a case-by-case basis.
Citation Format: Yue Zhao, Xue Wu, Yuru Wang, Kin Fai Au, Lijun Cheng, Lang Li. New bioinformatics workflow of genome-wide CRISPR-Cas9 knockout screens [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 830.
11. A network-based computational framework to predict and differentiate functions for gene isoforms using exon-level expression data. Methods 2020; 189:54-64. [PMID: 32534132 DOI: 10.1016/j.ymeth.2020.06.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/27/2020] [Revised: 05/22/2020] [Accepted: 06/06/2020] [Indexed: 12/23/2022]
Abstract
MOTIVATION Alternative splicing makes significant contributions to the functional diversity of transcripts and proteins. Many alternatively spliced gene isoforms have been shown to perform specific biological functions in different contexts. In addition to gene-level expression, advances in high-throughput sequencing make it possible to estimate isoform-specific exon expression at high resolution, which is informative for studying splice variants with network analysis. RESULTS In this study, we propose a novel network-based analysis framework to predict isoform-specific functions from exon-level RNA-Seq data. In particular, based on exon-level expression data, we first propose a unified framework, referred to as Iso-Net, that integrates two new mathematical methods (named MINet and RVNet) to infer co-expression networks in different data scenarios. We demonstrate the superior prediction accuracy of Iso-Net over existing methods on most simulated datasets, especially in two extreme cases: when the sample size is very small and when the two isoforms differ greatly in exon number. Furthermore, by defining relevant quantitative measures (e.g., the Jaccard correlation coefficient) and combining differential co-expression network analysis with GO functional enrichment analysis, we develop a co-expression network analysis framework to predict isoform functions and, further, to discover distinct functions of isoforms within the same gene. We apply Iso-Net to study gene isoforms of several important transcription factors in human myeloid differentiation, using exon-level RNA-Seq data from three different cell lines. AVAILABILITY AND IMPLEMENTATION Iso-Net is open source and freely available from https://github.com/Dingjie-Wang/Iso-Net.
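Iso-Net's simulations stress the case where two isoforms have very different exon numbers, so a co-expression measure must compare matrices with different column counts. One standard measure with that property is the RV coefficient, a matrix correlation computed from sample-by-sample cross-product matrices. Whether RVNet uses exactly this statistic is an assumption here; the sketch below only shows how such a measure works on samples-by-exons expression matrices.

```python
import math

def _center_columns(m):
    n = len(m)
    means = [sum(row[c] for row in m) / n for c in range(len(m[0]))]
    return [[v - mu for v, mu in zip(row, means)] for row in m]

def _sample_cross_products(m):
    # W = X X^T: an n_samples x n_samples matrix, independent of the exon count.
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]

def rv_coefficient(x, y):
    """RV coefficient between two samples-by-exons matrices (same sample order)."""
    wx = _sample_cross_products(_center_columns(x))
    wy = _sample_cross_products(_center_columns(y))
    # Both W matrices are symmetric, so tr(Wx Wy) is the sum of elementwise products.
    num = sum(a * b for rx, ry in zip(wx, wy) for a, b in zip(rx, ry))
    den = math.sqrt(sum(a * a for r in wx for a in r) * sum(b * b for r in wy for b in r))
    return num / den
```

A two-exon isoform and a one-exon isoform whose profiles rise together across samples score near 1, while unrelated profiles score near 0.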
12. Performance difference of graph-based and alignment-based hybrid error correction methods for error-prone long reads. Genome Biol 2020; 21:14. [PMID: 31952552 PMCID: PMC6966875 DOI: 10.1186/s13059-019-1885-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/28/2019] [Accepted: 11/10/2019] [Indexed: 11/10/2022]
Abstract
Error-prone third-generation sequencing (TGS) long reads can be corrected using high-quality second-generation sequencing (SGS) short reads, a process referred to as hybrid error correction. Here we investigate the influence of the principal algorithmic factors of two major types of hybrid error correction methods through mathematical modeling and analysis of both simulated and real data. Our study reveals the distribution of accuracy gain with respect to the original long-read error rate. We also demonstrate that an original error rate of 19% is the limit for perfect correction, beyond which long reads are too error-prone to be corrected by these methods.
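The accuracy gain discussed in the abstract can be made concrete as the drop in a read's error rate after correction. The sketch below measures error rate as the Levenshtein distance to the true sequence divided by the true sequence length. That normalization and the function names are assumptions for illustration, not the modeling framework used in the study.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit-cost substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]

def error_rate(read, truth):
    # Normalizing by the true sequence length is an assumption of this sketch.
    return edit_distance(read, truth) / len(truth)

def accuracy_gain(raw, corrected, truth):
    """Drop in error rate achieved by correction (positive means improvement)."""
    return error_rate(raw, truth) - error_rate(corrected, truth)
```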
13.
Abstract
The most frequently mutated protein in human cancer is p53, a transcription factor (TF) that regulates myriad genes instrumental in diverse cellular outcomes including growth arrest and cell death. Cell context-dependent p53 modulation is critical for this life-or-death balance, yet remains incompletely understood. Here we identify sequence signatures enriched in genomic p53-binding sites modulated by the transcription cofactor iASPP. Moreover, our p53-iASPP crystal structure reveals that iASPP displaces the p53 L1 loop, which mediates sequence-specific interactions with the signature-corresponding base, without perturbing other DNA-recognizing modules of the p53 DNA-binding domain. A TF commonly uses multiple structural modules to recognize its cognate DNA, and thus this mechanism of a cofactor fine-tuning TF-DNA interactions through targeting a particular module is likely widespread. Previously, all tumor suppressors and oncoproteins that associate with the p53 DNA-binding domain, except the oncogenic E6 from human papillomaviruses (HPVs), structurally cluster at the DNA-binding site of p53, complicating drug design. By contrast, iASPP inhibits p53 through a distinct surface overlapping the E6 footprint, opening prospects for p53-targeting precision medicine to improve cancer therapy.
14. Single-molecule long-read sequencing reveals the chromatin basis of gene expression. Genome Res 2019; 29:1329-1342. [PMID: 31201211 PMCID: PMC6673713 DOI: 10.1101/gr.251116.119] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 04/02/2019] [Accepted: 06/10/2019] [Indexed: 11/25/2022]
Abstract
Genome-wide chromatin accessibility and nucleosome occupancy profiles have been widely investigated, whereas long-range dynamics remain poorly studied at the single-cell level. Here, we present a new experimental approach, methyltransferase treatment followed by single-molecule long-read sequencing (MeSMLR-seq), for long-range mapping of nucleosomes and chromatin accessibility on single DNA molecules, thereby achieving comprehensive characterization of the corresponding heterogeneity. MeSMLR-seq directly measures both nucleosome-occupied and nucleosome-evicted regions on a single DNA molecule, which is challenging for many existing methods. We applied MeSMLR-seq to haploid yeast, where single DNA molecules represent single cells, and could thus investigate the combinatorics of many (up to 356) nucleosomes at long range in single cells. We illustrated the differing organization principles of nucleosomes surrounding the transcription start site for silent and actively transcribed genes, at the single-cell level and on the long-range scale. The heterogeneous patterns of chromatin status spanning multiple genes were phased. Together with single-cell RNA-seq data, we quantitatively showed that chromatin accessibility correlated positively with gene transcription in a highly heterogeneous scenario. Moreover, we quantified the openness of promoters and investigated the coupled chromatin changes of adjacent genes on single DNA molecules during transcriptional reprogramming. In addition, we revealed the coupled changes of chromatin accessibility for two neighboring glucose transporter genes in response to changes in glucose concentration.
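MeSMLR-seq reads chromatin state from methylation deposited by an exogenous methyltransferase, with nucleosome-bound DNA protected from labeling. The sketch below assumes, as is typical for GpC methyltransferase footprinting, that methylated GpC sites mark accessible DNA, and it calls candidate nucleosome-occupied spans as unmethylated stretches of at least about 147 bp (one nucleosome core) between methylated sites on a single molecule. The threshold, the input format and the openness score are hypothetical choices, not the published MeSMLR-seq caller.

```python
def protected_regions(calls, min_len=147):
    """Candidate nucleosome-occupied spans on one molecule.

    calls: (position, is_methylated) for GpC sites, sorted by position.
    Returns end-exclusive (start, end) spans between two consecutive methylated
    sites that contain at least one unmethylated site and are >= min_len bp.
    """
    regions = []
    last_meth, unmeth_inside = None, 0
    for pos, meth in calls:
        if not meth:
            unmeth_inside += 1
            continue
        if last_meth is not None and unmeth_inside and pos - last_meth - 1 >= min_len:
            regions.append((last_meth + 1, pos))
        last_meth, unmeth_inside = pos, 0
    return regions

def openness(calls, start, end):
    """Fraction of methylated (accessible) sites in [start, end); None if no sites."""
    inside = [meth for pos, meth in calls if start <= pos < end]
    return sum(inside) / len(inside) if inside else None
```

A promoter-level openness score of this kind, computed per molecule, is one simple way to compare chromatin states of adjacent genes read on the same DNA molecule.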
15. A comparative evaluation of hybrid error correction methods for error-prone long reads. Genome Biol 2019; 20:26. [PMID: 30717772 PMCID: PMC6362602 DOI: 10.1186/s13059-018-1605-z] [Citation(s) in RCA: 73] [Impact Index Per Article: 14.6] [Received: 03/20/2018] [Accepted: 12/05/2018] [Indexed: 12/20/2022]
Abstract
Background Third-generation sequencing technologies have advanced biological research by generating reads that are substantially longer than those of second-generation sequencing technologies. However, their notoriously high error rate impedes straightforward data analysis and limits their application. A handful of error correction methods for these error-prone long reads have been developed to date. Output data quality is very important for downstream analysis, whereas computing resources could limit the utility of some computationally intensive tools. There is a lack of standardized assessments for these long-read error correction methods. Results Here, we present a comparative performance assessment of ten state-of-the-art error correction methods for long reads. We established a common set of benchmarks for performance assessment, including sensitivity, accuracy, output rate, alignment rate, output read length, run time, and memory usage, as well as the effects of error correction on two downstream applications of long reads: de novo assembly and resolving haplotype sequences. Conclusions Taking all of these metrics into account, we provide guidelines for choosing a method based on available data size, computing resources, and individual research goals. Electronic supplementary material The online version of this article (10.1186/s13059-018-1605-z) contains supplementary material, which is available to authorized users.
16. E-C coupling structural protein junctophilin-2 encodes a stress-adaptive transcription regulator. Science 2018; 362:science.aan3303. [PMID: 30409805 DOI: 10.1126/science.aan3303] [Citation(s) in RCA: 63] [Impact Index Per Article: 10.5] [Received: 04/08/2017] [Revised: 05/10/2018] [Accepted: 10/24/2018] [Indexed: 11/02/2022]
Abstract
Junctophilin-2 (JP2) is a structural protein required for normal excitation-contraction (E-C) coupling. After cardiac stress, JP2 is cleaved by the calcium ion-dependent protease calpain, which disrupts the E-C coupling ultrastructural machinery and drives heart failure progression. We found that stress-induced proteolysis of JP2 liberates an N-terminal fragment (JP2NT) that translocates to the nucleus, binds to genomic DNA, and controls expression of a spectrum of genes in cardiomyocytes. Transgenic overexpression of JP2NT in mice modifies the transcriptional profile, resulting in attenuated pathological remodeling in response to cardiac stress. Conversely, loss of nuclear JP2NT function accelerates stress-induced development of hypertrophy and heart failure in mutant mice. These data reveal a self-protective mechanism in failing cardiomyocytes that transduce mechanical information (E-C uncoupling) into salutary transcriptional reprogramming in the stressed heart.
17. A Statistical Method for Observing Personal Diploid Methylomes and Transcriptomes with Single-Molecule Real-Time Sequencing. Genes (Basel) 2018; 9:E460. [PMID: 30235838 PMCID: PMC6162384 DOI: 10.3390/genes9090460] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/15/2018] [Revised: 09/12/2018] [Accepted: 09/12/2018] [Indexed: 11/16/2022]
Abstract
We address the problem of observing personal diploid methylomes, that is, CpG methylome pairs of homologous chromosomes that are distinguishable with respect to phased heterozygous variants (PHVs), which is challenging due to the scarcity of PHVs in personal genomes. Single-molecule real-time (SMRT) sequencing is promising because it outputs long reads with CpG methylation information, but a serious concern is whether reliable PHVs are available in error-prone SMRT reads with an error rate of ∼15%. To overcome this issue, we propose a statistical model that reduces the error rate of phasing CpG sites to 1%, thereby calling CpG hypomethylation in each haplotype with >90% precision and sensitivity. Using our statistical model, we examined the GNAS complex locus, known for a combination of maternally, paternally, or biallelically expressed isoforms, and observed allele-specific methylation patterns almost perfectly reflecting their respective allele-specific expression status, demonstrating the merit of elucidating comprehensive personal diploid methylomes and transcriptomes.
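The central difficulty in this abstract is assigning error-prone SMRT reads (about 15% errors) to haplotypes using PHVs. A simple way to see why several linked PHVs help is a per-read likelihood ratio between the two haplotypes. The sketch below assumes independent substitution errors with probability e per base, spread evenly over the three wrong bases. This error model and the function name are assumptions and do not reproduce the authors' statistical model.

```python
import math

def haplotype_posterior(read_alleles, hap1, hap2, error_rate=0.15, prior1=0.5):
    """Posterior probability that a read was sampled from haplotype 1.

    read_alleles, hap1, hap2: {PHV position: base}. Positions not phased in both
    haplotypes are ignored. Errors are treated as independent substitutions:
    P(observed | true) = 1 - e if equal, else e / 3.
    """
    log_match, log_mismatch = math.log(1 - error_rate), math.log(error_rate / 3)
    log_odds = math.log(prior1) - math.log(1 - prior1)
    for pos, base in read_alleles.items():
        if pos in hap1 and pos in hap2:
            log_odds += log_match if base == hap1[pos] else log_mismatch
            log_odds -= log_match if base == hap2[pos] else log_mismatch
    # Numerically stable logistic transform of the log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)
```

With e = 0.15, one concordant PHV leaves a misassignment probability of 1/18 (about 5.6%), while three concordant PHVs bring it to about 1 in 4,914.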
18. Delayed diagnosis of tuberculosis: risk factors and effect on mortality among older adults in Hong Kong. Hong Kong Med J 2018; 24:361-368. [PMID: 30065120 DOI: 10.12809/hkmj177081] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/05/2022]
Abstract
OBJECTIVE To assess the risk factors for delayed diagnosis of tuberculosis (TB) and its effects on TB mortality in Hong Kong. METHODS All consecutive patients with TB notified in 2010 were tracked through their clinical records for treatment outcome until 2012. All TB cases notified or confirmed after death were identified for a mortality survey on the timing and causes of death. RESULTS Of 5092 TB cases notified, 1061 (20.9%) died within 2 years of notification: 211 (4.1%) died before notification, 683 (13.4%) died within the first year, and 167 (3.3%) died within the second year after notification. Among the 211 cases with TB notified after death, only 30 were certified to have died from TB. However, 52 (24.6%) died from unspecified pneumonia/sepsis possibly related to pulmonary TB. If these cases are counted, the total number of TB-related deaths increases from 191 to 243. In 82 (33.7%) of these, TB was notified after death. Over 60% of cases in which TB was diagnosed after death involved patients aged ≥80 years, and a similar proportion had an advance care directive against resuscitation or investigation. Independent factors for TB notified after death included female sex, living in an old age home, drug abuse, malignancy other than lung cancer, negative sputum TB smear, positive sputum TB culture, and chest X-ray not done. CONCLUSIONS High mortality was observed among patients with TB aged ≥80 years. Increased vigilance is warranted to avoid delayed diagnosis and reduce the transmission risk, especially among elderly patients with co-morbidities living in old age homes.
19. IDP-denovo: de novo transcriptome assembly and isoform annotation by hybrid sequencing. Bioinformatics 2018; 34:2168-2176. [PMID: 29905763 PMCID: PMC6022631 DOI: 10.1093/bioinformatics/bty098] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 06/16/2017] [Revised: 02/10/2018] [Accepted: 02/21/2018] [Indexed: 12/24/2022]
Abstract
Motivation In recent years, long-read (LR) sequencing technologies, such as Pacific Biosciences and Oxford Nanopore Technologies, have been shown to substantially improve the quality of genome assembly and transcriptome characterization. Compared with the high cost of genome assembly by LR sequencing, generating LRs for transcriptome characterization is more affordable. Thus, when informative transcriptome LR data are available without a high-quality genome, a method for de novo transcriptome assembly and annotation is in high demand. Results Without a reference genome, IDP-denovo performs de novo transcriptome assembly, isoform annotation and quantification by integrating the strengths of LRs and short reads. Using the GM12878 human data as a gold standard, we demonstrated that IDP-denovo had superior sensitivity of transcript assembly and high accuracy of isoform annotation. In addition, IDP-denovo outputs two abundance indices to provide a comprehensive expression profile of genes/isoforms. IDP-denovo represents a robust approach for transcriptome assembly, isoform annotation and quantification for non-model organism studies. Applying IDP-denovo to a non-model organism, Dendrobium officinale, we discovered a number of novel genes and novel isoforms that were not reported in the existing annotation library, revealing a high diversity of gene isoforms in D. officinale. Availability and implementation The dataset of Dendrobium officinale used/analyzed during the current study has been deposited in SRA, with accession code SRP094520. IDP-denovo is available for download at www.healthcare.uiowa.edu/labs/au/IDP-denovo/. Supplementary information Supplementary data are available at Bioinformatics online.
20. Single cell expression analysis of primate-specific retroviruses-derived HPAT lincRNAs in viable human blastocysts identifies embryonic cells co-expressing genetic markers of multiple lineages. Heliyon 2018; 4:e00667. [PMID: 30003161 PMCID: PMC6039856 DOI: 10.1016/j.heliyon.2018.e00667] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 09/26/2017] [Revised: 02/01/2018] [Accepted: 06/21/2018] [Indexed: 12/03/2022]
Abstract
Chromosome instability and aneuploidies occur very frequently in human embryos, impairing proper embryogenesis and leading to cell cycle arrest, loss of cell viability, and developmental failures in 50–80% of cleavage-stage embryos. This high frequency of cellular extinction events represents a significant experimental obstacle to analyses of individual cells isolated from human preimplantation embryos. We carried out single-cell expression profiling of 241 individual cells recovered from 32 human embryos during the early and late stages of viable human blastocyst (VHB) differentiation. Embryonic cells were classified solely on the basis of expression patterns of human pluripotency-associated transcripts (HPAT), a family of primate-specific transposable element-derived lincRNAs that are highly expressed in human embryonic stem cells and regulate nuclear reprogramming and pluripotency induction. We then validated our findings by analyzing transcriptomes of 1,708 individual cells recovered from more than 100 human embryos and 259 mouse cells from more than 40 mouse embryos at different stages of preimplantation embryogenesis. HPAT expression-guided spatiotemporal reconstruction of human embryonic development, inferred from single-cell expression analysis of VHB differentiation, enabled identification of telomerase-positive embryonic cells co-expressing key pluripotency regulatory genes and genetic markers of three major lineages. Follow-up validation analyses confirmed the emergence, in human embryos prior to lineage segregation, of telomerase-positive cells co-expressing genetic markers of multiple lineages. These observations support the hypothesis of a developmental pathway that creates embryonic lineages and extraembryonic tissues from telomerase-positive pre-lineage cells manifesting a multi-lineage precursor phenotype.
21
Hybrid Sequencing of Full-Length cDNA Transcripts of Stems and Leaves in Dendrobium officinale. Genes (Basel) 2017; 8:genes8100257. [PMID: 28981454 PMCID: PMC5664107 DOI: 10.3390/genes8100257] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 08/18/2017] [Revised: 09/16/2017] [Accepted: 10/02/2017] [Indexed: 11/16/2022]
Abstract
Dendrobium officinale is an extremely valuable orchid used in traditional Chinese medicine, so sought after that it has a higher market value than gold. Although the expression profiles of some genes involved in polysaccharide synthesis have previously been investigated, little research has been carried out on their alternatively spliced isoforms in D. officinale. In addition, information regarding the translocation of sugars from leaves to stems in D. officinale remains limited. We analyzed the polysaccharide content of D. officinale leaves and stems, and completed in-depth transcriptome sequencing of these two tissue types using second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing technology. The results yielded a digital inventory of gene and mRNA isoform expression. A comparative analysis of both transcriptomes uncovered a total of 1414 differentially expressed genes, including 844 that were up-regulated and 570 that were down-regulated in stems. Among these genes, one SWEET (Sugars Will Eventually be Exported Transporter) gene and one sucrose transporter (SUT) gene are expressed at higher levels in D. officinale stems than in leaves. Two glycosyltransferase (GT) and four cellulose synthase (Ces) genes undergo a distinct degree of alternative splicing. The polysaccharide content of the stems is twice that of the leaves. The differentially expressed GT and transcription factor (TF) genes will be the focus of further study. The genes DoSWEET4 and DoSUT1 are significantly expressed in the stem and are likely to be involved in sugar loading in the phloem.
22
CF airway smooth muscle transcriptome reveals a role for PYK2. JCI Insight 2017; 2:95332. [PMID: 28878137 DOI: 10.1172/jci.insight.95332] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/05/2017] [Accepted: 07/27/2017] [Indexed: 12/17/2022]
Abstract
Abnormal airway smooth muscle function can contribute to cystic fibrosis (CF) airway disease. We previously found that airway smooth muscle from newborn CF pigs had increased basal tone, an increased bronchodilator response, and abnormal calcium handling. Since CF pigs lack airway infection and inflammation at birth, these findings suggest intrinsic airway smooth muscle dysfunction in CF. In this study, we tested the hypothesis that CFTR loss in airway smooth muscle would produce a distinct set of changes in the airway smooth muscle transcriptome that we could use to develop novel therapeutic targets. Total RNA sequencing of newborn wild-type and CF airway smooth muscle revealed changes in muscle contraction-related genes, ontologies, and pathways. Using connectivity mapping, we identified several small molecules that elicit transcriptional signatures opposite to that of CF airway smooth muscle, including NVP-TAE684, an inhibitor of proline-rich tyrosine kinase 2 (PYK2). In CF airway smooth muscle tissue, PYK2 phosphorylation was increased and PYK2 inhibition decreased smooth muscle contraction. In vivo NVP-TAE684 treatment of wild-type mice reduced methacholine-induced airway smooth muscle contraction. These findings suggest that studies in the newborn CF pig may provide an important approach to enhance our understanding of airway smooth muscle biology and to discover novel airway smooth muscle therapeutics for CF and other diseases of airway hyperreactivity.
23
Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis. Nat Commun 2017; 8:59. [PMID: 28680106 PMCID: PMC5498581 DOI: 10.1038/s41467-017-00050-4] [Citation(s) in RCA: 165] [Impact Index Per Article: 23.6] [Received: 10/07/2016] [Accepted: 05/02/2017] [Indexed: 12/30/2022]
Abstract
RNA-sequencing (RNA-seq) is an essential technique for transcriptome studies, and hundreds of analysis tools have been developed since its debut. Although recent efforts have attempted to assess the latest available tools, they have not evaluated the analysis workflows comprehensively enough to unleash the full power of RNA-seq. Here we conduct an extensive study analysing a broad spectrum of RNA-seq workflows. Going beyond expression analysis, our work also assesses RNA variant-calling, RNA editing and RNA fusion detection techniques. Specifically, we examine both short- and long-read RNA-seq technologies, 39 analysis tools resulting in ~120 combinations, and ~490 analyses involving 15 samples with a variety of germline, cancer and stem cell data sets. We report the performance and propose a comprehensive RNA-seq analysis protocol, named RNACocktail, along with a computational pipeline achieving high accuracy. Validation on different samples reveals that our proposed protocol could help researchers extract more biologically relevant predictions by broad analysis of the transcriptome. RNA-seq is widely used for transcriptome analysis. Here, the authors analyse a wide spectrum of RNA-seq workflows and present a comprehensive analysis protocol named RNACocktail as well as a computational pipeline leveraging the widely used tools for accurate RNA-seq analysis.
24
Discovery of novel determinants of endothelial lineage using chimeric heterokaryons. eLife 2017; 6. [PMID: 28323620 PMCID: PMC5391207 DOI: 10.7554/elife.23588] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 11/23/2016] [Accepted: 03/17/2017] [Indexed: 12/29/2022]
Abstract
We wish to identify determinants of endothelial lineage. Murine embryonic stem cells (mESC) were fused with human endothelial cells in stable, non-dividing heterokaryons. Using RNA-seq, it is possible to discriminate between human and mouse transcripts in these chimeric heterokaryons. We observed a temporal pattern of gene expression in the ESCs of the heterokaryons that recapitulated ontogeny, with early mesodermal factors being expressed before mature endothelial genes. A set of transcription factors not previously known to be involved in endothelial development was upregulated, one of which was POU class 3 homeobox 2 (Pou3f2). We confirmed its importance in differentiation to endothelial lineage via loss- and gain-of-function (LOF and GOF) experiments. Its role in vascular development was validated in zebrafish embryos using morpholino oligonucleotides. These studies provide a systematic and mechanistic approach for identifying key regulators in directed differentiation of pluripotent stem cells to somatic cell lineages. DOI: http://dx.doi.org/10.7554/eLife.23588.001
Endothelial cells form the inner surface of blood vessels, acting like a non-stick coating. In addition to making substances that keep blood from sticking to the vessel wall, endothelial cells generate compounds that relax the vessel and prevent it from thickening. Endothelial cells also form capillaries, the smallest vessels that provide oxygen and nutrients for all tissues. A regenerating organ, or a bioengineered tissue, requires a system of capillaries and other microvessels. Thus, regenerative medicine could benefit from knowing how to generate endothelial cells from pluripotent stem cells – cells that can “differentiate” to form almost any type of cell in the body. Wong, Matrone et al. have now used a cell fusion model (named heterokaryon) to track the changes in gene expression that occur as a pluripotent stem cell differentiates to ultimately become an endothelial cell.
In this model, mouse embryonic stem cells (ESCs) are fused to human endothelial cells. Over time the human endothelial cells drive gene expression in the ESCs toward that of endothelial cells. Wong, Matrone et al. discovered changes in gene expression in many genes that have not previously been described as involved in the differentiation of endothelial cells. When one of these genes – named Pou3f2 – was inactivated in ESCs, they could not be differentiated into endothelial cells. The absence of Pou3f2 also drastically impaired how blood vessels developed in zebrafish embryos. Thus the heterokaryon model can generate important information regarding the dynamic changes in gene expression that occur as a pluripotent cell differentiates to become an endothelial cell. This model may also be useful for discovering other genes that control the differentiation of other cell types. DOI:http://dx.doi.org/10.7554/eLife.23588.002
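Telling human from mouse transcripts in a heterokaryon depends on sequence differences between orthologs. A minimal sketch, assuming reads have already been placed on an orthologous region and that equal-length human and mouse reference segments are available: each read is assigned to the species whose segment it matches with clearly fewer mismatches, and reads without enough species-informative differences are left ambiguous. The function names, the mismatch-count score and the `margin` value are hypothetical choices for illustration; a real analysis would align reads to both genomes.

```python
from collections import Counter
from typing import Iterable


def mismatches(read: str, ref: str) -> int:
    """Hamming distance between a read and an equal-length reference segment."""
    if len(read) != len(ref):
        raise ValueError("read and reference segment must have equal length")
    return sum(a != b for a, b in zip(read, ref))


def assign_species(read: str, human_ref: str, mouse_ref: str, margin: int = 2) -> str:
    """Call 'human' or 'mouse' only when one reference fits clearly better."""
    h = mismatches(read, human_ref)
    m = mismatches(read, mouse_ref)
    if h + margin <= m:
        return "human"
    if m + margin <= h:
        return "mouse"
    return "ambiguous"  # too few species-informative differences


def tally(reads: Iterable[str], human_ref: str, mouse_ref: str, margin: int = 2) -> Counter:
    """Count species assignments over reads from one orthologous region."""
    return Counter(assign_species(r, human_ref, mouse_ref, margin) for r in reads)


human_ref = "ACGTACGTAC"  # hypothetical human ortholog segment
mouse_ref = "ACGAACGTTC"  # hypothetical mouse segment, differing at two positions
print(tally(["ACGTACGTAC", "ACGAACGTTC", "ACGAACGTAC"], human_ref, mouse_ref))
```

The third read carries one human-specific and one mouse-specific base, so it is reported as ambiguous rather than forced into either species.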
25
IDP-ASE: haplotyping and quantifying allele-specific expression at the gene and gene isoform level by hybrid sequencing. Nucleic Acids Res 2017; 45:e32. [PMID: 27899656 PMCID: PMC5952581 DOI: 10.1093/nar/gkw1076] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 06/15/2016] [Revised: 10/20/2016] [Accepted: 10/26/2016] [Indexed: 12/14/2022]
Abstract
Allele-specific expression (ASE) is a fundamental problem in studying gene regulation and diploid transcriptome profiles, with two key challenges: (i) haplotyping and (ii) estimation of ASE at the gene isoform level. Existing ASE analysis methods are limited by a dependence on haplotyping from laborious experiments or extra genome/family trio data. In addition, there is a lack of methods for gene isoform level ASE analysis. We developed a tool, IDP-ASE, for full ASE analysis. By innovative integration of Third Generation Sequencing (TGS) long reads with Second Generation Sequencing (SGS) short reads, the accuracy of haplotyping and ASE quantification at the gene and gene isoform level was greatly improved, as demonstrated on gold-standard GM12878 data and semi-simulated data. In addition to methodology development, applications of IDP-ASE to human embryonic stem cells and breast cancer cells indicate that ASE imbalance and non-uniformity of gene isoform ASE are widespread, including in tumorigenesis-relevant genes and pluripotency markers. These results show that gene isoform expression and allele-specific expression cooperate to provide high diversity and complexity of gene regulation and expression, highlighting the importance of studying ASE at the gene isoform level. Our study provides a robust bioinformatics solution to understand ASE using RNA sequencing data only.
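Once reads have been assigned to haplotypes (the step IDP-ASE addresses with long reads), allelic imbalance at a gene or isoform can be summarized as the fraction of reads from one haplotype and tested against the 50:50 ratio expected without ASE. The sketch below uses an exact two-sided binomial test; it is a generic illustration with hypothetical function names, not the statistical model implemented in IDP-ASE.

```python
from math import comb
from typing import Tuple


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of outcomes no more likely than k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # small relative tolerance so outcomes exactly as likely as k are included
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))


def allelic_imbalance(hap1_reads: int, hap2_reads: int) -> Tuple[float, float]:
    """Return (haplotype-1 read fraction, two-sided p-value against a 50:50 ratio)."""
    n = hap1_reads + hap2_reads
    if n == 0:
        raise ValueError("no haplotype-informative reads")
    return hap1_reads / n, binom_two_sided_p(hap1_reads, n)


print(allelic_imbalance(10, 10))  # balanced expression
print(allelic_imbalance(18, 2))   # strong imbalance toward haplotype 1
```

For isoform-level ASE the same test would be applied to reads assigned to each isoform separately; with many genes or isoforms tested, the p-values would also need multiple-testing correction.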
26
Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis. F1000Res 2017; 6:100. [PMID: 28868132 PMCID: PMC5553090 DOI: 10.12688/f1000research.10571.1] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Accepted: 06/09/2017] [Indexed: 09/05/2023]
Abstract
Background: Given the demonstrated utility of Third Generation Sequencing [Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT)] long reads in many studies, a comprehensive analysis and comparison of their data quality and applications is in high demand.
Methods: Based on the transcriptome sequencing data from human embryonic stem cells, we analyzed multiple data features of PacBio and ONT, including error pattern, length, mappability and technical improvements over previous platforms. We also evaluated their application to transcriptome analyses, such as isoform identification and quantification and characterization of transcriptome complexity, by comparing the performance of size-selected PacBio, non-size-selected ONT and their corresponding Hybrid-Seq strategies (PacBio+Illumina and ONT+Illumina).
Results: PacBio shows overall better data quality, while ONT provides a higher yield. As with data quality, PacBio performs marginally better than ONT in most aspects for both long reads only and Hybrid-Seq strategies in transcriptome analysis. In addition, Hybrid-Seq shows superior performance over long reads only in most transcriptome analyses.
Conclusions: Both PacBio and ONT sequencing are suitable for full-length single-molecule transcriptome analysis. As this first use of ONT reads in a Hybrid-Seq analysis has shown, both PacBio and ONT can benefit from a combined Illumina strategy. The tools and analytical methods developed here provide a resource for future applications and evaluations of these rapidly-changing technologies.
27
Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis. F1000Res 2017; 6:100. [PMID: 28868132 PMCID: PMC5553090 DOI: 10.12688/f1000research.10571.2] [Citation(s) in RCA: 234] [Impact Index Per Article: 33.4] [Accepted: 06/09/2017] [Indexed: 12/11/2022]
Abstract
Background: Given the demonstrated utility of Third Generation Sequencing [Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT)] long reads in many studies, a comprehensive analysis and comparison of their data quality and applications is in high demand.
Methods: Based on the transcriptome sequencing data from human embryonic stem cells, we analyzed multiple data features of PacBio and ONT, including error pattern, length, mappability and technical improvements over previous platforms. We also evaluated their application to transcriptome analyses, such as isoform identification and quantification and characterization of transcriptome complexity, by comparing the performance of size-selected PacBio, non-size-selected ONT and their corresponding Hybrid-Seq strategies (PacBio+Illumina and ONT+Illumina).
Results: PacBio shows overall better data quality, while ONT provides a higher yield. As with data quality, PacBio performs marginally better than ONT in most aspects for both long reads only and Hybrid-Seq strategies in transcriptome analysis. In addition, Hybrid-Seq shows superior performance over long reads only in most transcriptome analyses.
Conclusions: Both PacBio and ONT sequencing are suitable for full-length single-molecule transcriptome analysis. As this first use of ONT reads in a Hybrid-Seq analysis has shown, both PacBio and ONT can benefit from a combined Illumina strategy. The tools and analytical methods developed here provide a resource for future applications and evaluations of these rapidly-changing technologies.
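One of the data features compared in this study is the error pattern. As a minimal sketch, assuming alignments are reported with extended CIGAR strings that separate matches (`=`) from mismatches (`X`), as defined in the SAM specification, per-type error rates can be tallied as below. The function name and the choice of denominator (all alignment columns, excluding clips and introns `N`) are assumptions for illustration; other analyses normalize by read or reference length.

```python
import re
from collections import Counter
from typing import Dict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def error_profile(cigar: str) -> Dict[str, float]:
    """Per-type error rates from an extended CIGAR string that uses =/X operations."""
    counts: Counter = Counter()
    consumed = 0
    for length, op in CIGAR_OP.findall(cigar):
        counts[op] += int(length)
        consumed += len(length) + 1
    if consumed != len(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    if counts["M"]:
        raise ValueError("'M' does not distinguish matches from mismatches; use =/X")
    # Clips (S, H), padding (P) and introns (N) are not counted as alignment columns.
    columns = counts["="] + counts["X"] + counts["I"] + counts["D"]
    if columns == 0:
        raise ValueError("no aligned columns")
    return {
        "mismatch": counts["X"] / columns,
        "insertion": counts["I"] / columns,
        "deletion": counts["D"] / columns,
        "total": (counts["X"] + counts["I"] + counts["D"]) / columns,
    }


# A spliced read: soft clip, two exon blocks separated by an intron, then errors.
print(error_profile("5S45=1000N45=2X3I5D3H"))
```

This example gives mismatch, insertion and deletion rates of 2%, 3% and 5% (10% in total); summing the counts over all reads before dividing would give platform-level rates.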
28
Abstract 5289: Enhance both precision and sensitivity of fusion gene detection by hybrid sequencing. Cancer Res 2016. [DOI: 10.1158/1538-7445.am2016-5289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Background: New Third Generation Sequencing (TGS) techniques, such as PacBio, can provide very informative insights into the transcriptome, such as expression of fusion genes/fusion transcripts from cancer samples. However, the currently available fusion gene analysis tools are designed for Second Generation Sequencing (SGS) data, where the short read length and unreliable alignments make the accuracy of fusion gene detection uncertain. Hybrid-Seq, which integrates SGS short read data into the analysis of TGS long read data, can combine the strengths of both and thus improve the overall performance and resolution of the output data. It also reduces the required amount of TGS data and thus the sequencing cost. Recently, we developed and reported on a Hybrid-Seq approach, IDP-fusion, and the results of its proof-of-concept application to MCF-7 data. We demonstrated that IDP-fusion can identify fusion genes with much higher precision than SGS-based approaches. Although its sensitivity is comparable to that of the most sensitive SGS-only method, a significant proportion of experimentally verified gold standard fusion genes had yet to be identified by IDP-fusion. This indicated an opportunity to enhance the sensitivity of IDP-fusion while retaining the unparalleled precision of its results.
Method: Here we present an innovative Hybrid-Seq approach that extends IDP-fusion to screen fusion gene candidates predicted from SGS short read alignments. Each fusion gene candidate is verified by the presence of a TGS long read that aligns better to an artificial chromosome created from the fusion candidate than to any single genome locus. We applied IDP-fusion to Hybrid-Seq data from the MCF-7 breast cancer cell line, including Illumina SGS data and recently generated TGS data from PacBio P5-C3 sequencing chemistry. The new IDP-fusion considered fusion candidates reported by several popular SGS tools (TopHat-Fusion, SOAPfuse, TRUP, FusionMap, and deFuse). We compared the performance of our new tool with that of the original IDP-fusion and the SGS-only approaches.
Results: The new IDP-fusion algorithm improved sensitivity from 33.8% to 54.9%. This is higher than the sensitivity of the most sensitive SGS-only tool (deFuse, 38.0%), which comes at the cost of a low precision of 13.8%. The improved IDP-fusion retains a precision of 60.9%, down only slightly from 68.6% for the original IDP-fusion. This tradeoff is acceptable when considering the overall accuracy described by the F-score, which increased from 45.3% to 57.8%, considerably better than the best F-score achieved by SGS-only methods (32.8%).
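The F-score quoted in these results is the harmonic mean of precision P and recall (sensitivity) R, F = 2PR / (P + R). Recomputing it from the rounded percentages is a quick consistency check; the function name is illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (label, precision, recall) taken from the reported percentages
for label, p, r in [
    ("original IDP-fusion", 0.686, 0.338),
    ("new IDP-fusion", 0.609, 0.549),
    ("deFuse", 0.138, 0.380),
]:
    print(f"{label}: F = {f_score(p, r):.1%}")
```

Running it prints F values of 45.3%, 57.7% and 20.2%. The 57.7% versus the reported 57.8% is within the rounding of the inputs, and the deFuse line shows why its higher recall does not yield a competitive F-score.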
Conclusions: Fusion candidates identified directly from SGS reads can be screened using alignments of TGS long reads, supplementing fusion candidates detected from long reads. Compared with SGS-only methods, this Hybrid-Seq approach provides much more sensitive and more accurate reports of fusion genes.
Citation Format: Jason L. Weirather, Tyson A. Clark, Elizabeth Tseng, Jonas Korlach, Kin Fai Au. Enhance both precision and sensitivity of fusion gene detection by hybrid sequencing. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 5289.
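The screening rule in the Method paragraph (accept an SGS fusion candidate only if a long read aligns better to an artificial fusion sequence than to any single locus) can be sketched on toy sequences. Assumptions: the fusion sequence is the 5' partner up to its breakpoint joined to the 3' partner from its breakpoint, alignment quality is a simple Smith-Waterman local score with hypothetical scoring parameters, and the "genome" is a small dictionary of loci. All names and sequences are hypothetical; a real pipeline would use a long-read aligner against the full genome, and this is not IDP-fusion's code.

```python
from typing import Dict


def local_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between a and b (score only)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def supports_fusion(read: str, gene5: str, bp5: int, gene3: str, bp3: int,
                    genome: Dict[str, str]) -> bool:
    """True if the read aligns better to the artificial fusion sequence than to any single locus."""
    fusion_ref = gene5[:bp5] + gene3[bp3:]  # 5' partner up to breakpoint + 3' partner after it
    fusion = local_score(read, fusion_ref)
    single = max(local_score(read, seq) for seq in genome.values())
    return fusion > single


gene_a = "ATGCGTACGTTAGC"  # hypothetical 5' partner
gene_b = "GGCATTCAGGACTA"  # hypothetical 3' partner
genome = {"geneA": gene_a, "geneB": gene_b}
junction_read = gene_a[:7] + gene_b[7:]  # spans the candidate breakpoint
print(supports_fusion(junction_read, gene_a, 7, gene_b, 7, genome))  # supported
print(supports_fusion(gene_a, gene_a, 7, gene_b, 7, genome))         # not supported
```

A read that spans the junction aligns end to end only to the artificial fusion sequence, so it supports the candidate; a read from the unfused gene aligns best to its own locus and does not.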
29
The primate-specific noncoding RNA HPAT5 regulates pluripotency during human preimplantation development and nuclear reprogramming. Nat Genet 2015; 48:44-52. [PMID: 26595768 DOI: 10.1038/ng.3449] [Citation(s) in RCA: 131] [Impact Index Per Article: 14.6] [Received: 07/24/2015] [Accepted: 10/22/2015] [Indexed: 12/14/2022]
Abstract
Long intergenic noncoding RNAs (lincRNAs) are derived from thousands of loci in mammalian genomes and are frequently enriched in transposable elements (TEs). Although families of TE-derived lincRNAs have recently been implicated in the regulation of pluripotency, little is known of the specific functions of individual family members. Here we characterize three new individual TE-derived human lincRNAs, human pluripotency-associated transcripts 2, 3 and 5 (HPAT2, HPAT3 and HPAT5). Loss-of-function experiments indicate that HPAT2, HPAT3 and HPAT5 function in preimplantation embryo development to modulate the acquisition of pluripotency and the formation of the inner cell mass. CRISPR-mediated disruption of the genes for these lincRNAs in pluripotent stem cells, followed by whole-transcriptome analysis, identifies HPAT5 as a key component of the pluripotency network. Protein binding and reporter-based assays further demonstrate that HPAT5 interacts with the let-7 microRNA family. Our results indicate that unique individual members of large primate-specific lincRNA families modulate gene expression during development and differentiation to reinforce cell fate.
30
PacBio Sequencing and Its Applications. GENOMICS PROTEOMICS & BIOINFORMATICS 2015; 13:278-89. [PMID: 26542840 PMCID: PMC4678779 DOI: 10.1016/j.gpb.2015.08.002] [Citation(s) in RCA: 1130] [Impact Index Per Article: 125.6] [Received: 05/28/2015] [Revised: 08/06/2015] [Accepted: 08/11/2015] [Indexed: 12/15/2022]
Abstract
Single-molecule, real-time sequencing developed by Pacific BioSciences offers longer read lengths than the second-generation sequencing (SGS) technologies, making it well-suited for unsolved problems in genome, transcriptome, and epigenetics research. The highly-contiguous de novo assemblies using PacBio sequencing can close gaps in current reference assemblies and characterize structural variation (SV) in personal genomes. With longer reads, we can sequence through extended repetitive regions and detect mutations, many of which are associated with diseases. Moreover, PacBio transcriptome sequencing is advantageous for the identification of gene isoforms and facilitates reliable discoveries of novel genes and novel isoforms of annotated genes, due to its ability to sequence full-length transcripts or fragments with significant lengths. Additionally, PacBio’s sequencing technique provides information that is useful for the direct detection of base modifications, such as methylation. In addition to using PacBio sequencing alone, many hybrid sequencing strategies have been developed to make use of more accurate short reads in conjunction with PacBio long reads. In general, hybrid sequencing strategies are more affordable and scalable especially for small-size laboratories than using PacBio Sequencing alone. The advent of PacBio sequencing has made available much information that could not be obtained via SGS alone.
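The direct detection of base modifications mentioned above rests on polymerase kinetics: a modified template base typically changes the interpulse duration (IPD), the time between successive incorporation pulses. A minimal sketch, assuming per-position IPD measurements from a native sample and from an unmodified control (for example, amplified DNA), compares mean IPDs as a ratio and flags positions at or above a hypothetical threshold. Function names, data and the threshold are illustrative; production tools model kinetics far more carefully.

```python
from statistics import mean
from typing import Dict, List


def ipd_ratios(native: Dict[int, List[float]],
               control: Dict[int, List[float]]) -> Dict[int, float]:
    """Mean native IPD divided by mean control IPD at each position covered in both."""
    return {pos: mean(native[pos]) / mean(control[pos])
            for pos in sorted(native.keys() & control.keys())}


def flag_modified(ratios: Dict[int, float], threshold: float = 2.0) -> List[int]:
    """Positions whose IPD ratio reaches the (hypothetical) threshold."""
    return [pos for pos, ratio in ratios.items() if ratio >= threshold]


native = {10: [1.0, 1.2, 0.8], 11: [3.0, 2.6, 3.4]}   # hypothetical IPDs
control = {10: [1.0, 1.0, 1.0], 11: [1.0, 1.1, 0.9]}
ratios = ipd_ratios(native, control)
print(ratios, flag_modified(ratios))
```

Here position 10 has a ratio of about 1.0 and position 11 about 3.0, so only position 11 is flagged.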
31
Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing. Nucleic Acids Res 2015; 43:e116. [PMID: 26040699 PMCID: PMC4605286 DOI: 10.1093/nar/gkv562] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Received: 03/19/2015] [Accepted: 05/15/2015] [Indexed: 12/19/2022]
Abstract
We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from the MCF-7 breast cancer cells. Compared with the existing tools, IDP-fusion detects fusion genes with higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes.
32
Full-length transcriptome sequences and splice variants obtained by a combination of sequencing platforms applied to different root tissues of Salvia miltiorrhiza and tanshinone biosynthesis. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2015; 82:951-961. [PMID: 25912611 DOI: 10.1111/tpj.12865] [Citation(s) in RCA: 217] [Impact Index Per Article: 24.1] [Received: 03/02/2015] [Revised: 04/19/2015] [Accepted: 04/21/2015] [Indexed: 05/20/2023]
Abstract
Danshen, Salvia miltiorrhiza Bunge, is one of the most widely used herbs in traditional Chinese medicine, wherein its rhizome/roots are particularly valued. The corresponding bioactive components include the tanshinone diterpenoids, the biosynthesis of which is a subject of considerable interest. Previous investigations of the S. miltiorrhiza transcriptome have relied on short-read next-generation sequencing (NGS) technology, and the vast majority of the resulting isotigs do not represent full-length cDNA sequences. Moreover, these efforts have been targeted at either whole plants or hairy root cultures. Here, we demonstrate that the tanshinone pigments are produced and accumulate in the root periderm, and apply a combination of NGS and single-molecule real-time (SMRT) sequencing to various root tissues, particularly including the periderm, to provide a more complete view of the S. miltiorrhiza transcriptome, with further insight into tanshinone biosynthesis as well. In addition, the use of SMRT long-read sequencing offered the ability to examine alternative splicing, which was found to occur in approximately 40% of the detected gene loci, including several involved in isoprenoid/terpenoid metabolism.
33
The transcriptome of human pluripotent stem cells. Curr Opin Genet Dev 2014; 28:71-7. [DOI: 10.1016/j.gde.2014.09.012] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 06/17/2014] [Revised: 09/29/2014] [Accepted: 09/30/2014] [Indexed: 12/11/2022]
34
ALS-associated mutation FUS-R521C causes DNA damage and RNA splicing defects. J Clin Invest 2014; 124:981-99. [PMID: 24509083 DOI: 10.1172/jci72723] [Citation(s) in RCA: 194] [Impact Index Per Article: 19.4] [Received: 08/20/2013] [Accepted: 11/27/2013] [Indexed: 12/20/2022]
Abstract
Autosomal dominant mutations of the RNA/DNA binding protein FUS are linked to familial amyotrophic lateral sclerosis (FALS); however, it is not clear how FUS mutations cause neurodegeneration. Using transgenic mice expressing a common FALS-associated FUS mutation (FUS-R521C mice), we found that mutant FUS proteins formed a stable complex with WT FUS proteins and interfered with the normal interactions between FUS and histone deacetylase 1 (HDAC1). Consequently, FUS-R521C mice exhibited evidence of DNA damage as well as profound dendritic and synaptic phenotypes in brain and spinal cord. To provide insights into these defects, we screened neural genes for nucleotide oxidation and identified brain-derived neurotrophic factor (Bdnf) as a target of FUS-R521C-associated DNA damage and RNA splicing defects in mice. Compared with WT FUS, mutant FUS-R521C proteins formed a more stable complex with Bdnf RNA in electrophoretic mobility shift assays. Stabilization of the FUS/Bdnf RNA complex contributed to Bdnf splicing defects and impaired BDNF signaling through receptor TrkB. Exogenous BDNF only partially restored dendrite phenotype in FUS-R521C neurons, suggesting that BDNF-independent mechanisms may contribute to the defects in these neurons. Indeed, RNA-seq analyses of FUS-R521C spinal cords revealed additional transcription and splicing defects in genes that regulate dendritic growth and synaptic functions. Together, our results provide insight into how gain-of-function FUS mutations affect critical neuronal functions.
35
Activation of innate immunity is required for efficient nuclear reprogramming. Cell 2013; 151:547-58. [PMID: 23101625 DOI: 10.1016/j.cell.2012.09.034] [Citation(s) in RCA: 275] [Impact Index Per Article: 25.0] [Received: 12/26/2011] [Revised: 07/05/2012] [Accepted: 09/18/2012] [Indexed: 12/19/2022]
Abstract
Retroviral overexpression of reprogramming factors (Oct4, Sox2, Klf4, c-Myc) generates induced pluripotent stem cells (iPSCs). However, the integration of foreign DNA could induce genomic dysregulation. Cell-permeant proteins (CPPs) could overcome this limitation. To date, this approach has proved exceedingly inefficient. We discovered a striking difference in the pattern of gene expression induced by viral versus CPP-based delivery of the reprogramming factors, suggesting that a signaling pathway required for efficient nuclear reprogramming was activated by the retroviral but not the CPP approach. In gain- and loss-of-function studies, we find that the toll-like receptor 3 (TLR3) pathway enables efficient induction of pluripotency by viral or mmRNA approaches. Stimulation of TLR3 causes rapid and global changes in the expression of epigenetic modifiers to enhance chromatin remodeling and nuclear reprogramming. Activation of inflammatory pathways is required for efficient nuclear reprogramming in the induction of pluripotency.
36
An Oct4-Sall4-Nanog network controls developmental progression in the pre-implantation mouse embryo. Mol Syst Biol 2013; 9:632. [PMID: 23295861 PMCID: PMC3564263 DOI: 10.1038/msb.2012.65] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 07/28/2011] [Accepted: 11/30/2012] [Indexed: 01/18/2023]
Abstract
Landmark events occur in a coordinated manner during pre-implantation development of the mammalian embryo, yet the regulatory network that orchestrates these events remains largely unknown. Here, we present the first systematic investigation of the network in pre-implantation mouse embryos using morpholino-mediated gene knockdowns of key embryonic stem cell (ESC) factors followed by detailed transcriptome analysis of pooled embryos, single embryos, and individual blastomeres. We delineated the regulons of Oct4, Sall4, and Nanog and identified a set of metabolism- and transport-related genes that were controlled by these transcription factors in embryos but not in ESCs. Strikingly, the knockdown embryos arrested at a range of developmental stages. We provided evidence that the DNA methyltransferase Dnmt3b has a role in determining the extent to which a knockdown embryo can develop. We further showed that the feed-forward loop comprising Dnmt3b, the pluripotency factors, and the miR-290-295 cluster exemplifies a network motif that buffers embryos against gene expression noise. Our findings indicate that Oct4, Sall4, and Nanog form a robust and integrated network to govern mammalian pre-implantation development.
37
Abstract
Third generation sequencing (TGS) generates much longer reads than second generation sequencing (SGS) and thus provides a chance to solve problems that are difficult to study through SGS alone. However, higher raw read error rates are an intrinsic drawback in most TGS technologies. Here we present a computational method, LSC, to perform error correction of TGS long reads (LR) by SGS short reads (SR). Aiming to reduce the error rate in homopolymer runs in the main TGS platform, the PacBio® RS, LSC applies a homopolymer compression (HC) transformation strategy to increase the sensitivity of SR-LR alignment without sacrificing alignment accuracy. We applied LSC to 100,000 PacBio long reads from human brain cerebellum RNA-seq data and 64 million single-end 75 bp reads from human brain RNA-seq data. The results show that LSC can correct PacBio long reads, reducing the error rate more than threefold. The improved accuracy greatly benefits many downstream analyses, such as directional gene isoform detection in RNA-seq studies. Compared with another hybrid correction tool, LSC can achieve over double the sensitivity and similar specificity.
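The homopolymer compression (HC) transformation described above collapses every run of identical bases to a single base, so reads that differ only in homopolymer length (the error type this abstract targets) become identical before short reads are aligned to long reads. A minimal sketch with hypothetical function names follows; keeping the run lengths allows results to be mapped back to uncompressed coordinates. It illustrates the transformation only, not LSC's implementation.

```python
from itertools import groupby
from typing import List, Tuple


def homopolymer_compress(seq: str) -> Tuple[str, List[int]]:
    """Collapse each homopolymer run to one base; also return the run lengths."""
    bases: List[str] = []
    runs: List[int] = []
    for base, group in groupby(seq):
        bases.append(base)
        runs.append(sum(1 for _ in group))
    return "".join(bases), runs


def decompress(compressed: str, runs: List[int]) -> str:
    """Invert homopolymer_compress using the stored run lengths."""
    return "".join(base * n for base, n in zip(compressed, runs))


long_read = "AAACGGGGT"   # hypothetical long read with homopolymer length errors
short_read = "AACGGGT"    # hypothetical short read from the same locus
print(homopolymer_compress(long_read))   # ('ACGT', [3, 1, 4, 1])
print(homopolymer_compress(short_read))  # ('ACGT', [2, 1, 3, 1])
```

Both reads compress to the same sequence, so an aligner working in compressed space sees a perfect match, while the stored run lengths still record how the originals differ.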
38
RNA sequencing reveals a diverse and dynamic repertoire of the Xenopus tropicalis transcriptome over development. Genome Res 2012; 23:201-16. [PMID: 22960373 PMCID: PMC3530680 DOI: 10.1101/gr.141424.112] [Citation(s) in RCA: 121] [Impact Index Per Article: 10.1] [Indexed: 12/15/2022]
Abstract
The Xenopus embryo has provided key insights into fate specification, the cell cycle, and other fundamental developmental and cellular processes, yet a comprehensive understanding of its transcriptome is lacking. Here, we used paired-end RNA sequencing (RNA-seq) to explore the transcriptome of Xenopus tropicalis in 23 distinct developmental stages. We determined expression levels of all genes annotated in RefSeq and Ensembl and showed for the first time on a genome-wide scale that, despite a general state of transcriptional silence in the earliest stages of development, approximately 150 genes are transcribed prior to the midblastula transition. In addition, our splicing analysis uncovered more than 10,000 novel splice junctions at each stage and revealed that many known genes have additional unannotated isoforms. Furthermore, we used Cufflinks to reconstruct transcripts from our RNA-seq data and found that ∼13.5% of the final contigs are derived from novel transcribed regions, both within introns and in intergenic regions. We then developed a filtering pipeline to separate protein-coding transcripts from noncoding RNAs and identified a confident set of 6686 noncoding transcripts in 3859 genomic loci. Since the current reference genome, XenTro3, consists of hundreds of scaffolds instead of full chromosomes, we also performed de novo reconstruction of the transcriptome using Trinity and uncovered hundreds of transcripts that are missing from the genome. Collectively, our data will not only aid in completing the assembly of the Xenopus tropicalis genome but will also serve as a valuable resource for gene discovery and for unraveling the fundamental mechanisms of vertebrate embryogenesis.
39
Abstract 363: Toll-Like Receptor 3 Activation Promotes Efficient Nuclear Reprogramming and Endothelial Differentiation. Circ Res 2012. [DOI: 10.1161/res.111.suppl_1.a363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Introduction:
Stem cell therapy for vascular regeneration has been investigated using embryonic stem cells. We recently generated endothelial cells (ECs) from human induced pluripotent stem cells (hiPSCs) and investigated their potential to promote the perfusion of ischemic tissue in a murine model of peripheral arterial disease (PAD). However, to utilize iPSCs therapeutically, the cells should be generated via non-integrating approaches to avoid integration of foreign DNA into the genome.
Objective:
The present study highlights underlying mechanisms of reprogramming and investigates the role of novel pathways in enhancing nuclear reprogramming for potential clinical application.
Results:
Since the initial discovery, different non-integrating approaches have been developed to generate iPSCs. One such approach is to deliver the pluripotency factors (Oct4, Sox2, Klf4 and c-Myc) as cell-permeant proteins (CPPs). However, human cells have not been reprogrammed using purified CPPs. In seeking to develop this approach, we discovered a striking difference in the pattern of gene expression induced by viral versus protein-based delivery of the reprogramming factors. This suggested that a signaling pathway required for efficient nuclear reprogramming was activated by the retroviral approach but not by the CPP approach. In both gain- and loss-of-function studies, we found that activation of toll-like receptor 3 (TLR3) plays a role in the efficient reprogramming of human cells using viral approaches. Stimulation of TLR3 causes rapid changes in the expression of epigenetic modifiers, accompanied by chromatin remodeling and changes in gene expression that favor induction of pluripotency. Importantly, knowing that this pathway is critical, we were able to generate human iPSCs using CPPs by adding a TLR3 agonist (Poly IC) to the reprogramming protocol.
Conclusion:
Recognition of the role of innate immunity signaling in reprogramming may advance the therapeutic application of iPSCs. We intend to develop an efficient protein-based system to generate ECs and to determine their therapeutic potential in animal models of PAD. Furthermore, we have discovered an important signaling pathway in reprogramming, which may have implications for cancer biology and regenerative medicine.
40
Detection of splice junctions from paired-end RNA-seq data by SpliceMap. Nucleic Acids Res 2010; 38:4570-8. [PMID: 20371516 PMCID: PMC2919714 DOI: 10.1093/nar/gkq211] [Citation(s) in RCA: 209] [Impact Index Per Article: 14.9] [Received: 12/07/2009] [Revised: 03/10/2010] [Accepted: 03/12/2010] [Indexed: 11/27/2022]
Abstract
Alternative splicing is a prevalent post-transcriptional process that is not only important for normal cellular function but also involved in human diseases. Newly developed second generation sequencing techniques provide high-throughput data (RNA-seq data) for studying alternative splicing events in different cell types. Here, we present a computational method, SpliceMap, to detect splice junctions from RNA-seq data. This method does not depend on any existing annotation of gene structures and can find novel splice junctions with high sensitivity and specificity. It can handle long reads (50-100 nt) and can exploit paired-read information to improve mapping accuracy. Several parameters are included in the output to indicate the reliability of each predicted junction and help filter out false predictions. We applied SpliceMap to analyze 23 million paired 50-nt reads from human brain tissue. The results show that, at this depth of sequencing, RNA-seq supports reliable detection of splice junctions except for those present at very low levels. Compared with current methods, SpliceMap achieves 12% higher sensitivity without sacrificing specificity.
41
TB surveillance in correctional institutions in Hong Kong, 1999-2005. Int J Tuberc Lung Dis 2008; 12:93-98. [PMID: 18173884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
OBJECTIVE To understand the epidemiology of tuberculosis (TB) inside the prison system of Hong Kong. METHOD Prospective territory-wide TB surveillance was conducted among prisoners in 24 correctional institutions. RESULTS From 1999 to 2005, 622 prevalent TB cases (diagnosed before or within 3 months of incarceration) and 214 incident cases (diagnosed after 3 months) were reported by prison staff to a paper-based central prison TB registry. Both crude prevalence and incidence declined over the period (χ² for trend, both P < 0.001), although the sex- and age-adjusted prison TB incidence remained higher than that of the general population (indirectly standardised rate [ISR] 280.6 vs. 108.0/100 000, P < 0.001). Illegal immigrants (odds ratio [OR] 3.6, 95% confidence interval [CI] 1.8-7.4) and drug addicts (OR 2.04, 95% CI 1.13-3.7) were the two major risk groups. The excess TB incidence risk disappeared after these groups were excluded (ISR 117.1 vs. 108.0/100 000, P = 0.52). No significant difference in the multidrug-resistant rate was found between prisoners and the general population (3.5% vs. 1.0%, OR 3.6, 95% CI 0.5-28.4). No extensively drug-resistant (XDR) cases were identified. CONCLUSION TB remains a significant disease in local prisons. Further strengthening of TB control programmes in prisons, especially those targeting the higher-risk groups, is recommended.
42
Chest radiograph screening for tuberculosis in a Hong Kong prison. Int J Tuberc Lung Dis 2005; 9:627-32. [PMID: 15971389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
SETTING Long-stay prisoners are not regularly screened for TB in Hong Kong. OBJECTIVE To evaluate tuberculosis (TB) screening in prison. METHOD All prisoners in a maximum security prison as of 31 October 2001 were screened by chest radiograph (CXR), except for those being followed up for TB or examined by CXR in the last 6 months. RESULTS A total of 814 male prisoners aged 34.6 ± 9.6 (mean ± SD) years were successfully screened. Of 53 cases (6.51%) with radiographic abnormalities, 10 active TB cases (8 culture-negative, 2 culture-positive) were diagnosed, giving an overall yield of 1.23% (95% CI 0.59-2.26). There was no statistically significant difference in age, ethnicity, place of birth or residency status between those with and those without TB (all P > 0.05). Incarceration ≥2 years, being in the current prison ≥2 years and not having had a CXR in the last 2 years were associated with TB in univariate analysis (all P < 0.05), but only the last remained an independent predictor in multiple logistic regression (OR 16.8, 95% CI 2.1-132.9, P = 0.008). In that group, the yield was 3.1% (95% CI 1.42-5.89). No further cases were detected in the subsequent 2 years. CONCLUSION CXR screening of long-stay prisoners gave a high yield in this study.
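The overall yield is the number of active TB cases divided by the number screened (10/814, about 1.23%). The abstract does not say how its 95% CI was computed; the Python sketch below assumes an exact (Clopper-Pearson) binomial interval, found by bisection on the binomial CDF, so its bounds may differ from the published ones in the last digit if another method was used. The function names are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(i: int, n: int, p: float) -> float:
    """P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if i == 0 else 0.0
    if p >= 1.0:
        return 1.0 if i == n else 0.0
    log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coef + i * log(p) + (n - i) * log1p(-p))


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k); for k < n this decreases monotonically as p increases."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def _bisect_decreasing(f, target: float, iters: int = 100) -> float:
    """Find p in [0, 1] with f(p) == target for a decreasing function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f still above target, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Exact two-sided binomial confidence interval for x positives out of n."""
    alpha = 1.0 - conf
    # Lower bound solves P(X >= x) = alpha/2, i.e. P(X <= x - 1) = 1 - alpha/2.
    lower = 0.0 if x == 0 else _bisect_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1.0 - alpha / 2)
    # Upper bound solves P(X <= x) = alpha/2.
    upper = 1.0 if x == n else _bisect_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


if __name__ == "__main__":
    cases, screened = 10, 814
    lo, hi = clopper_pearson(cases, screened)
    print(f"yield {cases / screened:.2%}, 95% CI {lo:.2%} to {hi:.2%}")
```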
43
Tuberculin response in BCG vaccinated schoolchildren and the estimation of annual risk of infection in Hong Kong. Thorax 2005; 60:124-9. [PMID: 15681500 PMCID: PMC1747293 DOI: 10.1136/thx.2003.017970] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/04/2022]
Abstract
BACKGROUND In Hong Kong there has been nearly universal neonatal BCG vaccination coverage since 1980. METHOD A total of 21 113 schoolchildren aged 6-9 years were skin tested with one unit of tuberculin (PPD RT-23) using the intradermal technique during a routine BCG revaccination programme. Information on sex, date of birth, date of tuberculin testing, and tuberculin reaction size at 72 hours was retrieved. The annual risk of tuberculous infection (ARTI) was estimated by three different approaches. RESULTS Tuberculin positive rates were significantly higher in girls and increased significantly with age at all commonly used cut-off points (5, 10, and 15 mm). Using a cut-off point of ≥10 mm and the formula ARTI = 1 - (1 - tuberculin positive rate)^(1/age), the ARTI was estimated to be 1.93% (95% CI 1.84 to 2.03) for girls and 1.41% (95% CI 1.33 to 1.50) for boys. Using the differences in the tuberculin positive rate between the 6-7 year and 8-9 year age groups, the ARTI became 1.90% (95% CI 1.09 to 2.70) and 1.84% (95% CI 1.15 to 2.54) for girls and boys, respectively. When the prevalence of infection was estimated by locating a secondary peak of the tuberculin reaction distribution curve at 15 mm and assuming a symmetrical distribution of reaction sizes among those infected around this peak, the corresponding ARTI was much lower at 0.52% (95% CI 0.46 to 0.59) and 0.43% (95% CI 0.37 to 0.49) for girls and boys, respectively, similar to that estimated indirectly from the prevalence of disease. CONCLUSION The ARTI as estimated by conventional methods was unexpectedly high among BCG vaccinated children and did not agree with that anticipated from the annual incidence of active disease. Further studies are needed to address the discrepancies, including the possible interaction between BCG and other environmental stimuli.
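The first ARTI formula follows from assuming a constant annual risk r of infection from birth: a child of age a escapes infection with probability (1 - r)^a, so the tuberculin positive rate is P = 1 - (1 - r)^a, and solving for r gives r = 1 - (1 - P)^(1/a). A minimal Python sketch of that conversion and its inverse follows; the function names and the example inputs are hypothetical, not values from the study.

```python
def arti_from_positive_rate(positive_rate: float, age_years: float) -> float:
    """Annual risk of infection r from a cumulative positive rate P at a given age.

    Assumes a constant, independent risk in every year of life, so that
    P = 1 - (1 - r) ** age and therefore r = 1 - (1 - P) ** (1 / age).
    """
    if not 0.0 <= positive_rate < 1.0:
        raise ValueError("positive_rate must be in [0, 1)")
    if age_years <= 0:
        raise ValueError("age_years must be positive")
    return 1.0 - (1.0 - positive_rate) ** (1.0 / age_years)


def positive_rate_from_arti(arti: float, age_years: float) -> float:
    """Inverse conversion: cumulative probability of infection by a given age."""
    return 1.0 - (1.0 - arti) ** age_years


if __name__ == "__main__":
    # Hypothetical inputs: 13% tuberculin positive at a mean age of 7.5 years.
    r = arti_from_positive_rate(0.13, 7.5)
    print(f"ARTI = {r:.2%}")
```

The formula treats every reaction at or above the cut-off as a true infection.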
44
Socio-economic factors and tuberculosis: a district-based ecological analysis in Hong Kong. Int J Tuberc Lung Dis 2004; 8:958-64. [PMID: 15305477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2023]
Abstract
BACKGROUND Relatively little is known about the impact of socio-economic factors on tuberculosis in a metropolitan city with high disease incidence. METHOD District-specific tuberculosis notification rates for 1995-1997 and 2000-2002 were indirectly sex- and age-adjusted and compared with the socio-economic characteristics in the 1996 by-census and 2001 census. RESULTS The differences between the 18 districts persisted after 3-year averaging and indirect standardisation. Only the percentage of the population born locally, the percentage of the population widowed or divorced and the percentage of households residing in rooms or bedsits were consistently associated with the standardised notification ratios (SNR) in both periods, the first negatively (all P < 0.05). In a combined analysis of both periods using a general linear model, birth in China, residence <7 years, speaking other Asian languages, being married and being in a single household were also significantly associated with the SNR (all P < 0.05). Using a backward conditional approach, only local birth, being married, and residing in rooms or bedsits were independent predictors of SNR (all P < 0.05). There was no significant association between SNR and socio-economic indices of education, occupation, unemployment and income. CONCLUSION Socio-economic factors other than simple poverty affect the district-specific tuberculosis rates in Hong Kong.
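Indirect standardisation, as used for the SNRs here, compares a district's observed notifications with the number expected if the territory-wide sex- and age-specific rates applied to that district's own population. A minimal Python sketch follows; the strata, rates and counts are hypothetical, and the study's exact stratification is not given in the abstract.

```python
def expected_cases(district_population: dict[str, float],
                   reference_rates: dict[str, float]) -> float:
    """Cases expected if the reference (territory-wide) stratum-specific
    rates applied to the district's population in each sex-age stratum."""
    return sum(pop * reference_rates[stratum]
               for stratum, pop in district_population.items())


def standardised_notification_ratio(observed: float,
                                    district_population: dict[str, float],
                                    reference_rates: dict[str, float]) -> float:
    """SNR = observed / expected; values above 1 mean more notifications
    than the district's sex-age structure alone would predict."""
    return observed / expected_cases(district_population, reference_rates)


if __name__ == "__main__":
    # Hypothetical strata, populations and annual rates (cases per person).
    population = {"male_0_39": 60_000, "male_40_plus": 40_000,
                  "female_0_39": 55_000, "female_40_plus": 45_000}
    rates = {"male_0_39": 0.0005, "male_40_plus": 0.0020,
             "female_0_39": 0.0004, "female_40_plus": 0.0008}
    observed = 200  # hypothetical average annual notifications
    snr = standardised_notification_ratio(observed, population, rates)
    print(f"SNR = {snr:.2f}")
```

Multiplying such a ratio by the reference population's crude rate gives an indirectly standardised rate of the kind quoted in entry 41.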
45
Levofloxacin in the treatment of drug-resistant tuberculosis. Int J Tuberc Lung Dis 1997; 1:89. [PMID: 9441068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]