Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles

45
(from Reference Citation Analysis)

Article PDFs (22)

Cited by > 0 (35)

Searched Name

Kin Fai Au

Number	Citation Analysis
1	Single-cell m⁶A mapping in vivo using picoMeRIP-seq. Nat Biotechnol 2024;42:591-596. [PMID: 37349523 PMCID: PMC10739642 DOI: 10.1038/s41587-023-01831-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Accepted: 05/17/2023] [Indexed: 06/24/2023] Abstract: Current N⁶-methyladenosine (m⁶A) mapping methods need large amounts of RNA or are limited to cultured cells. Through optimized sample recovery and signal-to-noise ratio, we developed picogram-scale m⁶A RNA immunoprecipitation and sequencing (picoMeRIP-seq) for studying m⁶A in vivo in single cells and scarce cell types using standard laboratory equipment. We benchmark m⁶A mapping on titrations of poly(A) RNA and embryonic stem cells and in single zebrafish zygotes, mouse oocytes and embryos. Key Words: methylation analysis; embryonic induction; epigenomics; transcriptomics; epigenetic memory. MESH Headings: Animals; Mice; Zebrafish/genetics; Zebrafish/metabolism; RNA/genetics; RNA, Messenger/genetics; Embryonic Stem Cells; Cells, Cultured. Grants: R01 HG008759 NHGRI NIH HHS; R01 HG011469 NHGRI NIH HHS; R01 GM136886 NIGMS NIH HHS; Ministry of Health and Care Services | Helse Sør-Øst RHF (Southern and Eastern Norway Regional Health Authority); Norges Forskningsråd (Research Council of Norway); UiO:Life Science convergence environment grant; Foundation for the National Institutes of Health (Foundation for the National Institutes of Health, Inc.); An institutional fund from the Department of Biomedical Informatics, The Ohio State University, R01HG011469 and R01GM136886; The Danish National Research Foundation grant DNRF115; An institutional fund from the Department of Biomedical Informatics, The Ohio State University.
2	Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. bioRxiv 2023:2023.07.25.550582. [PMID: 37546854 PMCID: PMC10402094 DOI: 10.1101/2023.07.25.550582] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/08/2023] Abstract: The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. These data were utilized by developers to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis. Grants: R01 HG008759 NHGRI NIH HHS; R01 GM136886 NIGMS NIH HHS; R35 GM138122 NIGMS NIH HHS; R35 GM142647 NIGMS NIH HHS; U41 HG007234 NHGRI NIH HHS; U24 HG007234 NHGRI NIH HHS; Wellcome Trust; UM1 HG009443 NHGRI NIH HHS; R01 HG011469 NHGRI NIH HHS; F31 HG010999 NHGRI NIH HHS.
3	The RNA m⁶A landscape of mouse oocytes and preimplantation embryos. Nat Struct Mol Biol 2023;30:703-709. [PMID: 37081317 DOI: 10.1038/s41594-023-00969-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2022] [Accepted: 03/16/2023] [Indexed: 04/22/2023] Abstract: Despite the significance of N⁶-methyladenosine (m⁶A) in gene regulation, the requirement for large amounts of RNA has hindered m⁶A profiling in mammalian early embryos. Here we apply low-input methyl RNA immunoprecipitation and sequencing to map m⁶A in mouse oocytes and preimplantation embryos. We define the landscape of m⁶A during the maternal-to-zygotic transition, including stage-specifically expressed transcription factors essential for cell fate determination. Both the maternally inherited transcripts to be degraded post fertilization and the zygotically activated genes during zygotic genome activation are widely marked by m⁶A. In contrast to m⁶A-marked zygotically activated genes, m⁶A-marked maternally inherited transcripts have a higher tendency to be targeted by microRNAs. Moreover, RNAs derived from retrotransposons, such as MTA that is maternally expressed and MERVL that is transcriptionally activated at the two-cell stage, are largely marked by m⁶A. Our results provide a foundation for future studies exploring the regulatory roles of m⁶A in mammalian early embryonic development.
4	CEDA: integrating gene expression data with CRISPR-pooled screen data identifies essential genes with higher expression. Bioinformatics 2022;38:5245-5252. [PMID: 36250792 PMCID: PMC9710553 DOI: 10.1093/bioinformatics/btac668] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/21/2022] [Revised: 09/26/2022] [Indexed: 12/24/2022] Abstract: MOTIVATION: The clustered regularly interspaced short palindromic repeats (CRISPR)-based genetic perturbation screen is a powerful tool to probe gene function. However, experimental noise, especially for the lowly expressed genes, needs to be accounted for to maintain proper control of the false positive rate. METHODS: We develop a statistical method, named CRISPR screen with Expression Data Analysis (CEDA), to integrate gene expression profiles and CRISPR screen data for identifying essential genes. CEDA stratifies genes based on expression level and adopts a three-component mixture model for the log-fold change of single-guide RNAs (sgRNAs). An empirical Bayesian prior and the expectation-maximization algorithm are used for parameter estimation and false discovery rate inference. RESULTS: Taking advantage of gene expression data, CEDA identifies essential genes with higher expression. Compared to existing methods, CEDA shows comparable reliability but higher sensitivity in detecting essential genes with moderate sgRNA fold change. Therefore, using the same CRISPR data, CEDA generates an additional hit gene list. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. MESH Headings: Bayes Theorem; Clustered Regularly Interspaced Short Palindromic Repeats; CRISPR-Cas Systems; Gene Expression; Genes, Essential; Reproducibility of Results; RNA, Small Untranslated/genetics. Grants: R01 HG008759 NHGRI NIH HHS; R01 GM136886 NIGMS NIH HHS; R01 HG011469 NHGRI NIH HHS; P30 CA016058 NCI NIH HHS; NIH HHS; National Institutes of Health; NIH.
5	The blooming of long-read sequencing reforms biomedical research. Genome Biol 2022;23:21. [PMID: 35022055 PMCID: PMC8756655 DOI: 10.1186/s13059-022-02604-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/10/2022] MESH Headings: Biomedical Research; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA. Grants: R01 GM136886 NIGMS NIH HHS; R01 HG008759 NHGRI NIH HHS; R01 HG011469 NHGRI NIH HHS.
6	Real-time mapping of nanopore raw signals. Bioinformatics 2021;37:i477-i483. [PMID: 34252938 PMCID: PMC8336444 DOI: 10.1093/bioinformatics/btab264] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/04/2022] Abstract: Motivation: Oxford Nanopore Technologies sequencing devices support adaptive sequencing, in which undesired reads can be ejected from a pore in real time. This feature allows targeted sequencing aided by computational methods for mapping partial reads, rather than complex library preparation protocols. However, existing mapping methods either require a computationally expensive base-calling procedure before using aligners to map partial reads or work well only on small genomes. Results: In this work, we present a new streaming method that can map nanopore raw signals for real-time selective sequencing. Rather than converting read signals to bases, we propose to convert reference genomes to signals and fully operate in the signal space. Our method features a new way to index reference genomes using k-d trees, a novel seed selection strategy and a seed chaining algorithm tailored toward the current signal characteristics. We implemented the method as a tool Sigmap. Then we evaluated it on both simulated and real data and compared it to the state-of-the-art nanopore raw signal mapper Uncalled. Our results show that Sigmap yields comparable performance on mapping yeast simulated raw signals, and better mapping accuracy on mapping yeast real raw signals with a 4.4× speedup. Moreover, our method performed well on mapping raw signals to genomes of size >100 Mbp and correctly mapped 11.49% more real raw signals of green algae, which leads to a significantly higher F₁-score (0.9354 versus 0.8660). Availability and implementation: Sigmap code is accessible at https://github.com/haowenz/sigmap. Supplementary information: Supplementary data are available at Bioinformatics online.
7	ALS-associated mutation FUS-R521C causes DNA damage and RNA splicing defects. J Clin Invest 2021;131:149564. [PMID: 33792567 DOI: 10.1172/jci149564] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022] Grants: K26 OD010945 NIH HHS; K26 RR026099 NCRR NIH HHS; R01 NS095894 NINDS NIH HHS.
8	Single-molecule long-read sequencing reveals a conserved intact long RNA profile in sperm. Nat Commun 2021;12:1361. [PMID: 33649327 PMCID: PMC7921563 DOI: 10.1038/s41467-021-21524-6] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 10/11/2019] [Accepted: 01/22/2021] [Indexed: 01/31/2023] Abstract: Sperm contributes diverse RNAs to the zygote. While sperm small RNAs have been shown to impact offspring phenotypes, our knowledge of the sperm transcriptome, especially the composition of long RNAs, has been limited by the lack of sensitive, high-throughput experimental techniques that can distinguish intact RNAs from fragmented RNAs, known to abound in sperm. Here, we integrate single-molecule long-read sequencing with short-read sequencing to detect sperm intact RNAs (spiRNAs). We identify 3440 spiRNA species in mice and 4100 in humans. The spiRNA profile consists of both mRNAs and long non-coding RNAs, is evolutionarily conserved between mice and humans, and displays an enrichment in mRNAs encoding for ribosome. In sum, we characterize the landscape of intact long RNAs in sperm, paving the way for future studies on their biogenesis and functions. Our experimental and bioinformatics approaches can be applied to other tissues and organisms to detect intact transcripts. Key Words: rna; developmental biology. MESH Headings: Animals; Conserved Sequence/genetics; Evolution, Molecular; Gene Ontology; High-Throughput Nucleotide Sequencing/methods; Humans; Male; Mice, Inbred C57BL; RNA/genetics; RNA/metabolism; RNA, Long Noncoding/genetics; RNA, Long Noncoding/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism; Ribosomes/metabolism; Single Molecule Imaging; Spermatozoa/metabolism; Testis/metabolism; Transcriptome/genetics; Mice. Grants: P30 ES001247 NIEHS NIH HHS; R00 HD078482 NICHD NIH HHS; R01 HG008759 NHGRI NIH HHS; R35 GM128782 NIGMS NIH HHS; U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI); U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS); U.S. Department of Health & Human Services | NIH | National Institute of Environmental Health Sciences (NIEHS).
9	Revealing tumor heterogeneity of breast cancer by utilizing the linkage between somatic and germline mutations. Brief Bioinform 2020;20:2306-2315. [PMID: 30239581 PMCID: PMC6954402 DOI: 10.1093/bib/bby084] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/08/2018] [Revised: 06/07/2018] [Accepted: 06/26/2018] [Indexed: 12/25/2022] Abstract: Intra-tumor heterogeneity is associated with cancer progression and therapeutic resistance, such as in breast cancer. While the existing methods for studying tumor heterogeneity only analyze variant allele frequency (VAF), the genotype of a variant, which can be detected by long reads or paired-end reads, is also informative for inferring subclones. We developed GenoClone to integrate VAF with the genotype of variants; it showed superior performance in inferring the number of subclones, estimating the fractions of subclones and identifying the somatic single-nucleotide variant composition of subclones. When GenoClone was applied to 389 TCGA breast cancer samples, it revealed extensive intra-tumor heterogeneity. We further found that a few somatic mutations were relevant to the late stage of tumor evolution, including the ones at the oncogene PIK3CA and the tumor suppressor gene TP53. Moreover, 52 subclones that were identified from 167 samples shared high similarity of somatic mutations, and were clustered into three groups with sizes of 24, 14 and 14. This is helpful for understanding the development of breast cancer in certain subgroups of people and for drug development at the population level. Furthermore, GenoClone also identified tumor heterogeneity in different aliquots of the same samples. The implementation of GenoClone is available at http://www.healthcare.uiowa.edu/labs/au/GenoClone/. Key Words: VAF; germline mutation; somatic mutation; subclone inference; tumor heterogeneity.
10	Abstract 830: New bioinformatics workflow of genome-wide CRISPR-Cas9 knockout screens. Cancer Res 2020. [DOI: 10.1158/1538-7445.am2020-830] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022] Abstract: Introduction: Genome-wide CRISPR-Cas9 based loss-of-function screens can be used to find essential genes for proliferation and survival of cancer cells. While recent studies have focused on establishing reference sets of essential and non-essential genes, correcting copy number effects and characterizing off-target effects, in-depth studies of the effects of gene abundance and of sgRNAs that target multiple genomic loci are lacking. To fill this gap, we here present a bioinformatics workflow to reduce false positives in CRISPR-Cas9 screens. Description: Gastric adenocarcinoma cell line AGS was infected with the CRISPR knockout library (TKOv3) at a multiplicity of infection of 0.3 to 0.4. We used the cells right after puromycin selection as the baseline sample, and the cells cultured for 14 days or 20 days as the negative selection samples. The sgRNA inserts were amplified by PCR and the corresponding libraries were sequenced on NextSeq 500 with a single-end 75 bp run, followed by analysis with MAGeCK. The read counts of sgRNAs were normalized by non-essential genes to reduce false positives. The RNA-seq data and copy number data were obtained from the CCLE portal. To characterize sgRNAs targeting multiple genomic loci, Bowtie was used to align sgRNAs to the reference human genome (GRCh38) with no mismatch, and only the alignments followed by an NGG PAM site were retained for downstream analysis. Summary: Integration of RNA-seq data with CRISPR negative screen results showed that the selection signal was noisy for the lowly expressed genes. The fraction of selected essential genes (overall FDR<0.05, absolute value of beta score >1) was as low as 0.11% among the genes with the bottom 10% expression level, versus 27% among the genes with the top 10% expression level. After filtering out the lowly expressed genes (<0.06 RPKM), the selected essential genes had an FDR much closer to 0. Of the 40 essential genes selected without filtering out lowly expressed genes, none was a reported oncogene in the literature. To study the influence of multiple alignments of sgRNAs, we only considered the ones with perfect alignments (i.e., no mismatch), to avoid confounding with off-target effects caused by mismatch tolerance. Log fold changes in read counts were calculated for each sgRNA between a later time point (day 14 or 20) and baseline (day 0). The median log fold change significantly decreased as a function of the number of perfect alignments (p = 0.0001, Jonckheere trend test). This supports the hypothesis that an sgRNA aligned to several DNA targets will introduce multiple double-stranded cuts, and thus will result in biased essentiality scores. Conclusions: Filtering out lowly expressed genes prior to CRISPR screen data analysis can reduce false positives. In addition, multiple-target sgRNAs can lead to false positives, but the effect needs further analysis on a case-by-case basis. Citation Format: Yue Zhao, Xue Wu, Yuru Wang, Kin Fai Au, Lijun Cheng, Lang Li. New bioinformatics workflow of genome-wide CRISPR-Cas9 knockout screens [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 830.
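The two data-handling steps named in the abstract above (dropping genes below 0.06 RPKM, then computing a per-sgRNA log fold change against baseline) can be sketched as follows. This is only an illustrative sketch and not the authors' workflow: the function names, the toy counts, the gene labels, and the pseudocount of 1 are all assumptions, and the normalization by non-essential genes described in the abstract is omitted.

```python
import math

def log2_fold_change(later, baseline, pseudocount=1.0):
    """log2 ratio of later vs. baseline read counts.

    The pseudocount is an assumption added here to avoid division by zero;
    the abstract does not say how zero counts were handled.
    """
    return math.log2((later + pseudocount) / (baseline + pseudocount))

def filter_by_expression(counts, rpkm, min_rpkm=0.06):
    """Keep sgRNAs whose target gene has RPKM >= min_rpkm.

    The 0.06 RPKM threshold is the one quoted in the abstract.
    Genes with no expression value are treated as unexpressed.
    """
    return {
        sgrna: rec
        for sgrna, rec in counts.items()
        if rpkm.get(rec["gene"], 0.0) >= min_rpkm
    }

# Hypothetical toy data: read counts at baseline and at a later time point.
counts = {
    "sg1": {"gene": "GENE_A", "baseline": 99, "later": 24},
    "sg2": {"gene": "GENE_B", "baseline": 49, "later": 49},
    "sg3": {"gene": "GENE_C", "baseline": 199, "later": 9},
}
rpkm = {"GENE_A": 12.5, "GENE_B": 3.1, "GENE_C": 0.01}

kept = filter_by_expression(counts, rpkm)
lfc = {
    sgrna: log2_fold_change(rec["later"], rec["baseline"])
    for sgrna, rec in kept.items()
}
```

In this toy input, `sg3` shows the strongest depletion but is dropped because its target gene is below the expression threshold, which is the kind of low-expression signal the abstract describes as noisy.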
11	A network-based computational framework to predict and differentiate functions for gene isoforms using exon-level expression data. Methods 2020;189:54-64. [PMID: 32534132 DOI: 10.1016/j.ymeth.2020.06.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/27/2020] [Revised: 05/22/2020] [Accepted: 06/06/2020] [Indexed: 12/23/2022] Abstract: MOTIVATION: Alternative splicing makes significant contributions to functional diversity of transcripts and proteins. Many alternatively spliced gene isoforms have been shown to perform specific biological functions under different contexts. In addition to gene-level expression, the advances of high-throughput sequencing offer a chance to estimate isoform-specific exon expression with a high resolution, which is informative for studying splice variants with network analysis. RESULTS: In this study, we propose a novel network-based analysis framework to predict isoform-specific functions from exon-level RNA-Seq data. In particular, based on exon-level expression data, we firstly propose a unified framework, referred to as Iso-Net, to integrate two new mathematical methods (named MINet and RVNet) that infer co-expression networks at different data scenarios. We demonstrate the superior prediction accuracy of Iso-Net over the existing methods for most simulation data, especially in two extreme cases: sample size is very small and exon numbers of two isoforms are quite different. Furthermore, by defining relevant quantitative measures (e.g., Jaccard correlation coefficient) and combining differential co-expression network analysis and GO functional enrichment analysis, a co-expression network analysis framework is developed to predict functions of isoforms and further, to discover their distinct functions within the same gene. We apply Iso-Net to study gene isoforms for several important transcription factors in human myeloid differentiation with the exon-level RNA-Seq data from three different cell lines. AVAILABILITY AND IMPLEMENTATION: Iso-Net is open source and freely available from https://github.com/Dingjie-Wang/Iso-Net. Key Words: Alternative splicing; Co-expression networks; Exon-level RNA-Seq; Gene isoforms; Matrix Correlation.
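The abstract above mentions a Jaccard coefficient as one of its quantitative measures without defining it. A minimal sketch of the standard Jaccard index, applied here to the co-expression neighbor sets of two isoforms, is given below. Applying it to neighbor sets, the function name, and the gene labels are assumptions made for illustration; the Iso-Net paper may define its measure differently.

```python
def jaccard(a, b):
    """Standard Jaccard index |A ∩ B| / |A ∪ B| of two collections.

    Returns 0.0 when both inputs are empty (a convention chosen here).
    """
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Hypothetical co-expression neighbors of two isoforms of the same gene.
neighbors_iso1 = {"G1", "G2", "G3", "G4"}
neighbors_iso2 = {"G3", "G4", "G5"}

# Two shared neighbors out of five distinct ones.
score = jaccard(neighbors_iso1, neighbors_iso2)
```

A low score between two isoforms of one gene would suggest that they sit in different network neighborhoods, which is the sort of difference the abstract uses to separate isoform functions.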
12	Performance difference of graph-based and alignment-based hybrid error correction methods for error-prone long reads. Genome Biol 2020;21:14. [PMID: 31952552 PMCID: PMC6966875 DOI: 10.1186/s13059-019-1885-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/28/2019] [Accepted: 11/10/2019] [Indexed: 11/10/2022] Abstract: The error-prone third-generation sequencing (TGS) long reads can be corrected by the high-quality second-generation sequencing (SGS) short reads, which is referred to as hybrid error correction. We here investigate the influences of the principal algorithmic factors of two major types of hybrid error correction methods by mathematical modeling and analysis on both simulated and real data. Our study reveals the distribution of accuracy gain with respect to the original long read error rate. We also demonstrate that the original error rate of 19% is the limit for perfect correction, beyond which long reads are too error-prone to be corrected by these methods. MESH Headings: Algorithms; High-Throughput Nucleotide Sequencing/methods; Sequence Alignment. Grants: R01 HG008759 NHGRI NIH HHS; R01HG008759 NHGRI NIH HHS; National Human Genome Research Institute; Department of Internal Medicine, University of Iowa; Department of Biomedical Informatics, The Ohio State University.
13	iASPP mediates p53 selectivity through a modular mechanism fine-tuning DNA recognition. Proc Natl Acad Sci U S A 2019;116:17470-17479. [PMID: 31395738 PMCID: PMC6717262 DOI: 10.1073/pnas.1909393116] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 02/08/2023] Abstract: The most frequently mutated protein in human cancer is p53, a transcription factor (TF) that regulates myriad genes instrumental in diverse cellular outcomes including growth arrest and cell death. Cell context-dependent p53 modulation is critical for this life-or-death balance, yet remains incompletely understood. Here we identify sequence signatures enriched in genomic p53-binding sites modulated by the transcription cofactor iASPP. Moreover, our p53-iASPP crystal structure reveals that iASPP displaces the p53 L1 loop (which mediates sequence-specific interactions with the signature-corresponding base) without perturbing other DNA-recognizing modules of the p53 DNA-binding domain. A TF commonly uses multiple structural modules to recognize its cognate DNA, and thus this mechanism of a cofactor fine-tuning TF-DNA interactions through targeting a particular module is likely widespread. Previously, all tumor suppressors and oncoproteins that associate with the p53 DNA-binding domain, except the oncogenic E6 from human papillomaviruses (HPVs), structurally cluster at the DNA-binding site of p53, complicating drug design. By contrast, iASPP inhibits p53 through a distinct surface overlapping the E6 footprint, opening prospects for p53-targeting precision medicine to improve cancer therapy. Key Words: HPV E6; crystal structure; iASPP; p53; target selectivity. MESH Headings: Base Sequence; Binding Sites; Cell Line, Tumor; DNA/chemistry; DNA/genetics; DNA/metabolism; Gene Expression Profiling; Humans; Intracellular Signaling Peptides and Proteins/chemistry; Intracellular Signaling Peptides and Proteins/metabolism; Models, Molecular; Nucleotide Motifs; Oncogene Proteins, Viral/chemistry; Oncogene Proteins, Viral/metabolism; Protein Binding; Protein Conformation; Repressor Proteins/chemistry; Repressor Proteins/metabolism; Response Elements; Structure-Activity Relationship; Tumor Suppressor Protein p53/chemistry; Tumor Suppressor Protein p53/metabolism. Grants: R01 HG008759 NHGRI NIH HHS; C375/A17721 Cancer Research UK; 17721 Cancer Research UK; 26752 Cancer Research UK; 14414 Cancer Research UK; P30 CA086862 NCI NIH HHS; Department of Health; C20724/A26752 Cancer Research UK; 203141/Z/16/Z Wellcome Trust.
14	Single-molecule long-read sequencing reveals the chromatin basis of gene expression. Genome Res 2019;29:1329-1342. [PMID: 31201211 PMCID: PMC6673713 DOI: 10.1101/gr.251116.119] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 04/02/2019] [Accepted: 06/10/2019] [Indexed: 11/25/2022] Abstract: Genome-wide chromatin accessibility and nucleosome occupancy profiles have been widely investigated, while the long-range dynamics remain poorly studied at the single-cell level. Here, we present a new experimental approach, methyltransferase treatment followed by single-molecule long-read sequencing (MeSMLR-seq), for long-range mapping of nucleosomes and chromatin accessibility at single DNA molecules and thus achieve comprehensive-coverage characterization of the corresponding heterogeneity. MeSMLR-seq offers direct measurements of both nucleosome-occupied and nucleosome-evicted regions on a single DNA molecule, which is challenging for many existing methods. We applied MeSMLR-seq to haploid yeast, where single DNA molecules represent single cells, and thus we could investigate the combinatorics of many (up to 356) nucleosomes at long range in single cells. We illustrated the differential organization principles of nucleosomes surrounding the transcription start site for silent and actively transcribed genes, at the single-cell level and in the long-range scale. The heterogeneous patterns of chromatin status spanning multiple genes were phased. Together with single-cell RNA-seq data, we quantitatively revealed how chromatin accessibility correlated with gene transcription positively in a highly heterogeneous scenario. Moreover, we quantified the openness of promoters and investigated the coupled chromatin changes of adjacent genes at single DNA molecules during transcription reprogramming. In addition, we revealed the coupled changes of chromatin accessibility for two neighboring glucose transporter genes in response to changes in glucose concentration.
15	A comparative evaluation of hybrid error correction methods for error-prone long reads. Genome Biol 2019;20:26. [PMID: 30717772 PMCID: PMC6362602 DOI: 10.1186/s13059-018-1605-z] [Citation(s) in RCA: 73] [Impact Index Per Article: 14.6] [Received: 03/20/2018] [Accepted: 12/05/2018] [Indexed: 12/20/2022] Abstract: Background: Third-generation sequencing technologies have advanced the progress of the biological research by generating reads that are substantially longer than second-generation sequencing technologies. However, their notorious high error rate impedes straightforward data analysis and limits their application. A handful of error correction methods for these error-prone long reads have been developed to date. The output data quality is very important for downstream analysis, whereas computing resources could limit the utility of some computing-intense tools. There is a lack of standardized assessments for these long-read error-correction methods. Results: Here, we present a comparative performance assessment of ten state-of-the-art error-correction methods for long reads. We established a common set of benchmarks for performance assessment, including sensitivity, accuracy, output rate, alignment rate, output read length, run time, and memory usage, as well as the effects of error correction on two downstream applications of long reads: de novo assembly and resolving haplotype sequences. Conclusions: Taking into account all of these metrics, we provide a suggestive guideline for method choice based on available data size, computing resources, and individual research goals. Electronic supplementary material: The online version of this article (10.1186/s13059-018-1605-z) contains supplementary material, which is available to authorized users.
16	E-C coupling structural protein junctophilin-2 encodes a stress-adaptive transcription regulator. Science 2018;362:science.aan3303. [PMID: 30409805 DOI: 10.1126/science.aan3303] [Citation(s) in RCA: 63] [Impact Index Per Article: 10.5] [Received: 04/08/2017] [Revised: 05/10/2018] [Accepted: 10/24/2018] [Indexed: 11/02/2022] Abstract: Junctophilin-2 (JP2) is a structural protein required for normal excitation-contraction (E-C) coupling. After cardiac stress, JP2 is cleaved by the calcium ion-dependent protease calpain, which disrupts the E-C coupling ultrastructural machinery and drives heart failure progression. We found that stress-induced proteolysis of JP2 liberates an N-terminal fragment (JP2NT) that translocates to the nucleus, binds to genomic DNA, and controls expression of a spectrum of genes in cardiomyocytes. Transgenic overexpression of JP2NT in mice modifies the transcriptional profile, resulting in attenuated pathological remodeling in response to cardiac stress. Conversely, loss of nuclear JP2NT function accelerates stress-induced development of hypertrophy and heart failure in mutant mice. These data reveal a self-protective mechanism in failing cardiomyocytes that transduce mechanical information (E-C uncoupling) into salutary transcriptional reprogramming in the stressed heart.
17	A Statistical Method for Observing Personal Diploid Methylomes and Transcriptomes with Single-Molecule Real-Time Sequencing. Genes (Basel) 2018;9:E460. [PMID: 30235838 PMCID: PMC6162384 DOI: 10.3390/genes9090460] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2018] [Revised: 09/12/2018] [Accepted: 09/12/2018] [Indexed: 11/16/2022] Open Abstract We address the problem of observing personal diploid methylomes, CpG methylome pairs of homologous chromosomes that are distinguishable with respect to phased heterozygous variants (PHVs), which is challenging due to scarcity of PHVs in personal genomes. Single molecule real-time (SMRT) sequencing is promising as it outputs long reads with CpG methylation information, but a serious concern is whether reliable PHVs are available in erroneous SMRT reads with an error rate of ∼15%. To overcome the issue, we propose a statistical model that reduces the error rate of phasing CpG site to 1%, thereby calling CpG hypomethylation in each haplotype with >90% precision and sensitivity. Using our statistical model, we examined GNAS complex locus known for a combination of maternally, paternally, or biallelically expressed isoforms, and observed allele-specific methylation pattern almost perfectly reflecting their respective allele-specific expression status, demonstrating the merit of elucidating comprehensive personal diploid methylomes and transcriptomes. Collapse Key Words DNA methylation allele-specific analysis gene expression single molecule real-time sequencing statistical methods Collapse MESH Headings Collapse Grants R01 HG008759 NHGRI NIH HHS GRIFIN Japan Agency for Medical Research and Development Grant-in-Aid for JSPS Fellows 15J03645 Japan Society for the Promotion of Science Dept. 
of Internal Medicine, Institutional Fund University of Iowa Research Starter Grant (Informatics) Pharmaceutical Research and Manufacturers of America CREST JPMJCR13W3 Japan Science and Technology Agency R01HG008759 National Human Genome Research Institute
18	Delayed diagnosis of tuberculosis: risk factors and effect on mortality among older adults in Hong Kong. Hong Kong Med J 2018;24:361-368. [PMID: 30065120 DOI: 10.12809/hkmj177081] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/05/2022] Open Abstract OBJECTIVE: To assess the risk factors and effects of delayed diagnosis on tuberculosis (TB) mortality in Hong Kong. METHODS: All consecutive patients with TB notified in 2010 were tracked through their clinical records for treatment outcome until 2012. All TB cases notified or confirmed after death were identified for a mortality survey on the timing and causes of death. RESULTS: Of 5092 TB cases notified, 1061 (20.9%) died within 2 years of notification; 211 (4.1%) patients died before notification, 683 (13.4%) died within the first year, and 167 (3.3%) died within the second year after notification. Among the 211 cases with TB notified after death, only 30 were certified to have died from TB. However, 52 (24.6%) died from unspecified pneumonia/sepsis possibly related to pulmonary TB. If these cases are counted, the total number of TB-related deaths increases from 191 to 243. In 82 (33.7%) of these, TB was notified after death. Over 60% of cases in which TB was diagnosed after death involved patients aged ≥80 years and a similar proportion had an advance care directive against resuscitation or investigation. Independent factors for TB notified after death included female sex, living in an old age home, drug abuse, malignancy other than lung cancer, sputum TB smear negative, sputum TB culture positive, and chest X-ray not done. CONCLUSIONS: High mortality was observed among patients with TB aged ≥80 years. Increased vigilance is warranted to avoid delayed diagnosis and reduce the transmission risk, especially among elderly patients with co-morbidities living in old age homes.
Key Words: Aged; Time factors; Tuberculosis, pulmonary/diagnosis
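The counts quoted in the abstract of entry 18 can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are made up for this example, and reading the 82 cases as the 30 certified TB deaths plus the 52 pneumonia/sepsis deaths is an assumption inferred from the numbers, not something the abstract states.

```python
# Counts quoted in the abstract of entry 18 (variable names are illustrative).
NOTIFIED = 5092
DIED_BEFORE, DIED_YEAR1, DIED_YEAR2 = 211, 683, 167
CERTIFIED_TB_AFTER_DEATH = 30
PNEUMONIA_SEPSIS = 52
TB_DEATHS_REPORTED = 191


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Deaths before notification plus deaths in years 1 and 2.
TOTAL_DEATHS = DIED_BEFORE + DIED_YEAR1 + DIED_YEAR2

# Widened TB-related death count once pneumonia/sepsis deaths are included.
TB_DEATHS_WIDENED = TB_DEATHS_REPORTED + PNEUMONIA_SEPSIS

# Assumption: the 82 "notified after death" cases are 30 + 52.
AFTER_DEATH_AMONG_WIDENED = CERTIFIED_TB_AFTER_DEATH + PNEUMONIA_SEPSIS

if __name__ == "__main__":
    print("total deaths:", TOTAL_DEATHS, f"({pct(TOTAL_DEATHS, NOTIFIED)}%)")
    print("before notification:", f"{pct(DIED_BEFORE, NOTIFIED)}%")
    print("first year:", f"{pct(DIED_YEAR1, NOTIFIED)}%")
    print("second year:", f"{pct(DIED_YEAR2, NOTIFIED)}%")
    print("pneumonia/sepsis share:", f"{pct(PNEUMONIA_SEPSIS, DIED_BEFORE)}%")
    print("widened TB deaths:", TB_DEATHS_WIDENED)
    print("notified after death:", f"{pct(AFTER_DEATH_AMONG_WIDENED, TB_DEATHS_WIDENED)}%")
```

The component counts sum to the 1061 deaths quoted, and the subgroup percentages match the abstract. The overall share computes to about 20.8% from these counts, slightly below the 20.9% quoted; the abstract does not explain the difference.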
19	IDP-denovo: de novo transcriptome assembly and isoform annotation by hybrid sequencing. Bioinformatics 2018;34:2168-2176. [PMID: 29905763 PMCID: PMC6022631 DOI: 10.1093/bioinformatics/bty098] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2017] [Revised: 02/10/2018] [Accepted: 02/21/2018] [Indexed: 12/24/2022] Open Abstract Motivation In the past years, the long read (LR) sequencing technologies, such as Pacific Biosciences and Oxford Nanopore Technologies, have been demonstrated to substantially improve the quality of genome assembly and transcriptome characterization. Compared to the high cost of genome assembly by LR sequencing, it is more affordable to generate LRs for transcriptome characterization. That is, when informative transcriptome LR data are available without a high-quality genome, a method for de novo transcriptome assembly and annotation is of high demand. Results Without a reference genome, IDP-denovo performs de novo transcriptome assembly, isoform annotation and quantification by integrating the strengths of LRs and short reads. Using the GM12878 human data as a gold standard, we demonstrated that IDP-denovo had superior sensitivity of transcript assembly and high accuracy of isoform annotation. In addition, IDP-denovo outputs two abundance indices to provide a comprehensive expression profile of genes/isoforms. IDP-denovo represents a robust approach for transcriptome assembly, isoform annotation and quantification for non-model organism studies. Applying IDP-denovo to a non-model organism, Dendrobium officinale, we discovered a number of novel genes and novel isoforms that were not reported by the existing annotation library. These results reveal the high diversity of gene isoforms in D.officinale, which was not reported in the existing annotation library. 
Availability and implementation The dataset of Dendrobium officinale used/analyzed during the current study has been deposited in SRA, with accession code SRP094520. IDP-denovo is available for download at www.healthcare.uiowa.edu/labs/au/IDP-denovo/. Supplementary information Supplementary data are available at Bioinformatics online.
MESH Headings: Alternative Splicing; Dendrobium/genetics; Gene Expression Profiling/methods; Gene Library; High-Throughput Nucleotide Sequencing/methods; Humans; Sequence Analysis, RNA/methods
Grants: P30 CA086862 NCI NIH HHS; R01 HG008759 NHGRI NIH HHS
20	Single cell expression analysis of primate-specific retroviruses-derived HPAT lincRNAs in viable human blastocysts identifies embryonic cells co-expressing genetic markers of multiple lineages. Heliyon 2018;4:e00667. [PMID: 30003161 PMCID: PMC6039856 DOI: 10.1016/j.heliyon.2018.e00667] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2017] [Revised: 02/01/2018] [Accepted: 06/21/2018] [Indexed: 12/03/2022] Open Abstract Chromosome instability and aneuploidies occur very frequently in human embryos, impairing proper embryogenesis and leading to cell cycle arrest, loss of cell viability, and developmental failures in 50–80% of cleavage-stage embryos. This high frequency of cellular extinction events represents a significant experimental obstacle challenging analyses of individual cells isolated from human preimplantation embryos. We carried out single cell expression profiling of 241 individual cells recovered from 32 human embryos during the early and late stages of viable human blastocyst (VHB) differentiation. Classification of embryonic cells was performed solely based on expression patterns of human pluripotency-associated transcripts (HPAT), which represent a family of primate-specific transposable element-derived lincRNAs highly expressed in human embryonic stem cells and regulating nuclear reprogramming and pluripotency induction. We then validated our findings by analyzing transcriptomes of 1,708 individual cells recovered from more than 100 human embryos and 259 mouse cells from more than 40 mouse embryos at different stages of preimplantation embryogenesis. 
HPAT's expression-guided spatiotemporal reconstruction of human embryonic development inferred from single-cell expression analysis of VHB differentiation enabled identification of telomerase-positive embryonic cells co-expressing key pluripotency regulatory genes and genetic markers of three major lineages. Follow-up validation analyses confirmed that telomerase-positive cells co-expressing genetic markers of multiple lineages emerge in human embryos prior to lineage segregation. Observations reported in this contribution support the hypothesis of a developmental pathway that creates embryonic lineages and extraembryonic tissues from telomerase-positive pre-lineage cells manifesting a multi-lineage precursor phenotype.
Key Words: Developmental biology
21	Hybrid Sequencing of Full-Length cDNA Transcripts of Stems and Leaves in Dendrobium officinale. Genes (Basel) 2017;8:genes8100257. [PMID: 28981454 PMCID: PMC5664107 DOI: 10.3390/genes8100257] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2017] [Revised: 09/16/2017] [Accepted: 10/02/2017] [Indexed: 11/16/2022] Open Abstract Dendrobium officinale is an extremely valuable orchid used in traditional Chinese medicine, so sought after that it has a higher market value than gold. Although the expression profiles of some genes involved in the polysaccharide synthesis have previously been investigated, little research has been carried out on their alternatively spliced isoforms in D. officinale. In addition, information regarding the translocation of sugars from leaves to stems in D. officinale also remains limited. We analyzed the polysaccharide content of D. officinale leaves and stems, and completed in-depth transcriptome sequencing of these two diverse tissue types using second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing technology. The results of this study yielded a digital inventory of gene and mRNA isoform expressions. A comparative analysis of both transcriptomes uncovered a total of 1414 differentially expressed genes, including 844 that were up-regulated and 570 that were down-regulated in stems. Of these genes, one sugars will eventually be exported transporter (SWEET) and one sucrose transporter (SUT) are expressed to a greater extent in D. officinale stems than in leaves. Two glycosyltransferase (GT) and four cellulose synthase (Ces) genes undergo a distinct degree of alternative splicing. In the stems, the content of polysaccharides is twice as much as that in the leaves. 
The differentially expressed GT and transcription factor (TF) genes will be the focus of further study. The genes DoSWEET4 and DoSUT1 are significantly expressed in the stem, and are likely to be involved in sugar loading in the phloem.
Key Words: Dendrobium officinale; SGS; SMRT; alternative splicing; polysaccharide; second-generation sequence; single-molecule real-time sequence; sugar transporter
Grants: P50 CA097274 NCI NIH HHS; R01 HG008759 NHGRI NIH HHS
22	CF airway smooth muscle transcriptome reveals a role for PYK2. JCI Insight 2017;2:95332. [PMID: 28878137 DOI: 10.1172/jci.insight.95332] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2017] [Accepted: 07/27/2017] [Indexed: 12/17/2022] Open Abstract Abnormal airway smooth muscle function can contribute to cystic fibrosis (CF) airway disease. We previously found that airway smooth muscle from newborn CF pigs had increased basal tone, an increased bronchodilator response, and abnormal calcium handling. Since CF pigs lack airway infection and inflammation at birth, these findings suggest intrinsic airway smooth muscle dysfunction in CF. In this study, we tested the hypothesis that CFTR loss in airway smooth muscle would produce a distinct set of changes in the airway smooth muscle transcriptome that we could use to develop novel therapeutic targets. Total RNA sequencing of newborn wild-type and CF airway smooth muscle revealed changes in muscle contraction-related genes, ontologies, and pathways. Using connectivity mapping, we identified several small molecules that elicit transcriptional signatures opposite of CF airway smooth muscle, including NVP-TAE684, an inhibitor of proline-rich tyrosine kinase 2 (PYK2). In CF airway smooth muscle tissue, PYK2 phosphorylation was increased and PYK2 inhibition decreased smooth muscle contraction. In vivo NVP-TAE684 treatment of wild-type mice reduced methacholine-induced airway smooth muscle contraction. These findings suggest that studies in the newborn CF pig may provide an important approach to enhance our understanding of airway smooth muscle biology and for discovery of novel airway smooth muscle therapeutics for CF and other diseases of airway hyperreactivity. Collapse Key Words Asthma Genetic diseases Ion channels Pulmonology Collapse MESH Headings Collapse Grants Collapse
23	Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis. Nat Commun 2017;8:59. [PMID: 28680106 PMCID: PMC5498581 DOI: 10.1038/s41467-017-00050-4] [Citation(s) in RCA: 165] [Impact Index Per Article: 23.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2016] [Accepted: 05/02/2017] [Indexed: 12/30/2022] Open Abstract RNA-sequencing (RNA-seq) is an essential technique for transcriptome studies, hundreds of analysis tools have been developed since it was debuted. Although recent efforts have attempted to assess the latest available tools, they have not evaluated the analysis workflows comprehensively to unleash the power within RNA-seq. Here we conduct an extensive study analysing a broad spectrum of RNA-seq workflows. Surpassing the expression analysis scope, our work also includes assessment of RNA variant-calling, RNA editing and RNA fusion detection techniques. Specifically, we examine both short- and long-read RNA-seq technologies, 39 analysis tools resulting in ~120 combinations, and ~490 analyses involving 15 samples with a variety of germline, cancer and stem cell data sets. We report the performance and propose a comprehensive RNA-seq analysis protocol, named RNACocktail, along with a computational pipeline achieving high accuracy. Validation on different samples reveals that our proposed protocol could help researchers extract more biologically relevant predictions by broad analysis of the transcriptome. RNA-seq is widely used for transcriptome analysis. Here, the authors analyse a wide spectrum of RNA-seq workflows and present a comprehensive analysis protocol named RNACocktail as well as a computational pipeline leveraging the widely used tools for accurate RNA-seq analysis. Collapse Key Words Collapse MESH Headings Collapse Grants Collapse
24	Discovery of novel determinants of endothelial lineage using chimeric heterokaryons. eLife 2017;6. [PMID: 28323620 PMCID: PMC5391207 DOI: 10.7554/elife.23588] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2016] [Accepted: 03/17/2017] [Indexed: 12/29/2022] Open Abstract We wish to identify determinants of endothelial lineage. Murine embryonic stem cells (mESC) were fused with human endothelial cells in stable, non-dividing, heterokaryons. Using RNA-seq, it is possible to discriminate between human and mouse transcripts in these chimeric heterokaryons. We observed a temporal pattern of gene expression in the ESCs of the heterokaryons that recapitulated ontogeny, with early mesodermal factors being expressed before mature endothelial genes. A set of transcriptional factors not known to be involved in endothelial development was upregulated, one of which was POU class 3 homeobox 2 (Pou3f2). We confirmed its importance in differentiation to endothelial lineage via loss- and gain-of-function (LOF and GOF). Its role in vascular development was validated in zebrafish embryos using morpholino oligonucleotides. These studies provide a systematic and mechanistic approach for identifying key regulators in directed differentiation of pluripotent stem cells to somatic cell lineages. DOI:http://dx.doi.org/10.7554/eLife.23588.001 Endothelial cells form the inner surface of blood vessels, acting like a non-stick coating. In addition to making substances that keep blood from sticking to the vessel wall, endothelial cells generate compounds that relax the vessel, and prevent it from thickening. Endothelial cells also form capillaries, the smallest vessels that provide oxygen and nutrients for all tissues. 
A regenerating organ, or a bioengineered tissue, requires a system of capillaries and other microvessels. Thus, regenerative medicine could benefit from a knowledge of how to generate endothelial cells from pluripotent stem cells – cells that can “differentiate” to form almost any type of cell in the body. Wong, Matrone et al. have now used a cell fusion model (named heterokaryon) to track the changes in gene expression that occur as a pluripotent stem cell differentiates to ultimately become an endothelial cell. In this model, mouse embryonic stem cells (ESCs) are fused to human endothelial cells. Over time the human endothelial cells drive gene expression in the ESCs toward that of endothelial cells. Wong, Matrone et al. discovered changes in gene expression in many genes that have not previously been described as involved in the differentiation of endothelial cells. When one of these genes – named Pou3f2 – was inactivated in ESCs, they could not be differentiated into endothelial cells. The absence of Pou3f2 also drastically impaired how blood vessels developed in zebrafish embryos. Thus the heterokaryon model can generate important information regarding the dynamic changes in gene expression that occur as a pluripotent cell differentiates to become an endothelial cell. This model may also be useful for discovering other genes that control the differentiation of other cell types. DOI:http://dx.doi.org/10.7554/eLife.23588.002 Collapse Key Words Endothelial lineage Heterokaryons cell biology developmental biology human mouse nuclear reprogramming pou3f2 stem cells zebrafish Collapse MESH Headings Collapse Grants Collapse
25	IDP-ASE: haplotyping and quantifying allele-specific expression at the gene and gene isoform level by hybrid sequencing. Nucleic Acids Res 2017;45:e32. [PMID: 27899656 PMCID: PMC5952581 DOI: 10.1093/nar/gkw1076] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2016] [Revised: 10/20/2016] [Accepted: 10/26/2016] [Indexed: 12/14/2022] Open Abstract Allele-specific expression (ASE) is a fundamental problem in studying gene regulation and diploid transcriptome profiles, with two key challenges: (i) haplotyping and (ii) estimation of ASE at the gene isoform level. Existing ASE analysis methods are limited by a dependence on haplotyping from laborious experiments or extra genome/family trio data. In addition, there is a lack of methods for gene isoform level ASE analysis. We developed a tool, IDP-ASE, for full ASE analysis. By innovative integration of Third Generation Sequencing (TGS) long reads with Second Generation Sequencing (SGS) short reads, the accuracy of haplotyping and ASE quantification at the gene and gene isoform level was greatly improved as demonstrated by the gold standard data GM12878 data and semi-simulation data. In addition to methodology development, applications of IDP-ASE to human embryonic stem cells and breast cancer cells indicate that the imbalance of ASE and non-uniformity of gene isoform ASE is widespread, including tumorigenesis relevant genes and pluripotency markers. These results show that gene isoform expression and allele-specific expression cooperate to provide high diversity and complexity of gene regulation and expression, highlighting the importance of studying ASE at the gene isoform level. Our study provides a robust bioinformatics solution to understand ASE using RNA sequencing data only. 
MESH Headings: Alleles; Gene Expression Regulation; Haplotypes; High-Throughput Nucleotide Sequencing/methods; Human Embryonic Stem Cells/cytology; Human Embryonic Stem Cells/metabolism; Humans; MCF-7 Cells; RNA Isoforms/genetics; RNA Isoforms/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism; Sequence Analysis, RNA; Transcriptome
Grants: R01 HG008759 NHGRI NIH HHS; T32 HL007638 NHLBI NIH HHS
26	Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis. F1000Res 2017;6:100. [PMID: 28868132 PMCID: PMC5553090 DOI: 10.12688/f1000research.10571.1] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 06/09/2017] [Indexed: 09/05/2023] Open Abstract Background: Given the demonstrated utility of Third Generation Sequencing [Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT)] long reads in many studies, a comprehensive analysis and comparison of their data quality and applications is in high demand. Methods: Based on the transcriptome sequencing data from human embryonic stem cells, we analyzed multiple data features of PacBio and ONT, including error pattern, length, mappability and technical improvements over previous platforms. We also evaluated their application to transcriptome analyses, such as isoform identification and quantification and characterization of transcriptome complexity, by comparing the performance of size-selected PacBio, non-size-selected ONT and their corresponding Hybrid-Seq strategies (PacBio+Illumina and ONT+Illumina). Results: PacBio shows overall better data quality, while ONT provides a higher yield. As with data quality, PacBio performs marginally better than ONT in most aspects for both long reads only and Hybrid-Seq strategies in transcriptome analysis. In addition, Hybrid-Seq shows superior performance over long reads only in most transcriptome analyses. Conclusions: Both PacBio and ONT sequencing are suitable for full-length single-molecule transcriptome analysis. As this first use of ONT reads in a Hybrid-Seq analysis has shown, both PacBio and ONT can benefit from a combined Illumina strategy. 
The tools and analytical methods developed here provide a resource for future applications and evaluations of these rapidly-changing technologies.
Key Words: Oxford Nanopore Technologies; PacBio; Third Generation Sequencing; Transcriptome
Grants: R01 HG008759 NHGRI NIH HHS; T32 HL007638 NHLBI NIH HHS
27	Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis. F1000Res 2017;6:100. [PMID: 28868132 PMCID: PMC5553090 DOI: 10.12688/f1000research.10571.2] [Citation(s) in RCA: 234] [Impact Index Per Article: 33.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 06/09/2017] [Indexed: 12/11/2022] Open Abstract Background: Given the demonstrated utility of Third Generation Sequencing [Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT)] long reads in many studies, a comprehensive analysis and comparison of their data quality and applications is in high demand. Methods: Based on the transcriptome sequencing data from human embryonic stem cells, we analyzed multiple data features of PacBio and ONT, including error pattern, length, mappability and technical improvements over previous platforms. We also evaluated their application to transcriptome analyses, such as isoform identification and quantification and characterization of transcriptome complexity, by comparing the performance of size-selected PacBio, non-size-selected ONT and their corresponding Hybrid-Seq strategies (PacBio+Illumina and ONT+Illumina). Results: PacBio shows overall better data quality, while ONT provides a higher yield. As with data quality, PacBio performs marginally better than ONT in most aspects for both long reads only and Hybrid-Seq strategies in transcriptome analysis. In addition, Hybrid-Seq shows superior performance over long reads only in most transcriptome analyses. Conclusions: Both PacBio and ONT sequencing are suitable for full-length single-molecule transcriptome analysis. As this first use of ONT reads in a Hybrid-Seq analysis has shown, both PacBio and ONT can benefit from a combined Illumina strategy. 
The tools and analytical methods developed here provide a resource for future applications and evaluations of these rapidly-changing technologies.
Key Words: Oxford Nanopore Technologies; PacBio; Third Generation Sequencing; Transcriptome
28	Abstract 5289: Enhance both precision and sensitivity of fusion gene detection by hybrid sequencing. Cancer Res 2016. [DOI: 10.1158/1538-7445.am2016-5289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Abstract Background: New Third Generation Sequencing (TGS) techniques, such as PacBio, can provide very informative insights into the transcriptome, such as expression of fusion genes/fusion transcripts from cancer samples. However, the currently available fusion gene analysis tools are for Second Generation Sequencing (SGS) data, where the short read length and unreliable alignments can lead to uncertain accuracy of fusion gene detection. Hybrid-Seq, which integrates SGS short read data into the analysis of TGS long read data, can combine the strengths of both and thus improve the overall performance and resolution of the output data. It also reduces the required amount of TGS data and thus the sequencing cost. Recently, we developed and reported on a Hybrid-Seq approach, IDP-fusion, and the results of its proof-of-concept application to MCF-7 data. We demonstrated that IDP-fusion can identify fusion genes with much higher precision than SGS-based approaches. Although the sensitivity is comparable to the most sensitive SGS-only method, a significant proportion of experimentally verified gold standard fusion genes had yet to be identified by IDP-fusion. This indicated an opportunity to enhance the sensitivity of IDP-fusion while retaining the unparalleled precision of the results. Method: Here we present an innovative Hybrid-Seq approach that extends IDP-fusion to filter fusion gene candidates predicted by SGS short read alignments. A fusion gene candidate is verified by the presence of a TGS long read that aligns better to an artificial chromosome created from the fusion candidate than to any single genome locus. We applied IDP-fusion to Hybrid-Seq data from the MCF-7 breast cancer cell line, including Illumina SGS data and recent TGS data generated with PacBio P5-C3 sequencing chemistry. The new IDP-fusion considered fusion candidates reported by several popular SGS tools (TopHat-Fusion, SOAPfuse, TRUP, FusionMap, and deFuse). We compared the performance of our new tool to the original IDP-fusion and the SGS-only approaches. Results: The new algorithm of IDP-fusion improved the sensitivity from 33.8% to 54.9%. This is higher than the most sensitive SGS-only tool (deFuse, 38.0%), whose sensitivity is achieved at the cost of a low precision of 13.8%. The improved IDP-fusion retains a precision of 60.9%, which is only down slightly from the original IDP-fusion at 68.6%. This tradeoff is acceptable when considering the overall accuracy described by F-score for IDP-fusion. The F-score has increased from 45.3% to 57.8%, which is also considerably better than the best F-score achieved by SGS-only methods (32.8%). Conclusions: Fusion candidates identified directly from SGS reads can be screened using alignments of TGS long reads, and supplement fusion candidates detected from long reads. Compared with SGS-only methods, this Hybrid-Seq approach provides much more sensitive and more accurate reports on the fusion genes. Citation Format: Jason L. Weirather, Tyson A. Clark, Elizabeth Tseng, Jonas Korlach, Kin Fai Au. Enhance both precision and sensitivity of fusion gene detection by hybrid sequencing. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 5289.
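The precision, sensitivity and F-score figures quoted in the abstract of entry 28 can be related with a short calculation. The sketch below assumes the abstract's "F-score" is the standard F1 measure (harmonic mean of precision and sensitivity); the abstract does not define it, and the function and dictionary names are illustrative only.

```python
def f_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (F1).

    Assumption: this is the F-score definition used in the abstract.
    """
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# (precision, sensitivity) pairs quoted in the abstract of entry 28.
REPORTED = {
    "IDP-fusion (original)": (0.686, 0.338),
    "IDP-fusion (extended)": (0.609, 0.549),
    "deFuse": (0.138, 0.380),
}

if __name__ == "__main__":
    for name, (precision, sensitivity) in REPORTED.items():
        print(f"{name}: F = {100 * f_score(precision, sensitivity):.1f}%")
```

Under this assumption the original IDP-fusion figures reproduce the quoted 45.3%, and the extended version comes out at about 57.7%, close to the quoted 57.8%; the small gap is consistent with the inputs having been rounded to one decimal place.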
29	The primate-specific noncoding RNA HPAT5 regulates pluripotency during human preimplantation development and nuclear reprogramming. Nat Genet 2015;48:44-52. [PMID: 26595768 DOI: 10.1038/ng.3449] [Citation(s) in RCA: 131] [Impact Index Per Article: 14.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2015] [Accepted: 10/22/2015] [Indexed: 12/14/2022] Abstract Long intergenic noncoding RNAs (lincRNAs) are derived from thousands of loci in mammalian genomes and are frequently enriched in transposable elements (TEs). Although families of TE-derived lincRNAs have recently been implicated in the regulation of pluripotency, little is known of the specific functions of individual family members. Here we characterize three new individual TE-derived human lincRNAs, human pluripotency-associated transcripts 2, 3 and 5 (HPAT2, HPAT3 and HPAT5). Loss-of-function experiments indicate that HPAT2, HPAT3 and HPAT5 function in preimplantation embryo development to modulate the acquisition of pluripotency and the formation of the inner cell mass. CRISPR-mediated disruption of the genes for these lincRNAs in pluripotent stem cells, followed by whole-transcriptome analysis, identifies HPAT5 as a key component of the pluripotency network. Protein binding and reporter-based assays further demonstrate that HPAT5 interacts with the let-7 microRNA family. Our results indicate that unique individual members of large primate-specific lincRNA families modulate gene expression during development and differentiation to reinforce cell fate. Collapse Key Words Collapse MESH Headings Collapse Grants Collapse
30	PacBio Sequencing and Its Applications. GENOMICS PROTEOMICS & BIOINFORMATICS 2015;13:278-89. [PMID: 26542840 PMCID: PMC4678779 DOI: 10.1016/j.gpb.2015.08.002] [Citation(s) in RCA: 1130] [Impact Index Per Article: 125.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/28/2015] [Revised: 08/06/2015] [Accepted: 08/11/2015] [Indexed: 12/15/2022] Abstract Single-molecule, real-time sequencing developed by Pacific BioSciences offers longer read lengths than the second-generation sequencing (SGS) technologies, making it well-suited for unsolved problems in genome, transcriptome, and epigenetics research. The highly-contiguous de novo assemblies using PacBio sequencing can close gaps in current reference assemblies and characterize structural variation (SV) in personal genomes. With longer reads, we can sequence through extended repetitive regions and detect mutations, many of which are associated with diseases. Moreover, PacBio transcriptome sequencing is advantageous for the identification of gene isoforms and facilitates reliable discoveries of novel genes and novel isoforms of annotated genes, due to its ability to sequence full-length transcripts or fragments with significant lengths. Additionally, PacBio’s sequencing technique provides information that is useful for the direct detection of base modifications, such as methylation. In addition to using PacBio sequencing alone, many hybrid sequencing strategies have been developed to make use of more accurate short reads in conjunction with PacBio long reads. In general, hybrid sequencing strategies are more affordable and scalable especially for small-size laboratories than using PacBio Sequencing alone. The advent of PacBio sequencing has made available much information that could not be obtained via SGS alone. 
Key Words: De novo assembly; Gene isoform detection; Hybrid sequencing; Methylation; Third-generation sequencing
31	Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing. Nucleic Acids Res 2015;43:e116. [PMID: 26040699 PMCID: PMC4605286 DOI: 10.1093/nar/gkv562] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Received: 03/19/2015] [Accepted: 05/15/2015] [Indexed: 12/19/2022] Open Abstract We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from the MCF-7 breast cancer cells. Compared with the existing tools, IDP-fusion detects fusion genes at higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes.
32	Full-length transcriptome sequences and splice variants obtained by a combination of sequencing platforms applied to different root tissues of Salvia miltiorrhiza and tanshinone biosynthesis. THE PLANT JOURNAL: FOR CELL AND MOLECULAR BIOLOGY 2015;82:951-961. [PMID: 25912611 DOI: 10.1111/tpj.12865] [Citation(s) in RCA: 217] [Impact Index Per Article: 24.1] [Received: 03/02/2015] [Revised: 04/19/2015] [Accepted: 04/21/2015] [Indexed: 05/20/2023]
Abstract: Danshen, Salvia miltiorrhiza Bunge, is one of the most widely used herbs in traditional Chinese medicine, wherein its rhizome/roots are particularly valued. The corresponding bioactive components include the tanshinone diterpenoids, the biosynthesis of which is a subject of considerable interest. Previous investigations of the S. miltiorrhiza transcriptome have relied on short-read next-generation sequencing (NGS) technology, and the vast majority of the resulting isotigs do not represent full-length cDNA sequences. Moreover, these efforts have been targeted at either whole plants or hairy root cultures. Here, we demonstrate that the tanshinone pigments are produced and accumulate in the root periderm, and apply a combination of NGS and single-molecule real-time (SMRT) sequencing to various root tissues, particularly including the periderm, to provide a more complete view of the S. miltiorrhiza transcriptome, with further insight into tanshinone biosynthesis as well. In addition, the use of SMRT long-read sequencing offered the ability to examine alternative splicing, which was found to occur in approximately 40% of the detected gene loci, including several involved in isoprenoid/terpenoid metabolism.
Key Words: Salvia miltiorrhiza; alternative splicing; next-generation sequencing; single-molecule real-time sequencing; tanshinone biosynthesis
MeSH Headings: Abietanes/biosynthesis; Abietanes/metabolism; Alternative Splicing; Gene Expression Profiling/methods; Gene Expression Regulation, Plant; High-Throughput Nucleotide Sequencing/methods; Plant Proteins/genetics; Plant Proteins/metabolism; Plant Roots/genetics; Plant Roots/metabolism; Salvia miltiorrhiza/genetics; Salvia miltiorrhiza/metabolism; Sequence Analysis, DNA/methods; Transcriptome
33	The transcriptome of human pluripotent stem cells. Curr Opin Genet Dev 2014;28:71-7. [DOI: 10.1016/j.gde.2014.09.012] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 06/17/2014] [Revised: 09/29/2014] [Accepted: 09/30/2014] [Indexed: 12/11/2022]
34	ALS-associated mutation FUS-R521C causes DNA damage and RNA splicing defects. J Clin Invest 2014;124:981-99. [PMID: 24509083 DOI: 10.1172/jci72723] [Citation(s) in RCA: 194] [Impact Index Per Article: 19.4] [Received: 08/20/2013] [Accepted: 11/27/2013] [Indexed: 12/20/2022]
Abstract: Autosomal dominant mutations of the RNA/DNA binding protein FUS are linked to familial amyotrophic lateral sclerosis (FALS); however, it is not clear how FUS mutations cause neurodegeneration. Using transgenic mice expressing a common FALS-associated FUS mutation (FUS-R521C mice), we found that mutant FUS proteins formed a stable complex with WT FUS proteins and interfered with the normal interactions between FUS and histone deacetylase 1 (HDAC1). Consequently, FUS-R521C mice exhibited evidence of DNA damage as well as profound dendritic and synaptic phenotypes in brain and spinal cord. To provide insights into these defects, we screened neural genes for nucleotide oxidation and identified brain-derived neurotrophic factor (Bdnf) as a target of FUS-R521C-associated DNA damage and RNA splicing defects in mice. Compared with WT FUS, mutant FUS-R521C proteins formed a more stable complex with Bdnf RNA in electrophoretic mobility shift assays. Stabilization of the FUS/Bdnf RNA complex contributed to Bdnf splicing defects and impaired BDNF signaling through receptor TrkB. Exogenous BDNF only partially restored dendrite phenotype in FUS-R521C neurons, suggesting that BDNF-independent mechanisms may contribute to the defects in these neurons. Indeed, RNA-seq analyses of FUS-R521C spinal cords revealed additional transcription and splicing defects in genes that regulate dendritic growth and synaptic functions. Together, our results provide insight into how gain-of-function FUS mutations affect critical neuronal functions.
MeSH Headings: Amyotrophic Lateral Sclerosis/genetics; Animals; Brain-Derived Neurotrophic Factor/genetics; Brain-Derived Neurotrophic Factor/metabolism; Cells, Cultured; Cricetinae; DNA Damage; Female; Histone Deacetylase 1/metabolism; Humans; Male; Mice; Mice, Inbred C57BL; Mice, Transgenic; Motor Cortex/metabolism; Motor Cortex/pathology; Motor Neurons/metabolism; Mutation, Missense; Protein Binding; Protein Transport; RNA Splicing; RNA, Messenger/genetics; RNA, Messenger/metabolism; RNA-Binding Protein FUS/genetics; RNA-Binding Protein FUS/metabolism; Receptor, trkB/metabolism; Signal Transduction; Spinal Cord/metabolism; Spinal Cord/pathology; Synapses/metabolism; Transcriptome
Grants: OD011915 NIH HHS; R01 NS039074 NINDS NIH HHS; NS078839 NINDS NIH HHS; K02 NS046468 NINDS NIH HHS; R01 NS083390 NINDS NIH HHS; K26 OD010927 NIH HHS; R01 HG005717 NHGRI NIH HHS; K26 RR026099 NCRR NIH HHS; HG005717 NHGRI NIH HHS; I01 BX001108 BLRD VA; UL1 TR000150 NCATS NIH HHS; K26 OD010945 NIH HHS; R21 OD011915 NIH HHS; R01 NS078839 NINDS NIH HHS; K08 NS072233 NINDS NIH HHS; I21 BX001625 BLRD VA; OD010927 NIH HHS
35	Activation of innate immunity is required for efficient nuclear reprogramming. Cell 2013;151:547-58. [PMID: 23101625 DOI: 10.1016/j.cell.2012.09.034] [Citation(s) in RCA: 275] [Impact Index Per Article: 25.0] [Received: 12/26/2011] [Revised: 07/05/2012] [Accepted: 09/18/2012] [Indexed: 12/19/2022]
Abstract: Retroviral overexpression of reprogramming factors (Oct4, Sox2, Klf4, c-Myc) generates induced pluripotent stem cells (iPSCs). However, the integration of foreign DNA could induce genomic dysregulation. Cell-permeant proteins (CPPs) could overcome this limitation. To date, this approach has proved exceedingly inefficient. We discovered a striking difference in the pattern of gene expression induced by viral versus CPP-based delivery of the reprogramming factors, suggesting that a signaling pathway required for efficient nuclear reprogramming was activated by the retroviral, but not the CPP, approach. In gain- and loss-of-function studies, we find that the toll-like receptor 3 (TLR3) pathway enables efficient induction of pluripotency by viral or mmRNA approaches. Stimulation of TLR3 causes rapid and global changes in the expression of epigenetic modifiers to enhance chromatin remodeling and nuclear reprogramming. Activation of inflammatory pathways is required for efficient nuclear reprogramming in the induction of pluripotency.
36	An Oct4-Sall4-Nanog network controls developmental progression in the pre-implantation mouse embryo. Mol Syst Biol 2013;9:632. [PMID: 23295861 PMCID: PMC3564263 DOI: 10.1038/msb.2012.65] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 07/28/2011] [Accepted: 11/30/2012] [Indexed: 01/18/2023]
Abstract: Landmark events occur in a coordinated manner during pre-implantation development of the mammalian embryo, yet the regulatory network that orchestrates these events remains largely unknown. Here, we present the first systematic investigation of the network in pre-implantation mouse embryos using morpholino-mediated gene knockdowns of key embryonic stem cell (ESC) factors followed by detailed transcriptome analysis of pooled embryos, single embryos, and individual blastomeres. We delineated the regulons of Oct4, Sall4, and Nanog and identified a set of metabolism- and transport-related genes that were controlled by these transcription factors in embryos but not in ESCs. Strikingly, the knockdown embryos arrested at a range of developmental stages. We provided evidence that the DNA methyltransferase Dnmt3b has a role in determining the extent to which a knockdown embryo can develop. We further showed that the feed-forward loop comprising Dnmt3b, the pluripotency factors, and the miR-290-295 cluster exemplifies a network motif that buffers embryos against gene expression noise. Our findings indicate that Oct4, Sall4, and Nanog form a robust and integrated network to govern mammalian pre-implantation development.
Key Words: pluripotency factors; pre-implantation development; transcriptional networks
MeSH Headings: Animals; Blastocyst/metabolism; Blastocyst/physiology; DNA (Cytosine-5-)-Methyltransferases/genetics; DNA (Cytosine-5-)-Methyltransferases/metabolism; DNA-Binding Proteins/genetics; DNA-Binding Proteins/metabolism; Embryo Culture Techniques; Embryo, Mammalian/metabolism; Embryonic Development; Embryonic Stem Cells/physiology; Female; Gene Expression Profiling; Gene Expression Regulation, Developmental; Gene Knockdown Techniques; Gene Regulatory Networks; Homeodomain Proteins/genetics; Homeodomain Proteins/metabolism; Male; Mice; Mice, Inbred C57BL; Mice, Inbred DBA; MicroRNAs/genetics; Nanog Homeobox Protein; Octamer Transcription Factor-3/genetics; Octamer Transcription Factor-3/metabolism; Oligonucleotide Array Sequence Analysis; Transcription Factors/genetics; Transcription Factors/metabolism; DNA Methyltransferase 3B
Grants: R01 HD057970 NICHD NIH HHS
37	Improving PacBio long read accuracy by short read alignment. PLoS One 2012;7:e46679. [PMID: 23056399 PMCID: PMC3464235 DOI: 10.1371/journal.pone.0046679] [Citation(s) in RCA: 212] [Impact Index Per Article: 17.7] [Received: 06/08/2012] [Accepted: 09/02/2012] [Indexed: 11/24/2022]
Abstract: The recent development of third generation sequencing (TGS) generates much longer reads than second generation sequencing (SGS) and thus provides a chance to solve problems that are difficult to study through SGS alone. However, higher raw read error rates are an intrinsic drawback in most TGS technologies. Here we present a computational method, LSC, to perform error correction of TGS long reads (LR) by SGS short reads (SR). Aiming to reduce the error rate in homopolymer runs in the main TGS platform, the PacBio® RS, LSC applies a homopolymer compression (HC) transformation strategy to increase the sensitivity of SR-LR alignment without sacrificing alignment accuracy. We applied LSC to 100,000 PacBio long reads from human brain cerebellum RNA-seq data and 64 million single-end 75 bp reads from human brain RNA-seq data. The results show that LSC can correct PacBio long reads, reducing the error rate more than threefold. The improved accuracy greatly benefits many downstream analyses, such as directional gene isoform detection in RNA-seq studies. Compared with another hybrid correction tool, LSC can achieve over double the sensitivity and similar specificity.
MeSH Headings: Computational Biology/methods; Reproducibility of Results; Sequence Analysis, DNA; Sequence Analysis, RNA; Software
Grants: R01 HD057970 NICHD NIH HHS; R01 HG005717 NHGRI NIH HHS; R01HG005717 NHGRI NIH HHS; R01HD057970 NICHD NIH HHS
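The homopolymer compression (HC) transformation named in the LSC abstract above can be illustrated with a short sketch. This is not LSC's actual code: the function names `hc_compress` and `hc_expand` are hypothetical, and the sketch shows only the general idea of collapsing each run of identical bases to a single base while recording run lengths so the original read can be restored.

```python
from itertools import groupby


def hc_compress(seq):
    """Collapse each homopolymer run to one base.

    Returns the compressed string and the list of run lengths
    (hypothetical helper, not part of LSC).
    """
    bases = []
    run_lengths = []
    for base, run in groupby(seq):
        bases.append(base)
        run_lengths.append(sum(1 for _ in run))
    return "".join(bases), run_lengths


def hc_expand(compressed, run_lengths):
    """Invert hc_compress using the recorded run lengths."""
    return "".join(base * n for base, n in zip(compressed, run_lengths))


if __name__ == "__main__":
    read = "AAACCGTTTTA"
    compressed, runs = hc_compress(read)
    print(compressed, runs)
    # Round trip back to the original read.
    print(hc_expand(compressed, runs) == read)
```

Under this transformation, two reads that differ only in the length of a homopolymer run (for example `AAACG` and `AACG`) compress to the same string, which is one plausible reason such a step could make short-read to long-read alignment more tolerant of homopolymer errors.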
38	RNA sequencing reveals a diverse and dynamic repertoire of the Xenopus tropicalis transcriptome over development. Genome Res 2012;23:201-16. [PMID: 22960373 PMCID: PMC3530680 DOI: 10.1101/gr.141424.112] [Citation(s) in RCA: 121] [Impact Index Per Article: 10.1] [Indexed: 12/15/2022]
Abstract: The Xenopus embryo has provided key insights into fate specification, the cell cycle, and other fundamental developmental and cellular processes, yet a comprehensive understanding of its transcriptome is lacking. Here, we used paired-end RNA sequencing (RNA-seq) to explore the transcriptome of Xenopus tropicalis in 23 distinct developmental stages. We determined expression levels of all genes annotated in RefSeq and Ensembl and showed for the first time on a genome-wide scale that, despite a general state of transcriptional silence in the earliest stages of development, approximately 150 genes are transcribed prior to the midblastula transition. In addition, our splicing analysis uncovered more than 10,000 novel splice junctions at each stage and revealed that many known genes have additional unannotated isoforms. Furthermore, we used Cufflinks to reconstruct transcripts from our RNA-seq data and found that ∼13.5% of the final contigs are derived from novel transcribed regions, both within introns and in intergenic regions. We then developed a filtering pipeline to separate protein-coding transcripts from noncoding RNAs and identified a confident set of 6686 noncoding transcripts in 3859 genomic loci. Since the current reference genome, XenTro3, consists of hundreds of scaffolds instead of full chromosomes, we also performed de novo reconstruction of the transcriptome using Trinity and uncovered hundreds of transcripts that are missing from the genome. Collectively, our data will not only aid in completing the assembly of the Xenopus tropicalis genome but will also serve as a valuable resource for gene discovery and for unraveling the fundamental mechanisms of vertebrate embryogenesis.
39	Abstract 363: Toll-Like Receptor 3 Activation Promotes Efficient Nuclear Reprogramming and Endothelial Differentiation. Circ Res 2012. [DOI: 10.1161/res.111.suppl_1.a363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract: Introduction: Stem cell therapy for vascular regeneration has been investigated using embryonic stem cells. We recently generated endothelial cells (ECs) from human induced pluripotent stem cells (hiPSCs) and investigated their potential to promote the perfusion of ischemic tissue in a murine model of peripheral arterial disease (PAD). However, to utilize iPSCs therapeutically, the cells should be generated via non-integrating approaches to avoid integration of foreign DNA into the genome. Objective: The present study highlights underlying mechanisms of reprogramming and investigates the role of novel pathways in enhancing nuclear reprogramming for potential clinical application. Results: Since the initial discovery, different non-integrating approaches have been developed to generate iPSCs. One such approach is to deliver the pluripotent factors (Oct4, Sox2, Klf4 and cMyc) as cell-permeant proteins (CPPs). However, human cells have not been reprogrammed using purified CPPs. In seeking to develop this approach, we discovered a striking difference in the pattern of gene expression induced by viral versus protein-based delivery of the reprogramming factors. This suggested that a signaling pathway required for efficient nuclear reprogramming was activated by the retroviral, but not the CPP, approach. In both gain- and loss-of-function studies, we find that activation of toll-like receptor 3 (TLR3) plays a role in the efficient reprogramming of human cells using viral approaches. Stimulation of TLR3 causes rapid changes in the expression of epigenetic modifiers, with chromatin remodeling and changes in gene expression that favor induction of pluripotency. Importantly, knowing that this pathway is critical, we were able to generate human iPSCs using CPPs by adding a TLR3 agonist (Poly IC) to the reprogramming protocol. Conclusion: Recognition of the role of innate immunity signaling in reprogramming may advance the therapeutic application of iPSCs. We intend to develop an efficient protein-based system to generate ECs and determine their therapeutic potential in animal models of PAD. Furthermore, we have discovered an important signaling pathway in reprogramming, which may have implications in cancer biology and regenerative medicine.
40	Detection of splice junctions from paired-end RNA-seq data by SpliceMap. Nucleic Acids Res 2010;38:4570-8. [PMID: 20371516 PMCID: PMC2919714 DOI: 10.1093/nar/gkq211] [Citation(s) in RCA: 209] [Impact Index Per Article: 14.9] [Received: 12/07/2009] [Revised: 03/10/2010] [Accepted: 03/12/2010] [Indexed: 11/27/2022]
Abstract: Alternative splicing is a prevalent post-transcriptional process, which is not only important to normal cellular function but is also involved in human diseases. The newly developed second generation sequencing technique provides high-throughput data (RNA-seq data) to study alternative splicing events in different types of cells. Here, we present a computational method, SpliceMap, to detect splice junctions from RNA-seq data. This method does not depend on any existing annotation of gene structures and is capable of finding novel splice junctions with high sensitivity and specificity. It can handle long reads (50-100 nt) and can exploit paired-read information to improve mapping accuracy. Several parameters are included in the output to indicate the reliability of the predicted junction and help filter out false predictions. We applied SpliceMap to analyze 23 million paired 50-nt reads from human brain tissue. The results show that, at this depth of sequencing, RNA-seq can support reliable detection of splice junctions except for those that are present at very low levels. Compared to current methods, SpliceMap can achieve 12% higher sensitivity without sacrificing specificity.
MeSH Headings: Algorithms; Alternative Splicing; Computational Biology/methods; Humans; Polymerase Chain Reaction; RNA Splice Sites; Sequence Analysis, RNA; Software
Grants: 1R01HG004634 NHGRI NIH HHS
41	TB surveillance in correctional institutions in Hong Kong, 1999-2005. Int J Tuberc Lung Dis 2008;12:93-98. [PMID: 18173884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract: OBJECTIVE: To understand the epidemiology of tuberculosis (TB) inside the prison system of Hong Kong. METHOD: Prospective territory-wide TB surveillance was conducted among prisoners in 24 correctional institutions. RESULTS: From 1999 to 2005, 622 prevalent TB cases diagnosed before or within 3 months of incarceration and 214 incident cases diagnosed after 3 months were reported by prison staff to a paper-based central prison TB registry. Both crude prevalence and incidence were falling (χ² for trend, both P < 0.001), despite a higher sex- and age-adjusted prison TB incidence as compared to the general population (indirectly standardised rate [ISR] 280.6 vs. 108.0/100 000, P < 0.001). Illegal immigrants (odds ratio [OR] 3.6, 95% confidence interval [CI] 1.8-7.4) and drug addicts (OR 2.04, 95%CI 1.13-3.7) were two major risk groups. The TB incident risk disappeared after their exclusion (ISR 117.1 vs. 108.0/100 000, P = 0.52). No significant difference in the multidrug-resistant rate was found when comparing the group with the general population (3.5% vs. 1.0%, OR 3.6, 95%CI 0.5-28.4). No extensively drug-resistant (XDR) cases were identified. CONCLUSION: TB remains a significant disease in local prisons. Further strengthening of TB control programmes in prisons, especially targeting the higher risk groups, is recommended.
MeSH Headings: Adult; Antitubercular Agents/therapeutic use; Communicable Disease Control; Drug Resistance, Multiple, Bacterial; Female; Hong Kong/epidemiology; Humans; Incidence; Male; Middle Aged; Odds Ratio; Population Surveillance; Prevalence; Prisons/statistics & numerical data; Prospective Studies; Risk Assessment; Risk Factors; Substance-Related Disorders/complications; Substance-Related Disorders/epidemiology; Time Factors; Transients and Migrants/statistics & numerical data; Tuberculosis/diagnosis; Tuberculosis/drug therapy; Tuberculosis/epidemiology; Tuberculosis/etiology; Tuberculosis, Multidrug-Resistant/diagnosis; Tuberculosis, Multidrug-Resistant/drug therapy; Tuberculosis, Multidrug-Resistant/epidemiology; Tuberculosis, Multidrug-Resistant/etiology
42	Chest radiograph screening for tuberculosis in a Hong Kong prison. Int J Tuberc Lung Dis 2005;9:627-32. [PMID: 15971389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract: SETTING: Long-stay prisoners are not regularly screened for TB in Hong Kong. OBJECTIVE: To evaluate tuberculosis (TB) screening in prison. METHOD: All prisoners in a maximum security prison as of 31 October 2001 were screened by chest radiograph (CXR), except for those being followed up for TB or examined by CXR in the last 6 months. RESULTS: A total of 814 male prisoners aged 34.6 ± 9.6 (mean ± SD) years were successfully screened. Of 53 cases (6.51%) with radiographic abnormalities, 10 active TB cases (8 culture-negative, 2 culture-positive) were diagnosed, giving an overall yield of 1.23% (95%CI 0.59-2.26). There was no statistical difference in age, ethnicity, place of birth or residency status between those with and those without TB (all P > 0.05). Incarceration ≥ 2 years, being in current prison ≥ 2 years and not having CXR in last 2 years were associated with TB in univariate analysis (all P < 0.05), but only the last remained an independent predictor in multiple logistic regression (OR 16.8, 95%CI 2.1-132.9, P = 0.008). In that group, the yield was 3.1% (95%CI 1.42-5.89). No further cases were detected in the subsequent 2 years. CONCLUSION: CXR screening of long-stay prisoners gave a high yield in this study.
MeSH Headings: Adult; Cohort Studies; Cost-Benefit Analysis; Hong Kong/epidemiology; Humans; Male; Mass Screening/economics; Pilot Projects; Prevalence; Prisoners; Prisons; Radiography; Risk Factors; Tuberculosis, Pulmonary/diagnostic imaging; Tuberculosis, Pulmonary/epidemiology; Tuberculosis, Pulmonary/prevention & control
43	Tuberculin response in BCG vaccinated schoolchildren and the estimation of annual risk of infection in Hong Kong. Thorax 2005;60:124-9. [PMID: 15681500 PMCID: PMC1747293 DOI: 10.1136/thx.2003.017970] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/04/2022]
Abstract: BACKGROUND: In Hong Kong there has been nearly universal neonatal BCG vaccination coverage since 1980. METHOD: 21 113 schoolchildren aged 6-9 years were skin tested with one unit of tuberculin (PPD RT-23) using the intradermal technique during a routine BCG revaccination programme. Information on sex, date of birth, date of tuberculin testing, and tuberculin reaction size at 72 hours was retrieved. The annual risk of tuberculous infection (ARTI) was estimated by three different approaches. RESULTS: Significantly higher tuberculin positive rates were found in girls and with increasing age at all commonly used cut-off points (5, 10, and 15 mm). Using a cut-off point of ≥10 mm and the formula 1 − (1 − tuberculin positive rate)^(1/age), the ARTI was estimated to be 1.93% (95% CI 1.84 to 2.03) for girls and 1.41% (95% CI 1.33 to 1.50) for boys. Using the differences in the tuberculin positive rate between the 6-7 year and 8-9 year age groups, the ARTI became 1.90% (95% CI 1.09 to 2.70) and 1.84% (95% CI 1.15 to 2.54) for girls and boys, respectively. When the prevalence of infection was estimated by locating a secondary peak of the tuberculin reaction distribution curve at 15 mm and assuming a symmetrical distribution of reaction sizes among those infected around this peak, the corresponding ARTI was much lower at 0.52% (95% CI 0.46 to 0.59) and 0.43% (95% CI 0.37 to 0.49) for girls and boys, similar to that estimated indirectly from the prevalence of disease. CONCLUSION: The ARTI as estimated by conventional methods was unexpectedly high among BCG vaccinated children and did not agree with that anticipated from the annual incidence of active disease. Further studies are needed to address the discrepancies, including the possible interaction between BCG and other environmental stimuli.
MeSH Headings: Age Factors; BCG Vaccine; Child; Cohort Studies; Female; Hong Kong/epidemiology; Humans; Incidence; Male; Prevalence; Risk Factors; Sex Factors; Skin/immunology; Time Factors; Tuberculin/immunology; Tuberculin Test/methods; Tuberculosis/epidemiology; Tuberculosis/immunology; Tuberculosis/prevention & control
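The first ARTI formula quoted in the Thorax abstract above, one minus (one minus the tuberculin positive rate) raised to the power 1/age, can be written as a small function. This is a minimal sketch, not the authors' analysis code: the function name is hypothetical, and the inputs in the example are illustrative values chosen for demonstration, not data from the study.

```python
def annual_risk_of_infection(positive_rate, mean_age_years):
    """Estimate ARTI as 1 - (1 - positive_rate) ** (1 / mean_age_years).

    positive_rate is the proportion of tuberculin-positive children (0 to 1);
    mean_age_years is the age of the tested group in years.
    Hypothetical helper for illustration only.
    """
    if not 0.0 <= positive_rate <= 1.0:
        raise ValueError("positive_rate must be between 0 and 1")
    if mean_age_years <= 0:
        raise ValueError("mean_age_years must be positive")
    return 1.0 - (1.0 - positive_rate) ** (1.0 / mean_age_years)


if __name__ == "__main__":
    # Illustrative inputs only (not taken from the paper).
    arti = annual_risk_of_infection(positive_rate=0.10, mean_age_years=7.5)
    print(f"Estimated ARTI: {arti:.2%}")
```

The formula assumes a constant yearly risk: if each year carries risk r, the proportion still uninfected at a given age is (1 − r)^age, and solving for r gives the expression above.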
44	Socio-economic factors and tuberculosis: a district-based ecological analysis in Hong Kong. Int J Tuberc Lung Dis 2004;8:958-64. [PMID: 15305477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2023]
Abstract: BACKGROUND: Relatively little is known about the impact of socio-economic factors on tuberculosis in a metropolitan city with high disease incidence. METHOD: District-specific tuberculosis notification rates for 1995-1997 and 2000-2002 were indirectly sex- and age-adjusted and compared with the socio-economic characteristics in the 1996 by-census and 2001 census. RESULTS: The differences between the 18 districts persisted after 3-year averaging and indirect standardisation. Only the percentage of population born locally, the percentage of the population widowed or divorced and the percentage of households residing in rooms or bedsits were consistently associated with the standardised notification ratios (SNR) for both periods, the first being negatively so (all P < 0.05). In a combined analysis with a general linear model for both periods, birth in China, residence <7 years, speaking other Asian languages, being married and in a single household were also significantly associated with the SNR (all P < 0.05). Using a backward conditional approach, only local birth, being married, and residing in rooms or bedsits were independent predictors of SNR (all P < 0.05). There was no significant association between SNR and socio-economic indices on education, occupation, unemployment and income. CONCLUSION: Socio-economic factors other than simple poverty are affecting the district-specific tuberculosis rates in Hong Kong.
MeSH Headings: Female; Hong Kong/epidemiology; Humans; Incidence; Male; Population Surveillance; Socioeconomic Factors; Tuberculosis/epidemiology; Tuberculosis/ethnology
45	Levofloxacin in the treatment of drug-resistant tuberculosis. Int J Tuberc Lung Dis 1997;1:89. [PMID: 9441068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
MeSH Headings: Aged; Anti-Infective Agents/therapeutic use; Antitubercular Agents/therapeutic use; Drug Therapy, Combination; Follow-Up Studies; Humans; Levofloxacin; Male; Ofloxacin/therapeutic use; Radiography; Tuberculosis, Multidrug-Resistant/diagnostic imaging; Tuberculosis, Multidrug-Resistant/drug therapy; Tuberculosis, Multidrug-Resistant/physiopathology