1
Feng H, Li F, Wang T, Xing XH, Zeng AP, Zhang C. Deep-learning-assisted Sort-Seq enables high-throughput profiling of gene expression characteristics with high precision. Science Advances 2023; 9:eadg5296. [PMID: 37939173 PMCID: PMC10631719 DOI: 10.1126/sciadv.adg5296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Accepted: 10/06/2023] [Indexed: 11/10/2023]
Abstract
Owing to the nondeterministic and nonlinear nature of gene expression, the steady-state intracellular protein abundance of a clonal population forms a distribution. The characteristics of this distribution, including expression strength and noise, are closely related to cellular behavior. However, quantitative description of these characteristics has so far relied on arrayed methods, which are time-consuming and labor-intensive. To address this issue, we propose a deep-learning-assisted Sort-Seq approach (dSort-Seq) in this work, enabling high-throughput profiling of expression properties with high precision. We demonstrated the validity of dSort-Seq for large-scale assaying of the dose-response relationships of biosensors. In addition, we comprehensively investigated the contribution of transcription and translation to noise production in Escherichia coli, from which we found that the expression noise is strongly coupled with the mean expression level. We also found that the transcriptional interference caused by overlapping RpoD-binding sites contributes to noise production, which suggested the existence of a simple and feasible noise control strategy in E. coli.
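As a rough illustration of the two distribution characteristics named in the abstract (expression strength and noise), the sketch below summarizes a list of per-cell expression values. The abstract does not state how noise is quantified; the squared coefficient of variation (variance divided by squared mean) and the Fano factor (variance divided by mean) are used here as common conventions and are assumptions, as are the function name `expression_summary` and the example readings. This is not the dSort-Seq method itself.

```python
from statistics import fmean, pvariance


def expression_summary(intensities):
    """Summarize per-cell expression values of one clonal population.

    Returns the mean (expression strength) plus two common noise measures.
    The choice of CV^2 and Fano factor is an assumption, not taken from the paper.
    """
    if len(intensities) < 2:
        raise ValueError("need at least two cells")
    mean = fmean(intensities)
    if mean <= 0:
        raise ValueError("mean expression must be positive")
    var = pvariance(intensities, mu=mean)  # population variance
    return {
        "mean": mean,                   # expression strength
        "noise_cv2": var / mean ** 2,   # squared coefficient of variation
        "fano": var / mean,             # Fano factor
    }


# Hypothetical fluorescence readings for five cells (illustrative only).
cells = [90.0, 100.0, 110.0, 95.0, 105.0]
summary = expression_summary(cells)
print(summary)
```

Because CV² divides by the squared mean, it is dimensionless, which makes it convenient for comparing noise across variants whose mean expression levels differ.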
Affiliation(s)
- Huibao Feng
  - MOE Key Laboratory for Industrial Biocatalysis, Institute of Biochemical Engineering, Department of Chemical Engineering, Tsinghua University, Beijing 100084, China
- Fan Li
  - MOE Key Laboratory for Industrial Biocatalysis, Institute of Biochemical Engineering, Department of Chemical Engineering, Tsinghua University, Beijing 100084, China
- Tianmin Wang
  - Tsinghua-Peking Center for Life Sciences, School of Medicine, Tsinghua University, Beijing 100084, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Xin-hui Xing
  - MOE Key Laboratory for Industrial Biocatalysis, Institute of Biochemical Engineering, Department of Chemical Engineering, Tsinghua University, Beijing 100084, China
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
- An-ping Zeng
  - Institute of Bioprocess and Biosystems Engineering, Hamburg University of Technology, Hamburg 21073, Germany
  - Center of Synthetic Biology and Integrated Bioengineering, School of Engineering, Westlake University, Hangzhou 310024, China
- Chong Zhang
  - MOE Key Laboratory for Industrial Biocatalysis, Institute of Biochemical Engineering, Department of Chemical Engineering, Tsinghua University, Beijing 100084, China
  - Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China
2
Manz T, L’Yi S, Gehlenborg N. Gos: a declarative library for interactive genomics visualization in Python. Bioinformatics 2023; 39:6998203. [PMID: 36688709 PMCID: PMC9891240 DOI: 10.1093/bioinformatics/btad050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 11/16/2022] [Accepted: 01/20/2023] [Indexed: 01/24/2023] Open
Abstract
SUMMARY: Gos is a declarative Python library designed to create interactive multiscale visualizations of genomics and epigenomics data. It provides a consistent and simple interface to the flexible Gosling visualization grammar. Gos hides technical complexities involved with configuring web-based genome browsers and integrates seamlessly within computational notebook environments to enable new interactive analysis workflows. AVAILABILITY AND IMPLEMENTATION: Gos is released under the MIT License and available on the Python Package Index (PyPI). The source code is publicly available on GitHub (https://github.com/gosling-lang/gos), and documentation with examples can be found at https://gosling-lang.github.io/gos.
Affiliation(s)
- Trevor Manz
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Sehi L’Yi
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
3
Poonia S, Goel A, Chawla S, Bhattacharya N, Rai P, Lee YF, Yap YS, West J, Bhagat AA, Tayal J, Mehta A, Ahuja G, Majumdar A, Ramalingam N, Sengupta D. Marker-free characterization of full-length transcriptomes of single live circulating tumor cells. Genome Res 2023; 33:80-95. [PMID: 36414416 PMCID: PMC9977151 DOI: 10.1101/gr.276600.122] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/16/2022] [Accepted: 11/10/2022] [Indexed: 11/23/2022]
Abstract
The identification and characterization of circulating tumor cells (CTCs) are important for gaining insights into the biology of metastatic cancers, monitoring disease progression, and medical management of the disease. The limiting factor in the enrichment of purified CTC populations is their sparse availability, heterogeneity, and altered phenotypes relative to the primary tumor. Intensive research both at the technical and molecular fronts led to the development of assays that ease CTC detection and identification from peripheral blood. Most CTC detection methods based on single-cell RNA sequencing (scRNA-seq) use a mix of size selection, marker-based white blood cell (WBC) depletion, and antibodies targeting tumor-associated antigens. However, the majority of these methods either miss out on atypical CTCs or suffer from WBC contamination. We present unCTC, an R package for unbiased identification and characterization of CTCs from single-cell transcriptomic data. unCTC features many standard and novel computational and statistical modules for various analyses. These include a novel method of scRNA-seq clustering, named deep dictionary learning using k-means clustering cost (DDLK), expression-based copy number variation (CNV) inference, and combinatorial, marker-based verification of the malignant phenotypes. DDLK enables robust segregation of CTCs and WBCs in the pathway space, as opposed to the gene expression space. We validated the utility of unCTC on scRNA-seq profiles of breast CTCs from six patients, captured and profiled using an integrated ClearCell FX and Polaris workflow that works by the principles of size-based separation of CTCs and marker-based WBC depletion.
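The abstract names a "k-means clustering cost" as one ingredient of DDLK. As a reminder of what that term usually denotes, the sketch below computes the standard k-means objective: the sum of squared Euclidean distances from each point to its assigned centroid. It is written in Python rather than R, it is not the unCTC implementation, and it omits the deep dictionary learning part entirely. The helper names and the toy two-dimensional points, standing in for per-cell pathway scores, are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Assign each point to the index of its nearest centroid."""
    labels = []
    for point in points:
        dists = [squared_distance(point, center) for center in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def kmeans_cost(points, centroids, labels):
    """Standard k-means objective: within-cluster sum of squared distances."""
    return sum(
        squared_distance(point, centroids[label])
        for point, label in zip(points, labels)
    )


# Toy data: two well-separated groups (illustrative values only).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]

labels = assign(points, centroids)
cost = kmeans_cost(points, centroids, labels)
print(labels, cost)
```

A lower cost means tighter clusters around their centroids; a method that uses this quantity as a training cost is pushing the learned representation toward such compact groups.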
Affiliation(s)
- Sarita Poonia
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi 110020, India
- Anurag Goel
  - Department of Computer Science and Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi 110020, India
  - Department of Computer Science and Engineering, Delhi Technological University, New Delhi 110042, India
- Smriti Chawla
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi 110020, India
- Namrata Bhattacharya
  - Department of Computer Science and Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi 110020, India
- Priyadarshini Rai
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi 110020, India
- Yi Fang Lee
  - Biolidics Limited, Singapore 118257, Singapore
- Yoon Sim Yap
  - National Cancer Centre Singapore, Singapore 169610, Singapore
- Jay West
  - Fluidigm Corporation, South San Francisco, California 94080, USA
- Juhi Tayal
  - Department of Research, Rajiv Gandhi Cancer Institute and Research Centre-Delhi (RGCIRC-Delhi), New Delhi 110085, India
- Anurag Mehta
  - Department of Laboratory Services and Molecular Diagnostics, Rajiv Gandhi Cancer Institute and Research Centre-Delhi (RGCIRC-Delhi), New Delhi 110085, India
- Gaurav Ahuja
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi 110020, India
- Angshul Majumdar
  - Department of Computer Science and Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi 110020, India
  - Centre for Artificial Intelligence, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi 110020, India
  - Department of Electronics & Communications Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi 110020, India
- Debarka Sengupta
  - Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi 110020, India
  - Department of Computer Science and Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi 110020, India
  - Centre for Artificial Intelligence, Indraprastha Institute of Information Technology-Delhi (IIIT-Delhi), New Delhi 110020, India
4
De Jesus Martinez T, Hershberg EA, Guo E, Stevens GJ, Diesh C, Xie P, Bridge C, Cain S, Haw R, Buels RM, Stein LD, Holmes IH. JBrowse Jupyter: a Python interface to JBrowse 2. Bioinformatics 2023; 39:btad032. [PMID: 36648320 PMCID: PMC9887080 DOI: 10.1093/bioinformatics/btad032] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/17/2022] [Revised: 12/10/2022] [Accepted: 01/16/2023] [Indexed: 01/18/2023] Open
Abstract
MOTIVATION: JBrowse Jupyter is a package that aims to close the gap between Python programming and genomic visualization. Web-based genome browsers are routinely used for publishing and inspecting genome annotations. Historically they have been deployed at the end of bioinformatics pipelines, typically decoupled from the analysis itself. However, emerging technologies such as Jupyter notebooks enable a more rapid iterative cycle of development, analysis and visualization. RESULTS: We have developed a package that provides a Python interface to JBrowse 2's suite of embeddable components, including the primary Linear Genome View. The package enables users to quickly set up, launch and customize JBrowse views from Jupyter notebooks. In addition, users can share their data via Google's Colab notebooks, providing reproducible interactive views. AVAILABILITY AND IMPLEMENTATION: JBrowse Jupyter is released under the Apache License and is available for download on PyPI. Source code and demos are available on GitHub at https://github.com/GMOD/jbrowse-jupyter.
Affiliation(s)
- Elliot A Hershberg
  - Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA 94720, USA
- Emma Guo
  - Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA 94720, USA
- Garrett J Stevens
  - Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA 94720, USA
- Colin Diesh
  - Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA 94720, USA
- Peter Xie
  - Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA 94720, USA
- Caroline Bridge
  - Adaptive Oncology, Ontario Institute for Cancer Research, MaRS Centre, 661 University Avenue, Suite 510, Toronto, ON M5G 0A3, Canada
- Scott Cain
  - Adaptive Oncology, Ontario Institute for Cancer Research, MaRS Centre, 661 University Avenue, Suite 510, Toronto, ON M5G 0A3, Canada
- Robin Haw
  - Adaptive Oncology, Ontario Institute for Cancer Research, MaRS Centre, 661 University Avenue, Suite 510, Toronto, ON M5G 0A3, Canada
- Robert M Buels
  - Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA 94720, USA
- Lincoln D Stein
  - Adaptive Oncology, Ontario Institute for Cancer Research, MaRS Centre, 661 University Avenue, Suite 510, Toronto, ON M5G 0A3, Canada
- Ian H Holmes
  - Department of Bioengineering, Stanley Hall, University of California, Berkeley, CA 94720, USA
5
SeMPI 2.0-A Web Server for PKS and NRPS Predictions Combined with Metabolite Screening in Natural Product Databases. Metabolites 2020; 11:metabo11010013. [PMID: 33383692 PMCID: PMC7823522 DOI: 10.3390/metabo11010013] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/07/2020] [Revised: 12/22/2020] [Accepted: 12/23/2020] [Indexed: 01/10/2023] Open
Abstract
Microorganisms produce secondary metabolites with a remarkable range of bioactive properties. The constantly increasing amount of published genomic data provides the opportunity for efficient identification of biosynthetic gene clusters by genome mining. On the other hand, for many natural products with resolved structures, the encoding biosynthetic gene clusters have not been identified yet. Of those secondary metabolites, the scaffolds of nonribosomal peptides and polyketides (type I modular) can be predicted due to their building block-like assembly. SeMPI v2 provides a comprehensive prediction pipeline, which includes the screening of the scaffold in publicly available natural compound databases. The screening algorithm was designed to detect homologous structures even for partial, incomplete clusters. The pipeline allows linking of gene clusters to known natural products and therefore also provides a metric to estimate the novelty of the cluster if a matching scaffold cannot be found. Whereas currently available tools attempt to provide comprehensive information about a wide range of gene clusters, SeMPI v2 aims to focus on precise predictions. Therefore, the cluster detection algorithm, including building block generation and domain substrate prediction, was thoroughly refined and benchmarked, to provide high-quality scaffold predictions. In a benchmark based on 559 gene clusters, SeMPI v2 achieved comparable or better results than antiSMASH v5. Additionally, the SeMPI v2 web server provides features that can help to further investigate a submitted gene cluster, such as the incorporation of a genome browser, and the possibility to modify a predicted scaffold in a workbench before the database screening.
6
Cao X, Yan Z, Wu Q, Zheng A, Zhong S. GIVE: portable genome browsers for personal websites. Genome Biol 2018; 19:92. [PMID: 30016975 PMCID: PMC6050681 DOI: 10.1186/s13059-018-1465-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 10/27/2017] [Accepted: 06/19/2018] [Indexed: 12/31/2022] Open
Abstract
The growing popularity and diversity of genomic data demand portable and versatile genome browsers. Here, we present an open source programming library called GIVE that facilitates the creation of personalized genome browsers without requiring a system administrator. By inserting HTML tags, one can add to a personal webpage interactive visualization of multiple types of genomics data, including genome annotation, "linear" quantitative data, and genome interaction data. GIVE includes a graphical interface called HUG (HTML Universal Generator) that automatically generates HTML code for displaying user-chosen data, which can be copy-pasted into a user's personal website or saved and shared with collaborators. GIVE is available at: https://www.givengine.org/.
Affiliation(s)
- Xiaoyi Cao
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Zhangming Yan
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Qiuyang Wu
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Alvin Zheng
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Sheng Zhong
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA