1
Zhou J, Zhang B, Li H, Zhou L, Li Z, Long Y, Han W, Wang M, Cui H, Li J, Chen W, Gao X. Annotating TSSs in Multiple Cell Types Based on DNA Sequence and RNA-seq Data via DeeReCT-TSS. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022; 20:959-973. [PMID: 36528241 PMCID: PMC10025762 DOI: 10.1016/j.gpb.2022.11.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/16/2022] [Revised: 10/21/2022] [Accepted: 11/24/2022] [Indexed: 12/23/2022]
Abstract
The accurate annotation of transcription start sites (TSSs) and their usage are critical for the mechanistic understanding of gene regulation in different biological contexts. To fulfill this, specific high-throughput experimental technologies have been developed to capture TSSs in a genome-wide manner, and various computational tools have also been developed for in silico prediction of TSSs solely based on genomic sequences. Most of these computational tools cast the problem as a binary classification task on a balanced dataset, thus resulting in drastic false positive predictions when applied on the genome scale. Here, we present DeeReCT-TSS, a deep learning-based method that is capable of identifying TSSs across the whole genome based on both DNA sequence and conventional RNA sequencing data. We show that by effectively incorporating these two sources of information, DeeReCT-TSS significantly outperforms other solely sequence-based methods on the precise annotation of TSSs used in different cell types. Furthermore, we develop a meta-learning-based extension for simultaneous TSS annotations on 10 cell types, which enables the identification of cell type-specific TSSs. Finally, we demonstrate the high precision of DeeReCT-TSS on two independent datasets by correlating our predicted TSSs with experimentally defined TSS chromatin states. The source code for DeeReCT-TSS is available at https://github.com/JoshuaChou2018/DeeReCT-TSS_release and https://ngdc.cncb.ac.cn/biocode/tools/BT007316.
Affiliation(s)
- Juexiao Zhou
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia; Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia; Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Bin Zhang
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia; Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
- Haoyang Li
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia; Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
- Longxi Zhou
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia; Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
- Zhongxiao Li
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia; Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
- Yongkang Long
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia; Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
- Wenkai Han
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia; Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
- Mengran Wang
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Huanhuan Cui
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China; Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China; Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen 518055, China
- Jingjing Li
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Wei Chen
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China; Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China; Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen 518055, China.
- Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia; Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia.
3
Simvastatin down-regulates differential genetic profiles produced by organochlorine mixtures in primary breast cell (HMEC). Chem Biol Interact 2017; 268:85-92. [PMID: 28263720 DOI: 10.1016/j.cbi.2017.03.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/15/2016] [Revised: 02/21/2017] [Accepted: 03/01/2017] [Indexed: 11/22/2022]
Abstract
Women all over the world are exposed to unavoidable contamination by organochlorine pesticides and other chemical pollutants. Many of these compounds are considered xenoestrogens and have been associated with the development and progression of breast cancer. We have demonstrated that the most prevalent pesticide mixtures found in healthy women and in women diagnosed with breast cancer modulate gene expression in human mammary epithelial cells. Statins are well-known cholesterol-depleting agents that act as inhibitors of cholesterol synthesis. Since the early 1990s, it has been known that statins could be successfully used in cancer therapy, including breast cancer, but the exact mechanism behind the anti-tumor activity of statins remains unclear. In the present study we evaluated the effect of simvastatin on the gene expression pattern induced by realistic organochlorine mixtures found in breast cancer patients. The expression of 94 genes related to cell signaling pathways was assessed. Our results indicate that simvastatin exerts a global down-regulating effect on the successfully determined genes (78.7%), thus attenuating the effects induced by organochlorine mixtures on the gene profile of human mammary epithelial cells. This effect was more evident in genes involved in ATP binding (which were also particularly up-regulated by pesticide mixtures). We also found that MERTK (a proto-oncogene that is overexpressed in several malignancies) and PDGFRB (a member of the platelet-derived growth factor family whose expression is high in breast cancer cells that have become resistant to endocrine therapy) were among the genes most differentially regulated by simvastatin. Since resistance to treatment with tyrosine kinase inhibitors is closely related to MERTK, our findings support the possible utility of statins in breast cancer treatment, i.e., improving therapeutic results by combining statins with tyrosine kinase inhibitors.