1
|
Zhang W, Cui Y, Liu B, Loza M, Park SJ, Nakai K. HyGAnno: hybrid graph neural network-based cell type annotation for single-cell ATAC sequencing data. Brief Bioinform 2024; 25:bbae152. [PMID: 38581422 PMCID: PMC10998639 DOI: 10.1093/bib/bbae152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2023] [Revised: 02/19/2024] [Accepted: 03/10/2024] [Indexed: 04/08/2024] Open
Abstract
Reliable cell type annotations are crucial for investigating cellular heterogeneity in single-cell omics data. Although various computational approaches have been proposed for single-cell RNA sequencing (scRNA-seq) annotation, high-quality cell labels are still lacking in single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) data, because of extreme sparsity and inconsistent chromatin accessibility between datasets. Here, we present a novel automated cell annotation method that transfers cell type information from a well-labeled scRNA-seq reference to an unlabeled scATAC-seq target, via a parallel graph neural network, in a semi-supervised manner. Unlike existing methods that utilize only gene expression or gene activity features, HyGAnno leverages genome-wide accessibility peak features to facilitate the training process. In addition, HyGAnno reconstructs a reference-target cell graph to detect cells with low prediction reliability, according to their specific graph connectivity patterns. HyGAnno was assessed across various datasets, showcasing its strengths in precise cell annotation, generating interpretable cell embeddings, robustness to noisy reference data and adaptability to tumor tissues.
Collapse
Affiliation(s)
- Weihang Zhang
- Department of Computational Biology and Medical Sciences, Graduate school of Frontier Sciences, University of Tokyo, Tokyo, Japan
| | - Yang Cui
- Department of Computational Biology and Medical Sciences, Graduate school of Frontier Sciences, University of Tokyo, Tokyo, Japan
| | - Bowen Liu
- Department of Computational Biology and Medical Sciences, Graduate school of Frontier Sciences, University of Tokyo, Tokyo, Japan
| | - Martin Loza
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
| | - Sung-Joon Park
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
| | - Kenta Nakai
- Department of Computational Biology and Medical Sciences, Graduate school of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
| |
Collapse
|
2
|
Zhu Z, Zhou Q, Sun Y, Lai F, Wang Z, Hao Z, Li G. MethMarkerDB: a comprehensive cancer DNA methylation biomarker database. Nucleic Acids Res 2024; 52:D1380-D1392. [PMID: 37889076 PMCID: PMC10767949 DOI: 10.1093/nar/gkad923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2023] [Revised: 09/21/2023] [Accepted: 10/10/2023] [Indexed: 10/28/2023] Open
Abstract
DNA methylation plays a crucial role in tumorigenesis and tumor progression, sparking substantial interest in the clinical applications of cancer DNA methylation biomarkers. Cancer-related whole-genome bisulfite sequencing (WGBS) data offers a promising approach to precisely identify these biomarkers with differentially methylated regions (DMRs). However, currently there is no dedicated resource for cancer DNA methylation biomarkers with WGBS data. Here, we developed a comprehensive cancer DNA methylation biomarker database (MethMarkerDB, https://methmarkerdb.hzau.edu.cn/), which integrated 658 WGBS datasets, incorporating 724 curated DNA methylation biomarker genes from 1425 PubMed published articles. Based on WGBS data, we documented 5.4 million DMRs from 13 common types of cancer as candidate DNA methylation biomarkers. We provided search and annotation functions for these DMRs with different resources, such as enhancers and SNPs, and developed diagnostic and prognostic models for further biomarker evaluation. With the database, we not only identified known DNA methylation biomarkers, but also identified 781 hypermethylated and 5245 hypomethylated pan-cancer DMRs, corresponding to 693 and 2172 genes, respectively. These novel potential pan-cancer DNA methylation biomarkers hold significant clinical translational value. We hope that MethMarkerDB will help identify novel cancer DNA methylation biomarkers and propel the clinical application of these biomarkers.
Collapse
Affiliation(s)
- Zhixian Zhu
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Qiangwei Zhou
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Yuanhui Sun
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Fuming Lai
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Zhenji Wang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Zhigang Hao
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Guoliang Li
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| |
Collapse
|
3
|
Qian FC, Zhou LW, Zhu YB, Li YY, Yu ZM, Feng CC, Fang QL, Zhao Y, Cai FH, Wang QY, Tang HF, Li CQ. scATAC-Ref: a reference of scATAC-seq with known cell labels in multiple species. Nucleic Acids Res 2024; 52:D285-D292. [PMID: 37897340 PMCID: PMC10767920 DOI: 10.1093/nar/gkad924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2023] [Revised: 09/14/2023] [Accepted: 10/16/2023] [Indexed: 10/30/2023] Open
Abstract
Chromatin accessibility profiles at single cell resolution can reveal cell type-specific regulatory programs, help dissect highly specialized cell functions and trace cell origin and evolution. Accurate cell type assignment is critical for effectively gaining biological and pathological insights, but is difficult in scATAC-seq. Hence, by extensively reviewing the literature, we designed scATAC-Ref (https://bio.liclab.net/scATAC-Ref/), a manually curated scATAC-seq database aimed at providing a comprehensive, high-quality source of chromatin accessibility profiles with known cell labels across broad cell types. Currently, scATAC-Ref comprises 1 694 372 cells with known cell labels, across various biological conditions, >400 cell/tissue types and five species. We used uniform system environment and software parameters to perform comprehensive downstream analysis on these chromatin accessibility profiles with known labels, including gene activity score, TF enrichment score, differential chromatin accessibility regions, pathway/GO term enrichment analysis and co-accessibility interactions. The scATAC-Ref also provided a user-friendly interface to query, browse and visualize cell types of interest, thereby providing a valuable resource for exploring epigenetic regulation in different tissues and cell types.
Collapse
Affiliation(s)
- Feng-Cui Qian
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Li-Wei Zhou
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Yan-Bing Zhu
- Beijing Clinical Research Institute, Beijing Friendship Hospital, Capital Medical University, Beijing, China
| | - Yan-Yu Li
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, 163319, China
| | - Zheng-Min Yu
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
| | - Chen-Chen Feng
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, 163319, China
| | - Qiao-Li Fang
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
| | - Yu Zhao
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
| | - Fu-Hong Cai
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
| | - Qiu-Yu Wang
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| | - Hui-Fang Tang
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang, China
| | - Chun-Quan Li
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Hunan Provincial Maternal and Child Health Care Hospital, National Health Commission Key Laboratory of Birth Defect Research and Prevention, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
- School of Computer, University of South China, Hengyang, Hunan, 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China
| |
Collapse
|
4
|
Song C, Zhang Y, Huang H, Wang Y, Zhao X, Zhang G, Yin M, Feng C, Wang Q, Qian F, Shang D, Zhang J, Liu J, Li C, Tang H. Cis-Cardio: A comprehensive analysis platform for cardiovascular-relavant cis-regulation in human and mouse. MOLECULAR THERAPY. NUCLEIC ACIDS 2023; 33:655-667. [PMID: 37637211 PMCID: PMC10458290 DOI: 10.1016/j.omtn.2023.07.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/30/2023] [Accepted: 07/25/2023] [Indexed: 08/29/2023]
Abstract
Cis-regulatory elements are important molecular switches in controlling gene expression and are regarded as determinant hubs in the transcriptional regulatory network. Collection and processing of large-scale cis-regulatory data are urgent to decipher the potential mechanisms of cardiovascular diseases from a cis-regulatory element aspect. Here, we developed a novel web server, Cis-Cardio, which aims to document a large number of available cardiovascular-related cis-regulatory data and to provide analysis for unveiling the comprehensive mechanisms at a cis-regulation level. The current version of Cis-Cardio catalogs a total of 45,382,361 genomic regions from 1,013 human and mouse epigenetic datasets, including ATAC-seq, DNase-seq, Histone ChIP-seq, TF/TcoF ChIP-seq, RNA polymerase ChIP-seq, and Cohesin ChIP-seq. Importantly, Cis-Cardio provides six analysis tools, including region overlap analysis, element upstream/downstream analysis, transcription regulator enrichment analysis, variant interpretation, and protein-protein interaction-based co-regulatory analysis. Additionally, Cis-Cardio provides detailed and abundant (epi-) genetic annotations in cis-regulatory regions, such as super-enhancers, enhancers, transcription factor binding sites (TFBSs), methylation sites, common SNPs, risk SNPs, expression quantitative trait loci (eQTLs), motifs, DNase I hypersensitive sites (DHSs), and 3D chromatin interactions. In summary, Cis-Cardio is a valuable resource for elucidating and analyzing regulatory cues of cardiovascular-specific cis-regulatory elements. The platform is freely available at http://www.licpathway.net/Cis-Cardio/index.html.
Collapse
Affiliation(s)
- Chao Song
- The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases, University of South China, Hengyang, Hunan 421001, China
- School of Computer, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang, China
| | - Yuexin Zhang
- The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases, University of South China, Hengyang, Hunan 421001, China
- School of Computer, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang, China
| | - Hong Huang
- The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang, China
- Clinical Research Center for Myocardial Injury in Hunan Province, Hengyang, Hunan 421001, China
| | - Yuezhu Wang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
| | - Xilong Zhao
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
| | - Guorui Zhang
- The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases, University of South China, Hengyang, Hunan 421001, China
| | - Mingxue Yin
- The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases, University of South China, Hengyang, Hunan 421001, China
| | - Chenchen Feng
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
| | - Qiuyu Wang
- The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases, University of South China, Hengyang, Hunan 421001, China
- School of Computer, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
| | - Fengcui Qian
- The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases, University of South China, Hengyang, Hunan 421001, China
- School of Computer, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang, China
| | - Desi Shang
- The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases, University of South China, Hengyang, Hunan 421001, China
- School of Computer, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
| | - Jian Zhang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
| | - Jiaqi Liu
- The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases, University of South China, Hengyang, Hunan 421001, China
- School of Computer, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang, China
| | - Chunquan Li
- The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases, University of South China, Hengyang, Hunan 421001, China
- School of Computer, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- National Health Commission Key Laboratory of Birth Defect Research and Prevention, Hunan Provincial Maternal and Child Health Care Hospital, Changsha, Hunan 410008, China
- Key Laboratory of Rare Pediatric Diseases, Ministry of Education, University of South China, Hengyang, Hunan 421001, China
| | - Huifang Tang
- The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Hunan Provincial Key Laboratory of Multi-omics and Artificial Intelligence of Cardiovascular Diseases, University of South China, Hengyang, Hunan 421001, China
- School of Computer, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang, China
- Clinical Research Center for Myocardial Injury in Hunan Province, Hengyang, Hunan 421001, China
| |
Collapse
|
5
|
Armendariz DA, Sundarrajan A, Hon GC. Breaking enhancers to gain insights into developmental defects. eLife 2023; 12:e88187. [PMID: 37497775 PMCID: PMC10374278 DOI: 10.7554/elife.88187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2023] [Accepted: 07/19/2023] [Indexed: 07/28/2023] Open
Abstract
Despite ground-breaking genetic studies that have identified thousands of risk variants for developmental diseases, how these variants lead to molecular and cellular phenotypes remains a gap in knowledge. Many of these variants are non-coding and occur at enhancers, which orchestrate key regulatory programs during development. The prevailing paradigm is that non-coding variants alter the activity of enhancers, impacting gene expression programs, and ultimately contributing to disease risk. A key obstacle to progress is the systematic functional characterization of non-coding variants at scale, especially since enhancer activity is highly specific to cell type and developmental stage. Here, we review the foundational studies of enhancers in developmental disease and current genomic approaches to functionally characterize developmental enhancers and their variants at scale. In the coming decade, we anticipate systematic enhancer perturbation studies to link non-coding variants to molecular mechanisms, changes in cell state, and disease phenotypes.
Collapse
Affiliation(s)
- Daniel A Armendariz
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
| | - Anjana Sundarrajan
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
| | - Gary C Hon
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
- Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, United States
- Lyda Hill Department of Bioinformatics, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, United States
| |
Collapse
|
6
|
Phan LT, Oh C, He T, Manavalan B. A comprehensive revisit of the machine-learning tools developed for the identification of enhancers in the human genome. Proteomics 2023; 23:e2200409. [PMID: 37021401 DOI: 10.1002/pmic.202200409] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2023] [Revised: 03/18/2023] [Accepted: 03/27/2023] [Indexed: 04/07/2023]
Abstract
Enhancers are non-coding DNA elements that play a crucial role in enhancing the transcription rate of a specific gene in the genome. Experiments for identifying enhancers can be restricted by their conditions and involve complicated, time-consuming, laborious, and costly steps. To overcome these challenges, computational platforms have been developed to complement experimental methods that enable high-throughput identification of enhancers. Over the last few years, the development of various enhancer computational tools has resulted in significant progress in predicting putative enhancers. Thus, researchers are now able to use a variety of strategies to enhance and advance enhancer study. In this review, an overview of machine learning (ML)-based prediction methods for enhancer identification and related databases has been provided. The existing enhancer-prediction methods have also been reviewed regarding their algorithms, feature selection processes, validation techniques, and software utility. In addition, the advantages and drawbacks of these ML approaches and guidelines for developing bioinformatic tools have been highlighted for a more efficient enhancer prediction. This review will serve as a useful resource for experimentalists in selecting the appropriate ML tool for their study, and for bioinformaticians in developing more accurate and advanced ML-based predictors.
Collapse
Affiliation(s)
- Le Thi Phan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
| | - Changmin Oh
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
| | - Tao He
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
| |
Collapse
|
7
|
Lindhorst D, Halfon MS. Reporter gene assays and chromatin-level assays define substantially non-overlapping sets of enhancer sequences. BMC Genomics 2023; 24:17. [PMID: 36639739 PMCID: PMC9837977 DOI: 10.1186/s12864-023-09123-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2022] [Accepted: 01/09/2023] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND Transcriptional enhancers are essential for gene regulation, but how these regulatory elements are best defined remains a significant unresolved question. Traditional definitions rely on activity-based criteria such as reporter gene assays, while more recently, biochemical assays based on chromatin-level phenomena such as chromatin accessibility, histone modifications, and localized RNA transcription have gained prominence. RESULTS We examine here whether these two types of definitions, activity-based and chromatin-based, effectively identify the same sets of sequences. We find that, concerningly, the overlap between the two groups is strikingly limited. Few of the data sets we compared displayed statistically significant overlap, and even for those, the degree of overlap was typically small (below 40% of sequences). Moreover, a substantial batch effect was observed in which experiment set rather than experimental method was a primary driver of whether or not chromatin-defined enhancers showed a strong overlap with reporter gene-defined enhancers. CONCLUSIONS Our results raise important questions as to the appropriateness of both old and new enhancer definitions, and suggest that new approaches are required to reconcile the poor agreement among existing methods for defining enhancers.
Collapse
Affiliation(s)
- Daniel Lindhorst
- grid.273335.30000 0004 1936 9887Department of Biochemistry, University at Buffalo-State University of New York, 955 Main St. #5128, Buffalo, NY 14203 USA ,grid.21729.3f0000000419368729Present Address: Program in Biomedical Sciences, Columbia University, New York, NY 10032 USA
| | - Marc S. Halfon
- grid.273335.30000 0004 1936 9887Department of Biochemistry, University at Buffalo-State University of New York, 955 Main St. #5128, Buffalo, NY 14203 USA ,grid.273335.30000 0004 1936 9887Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY 14203 USA ,grid.273335.30000 0004 1936 9887Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY 14260 USA ,NY State Center of Excellence in Bioinformatics & Life Sciences, Buffalo, NY 14203 USA ,grid.240614.50000 0001 2181 8635Department of Molecular and Cellular Biology and Program in Cancer Genetics, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263 USA
| |
Collapse
|
8
|
Zhou Q, Cheng S, Zheng S, Wang Z, Guan P, Zhu Z, Huang X, Zhou C, Li G. ChromLoops: a comprehensive database for specific protein-mediated chromatin loops in diverse organisms. Nucleic Acids Res 2023; 51:D57-D69. [PMID: 36243984 PMCID: PMC9825580 DOI: 10.1093/nar/gkac893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2022] [Revised: 09/14/2022] [Accepted: 10/03/2022] [Indexed: 01/29/2023] Open
Abstract
Chromatin loops (or chromatin interactions) are important elements of chromatin structures. Disruption of chromatin loops is associated with many diseases, such as cancer and polydactyly. A few methods, including ChIA-PET, HiChIP and PLAC-Seq, have been proposed to detect high-resolution, specific protein-mediated chromatin loops. With rapid progress in 3D genomic research, ChIA-PET, HiChIP and PLAC-Seq datasets continue to accumulate, and effective collection and processing for these datasets are urgently needed. Here, we developed a comprehensive, multispecies and specific protein-mediated chromatin loop database (ChromLoops, https://3dgenomics.hzau.edu.cn/chromloops), which integrated 1030 ChIA-PET, HiChIP and PLAC-Seq datasets from 13 species, and documented 1 491 416 813 high-quality chromatin loops. We annotated genes and regions overlapping with chromatin loop anchors with rich functional annotations, such as regulatory elements (enhancers, super-enhancers and silencers), variations (common SNPs, somatic SNPs and eQTLs), and transcription factor binding sites. Moreover, we identified genes with high-frequency chromatin interactions in the collected species. In particular, we identified genes with high-frequency interactions in cancer samples. We hope that ChromLoops will provide a new platform for studying chromatin interaction regulation in relation to biological processes and disease.
Collapse
Affiliation(s)
- Qiangwei Zhou
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China.,Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Sheng Cheng
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China.,Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Shanshan Zheng
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China.,Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Zhenji Wang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China.,Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Pengpeng Guan
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China.,Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Zhixian Zhu
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China.,Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Xingyu Huang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China.,Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Cong Zhou
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China.,Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Guoliang Li
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China.,Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| |
Collapse
|
9
|
Basran PS, Appleby RB. The unmet potential of artificial intelligence in veterinary medicine. Am J Vet Res 2022; 83:385-392. [PMID: 35353711 DOI: 10.2460/ajvr.22.03.0038] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
Abstract
Veterinary medicine is a broad and growing discipline that includes topics such as companion animal health, population medicine and zoonotic diseases, and agriculture. In this article, we provide insight on how artificial intelligence works and how it is currently applied in veterinary medicine. We also discuss its potential in veterinary medicine. Given the rapid pace of research and commercial product developments in this area, the next several years will pose challenges to understanding, interpreting, and adopting this powerful and evolving technology. Artificial intelligence has the potential to enable veterinarians to perform tasks more efficiently while providing new insights for the management and treatment of disorders. It is our hope that this will translate to better quality of life for animals and those who care for them.
Collapse
Affiliation(s)
- Parminder S Basran
- Department of Clinical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY
| | - Ryan B Appleby
- Department of Clinical Studies, Ontario Veterinary College, University of Guelph, Guelph, ON, Canada
| |
Collapse
|
10
|
Chen X, Chen S, Song S, Gao Z, Hou L, Zhang X, Lv H, Jiang R. Cell type annotation of single-cell chromatin accessibility data via supervised Bayesian embedding. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-021-00432-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
|
11
|
Rigden DJ, Fernández XM. The 2022 Nucleic Acids Research database issue and the online molecular biology database collection. Nucleic Acids Res 2022; 50:D1-D10. [PMID: 34986604 PMCID: PMC8728296 DOI: 10.1093/nar/gkab1195] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022] Open
Abstract
The 2022 Nucleic Acids Research Database Issue contains 185 papers, including 87 papers reporting on new databases and 85 updates from resources previously published in the Issue. Thirteen additional manuscripts provide updates on databases most recently published elsewhere. Seven new databases focus specifically on COVID-19 and SARS-CoV-2, including SCoV2-MD, the first of the Issue's Breakthrough Articles. Major nucleic acid databases reporting updates include MODOMICS, JASPAR and miRTarBase. The AlphaFold Protein Structure Database, described in the second Breakthrough Article, is the stand-out in the protein section, where the Human Proteoform Atlas and GproteinDb are other notable new arrivals. Updates from DisProt, FuzDB and ELM comprehensively cover disordered proteins. Under the metabolism and signalling section Reactome, ConsensusPathDB, HMDB and CAZy are major returning resources. In microbial and viral genomes taxonomy and systematics are well covered by LPSN, TYGS and GTDB. Genomics resources include Ensembl, Ensembl Genomes and UCSC Genome Browser. Major returning pharmacology resource names include the IUPHAR/BPS guide and the Therapeutic Target Database. New plant databases include PlantGSAD for gene lists and qPTMplants for post-translational modifications. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). Our latest update to the NAR online Molecular Biology Database Collection brings the total number of entries to 1645. Following last year's major cleanup, we have updated 317 entries, listing 89 new resources and trimming 80 discontinued URLs. The current release is available at http://www.oxfordjournals.org/nar/database/c/.
Collapse
Affiliation(s)
- Daniel J Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK
| | | |
Collapse
|