1
Liu X, Li M, Chen T, Zhang R, Wang Y, Xiao J, Ding X, Zhang S, Li Q. A global survey of bicarbonate stress-induced pre-mRNA alternative splicing in soybean via integrative analysis of Iso-seq and RNA-seq. Int J Biol Macromol 2024; 278:135067. [PMID: 39191343 DOI: 10.1016/j.ijbiomac.2024.135067] [Received: 07/05/2024] [Revised: 08/23/2024] [Accepted: 08/23/2024] [Indexed: 08/29/2024]
Abstract
Alternative splicing (AS) plays important roles in modulating environmental stress responses in plants. However, little is known about the functions of bicarbonate-induced AS in cultivated soybean (Glycine max L. Merr.). In this study, we combined PacBio isoform sequencing (Iso-seq) and Illumina RNA sequencing (RNA-seq) to elucidate the bicarbonate-induced AS events in soybean root and leaf tissues. Compared to RNA-seq, Iso-seq identified more novel genes and transcripts, as well as more AS events, indicating that Iso-seq is more efficient in AS detection. Combining these two technologies, we found that intron retention (IR) is the most frequent AS event type. From our RNA-seq results, we identified a total of 913 and 1974 bicarbonate stress-responsive differentially alternatively spliced genes (DAGs) in soybean leaves and roots, respectively. Additionally, we identified a transcription factor (GmNTL9) and a splicing factor (GmRSZ22), and validated their roles in the bicarbonate stress response through AS. Overall, our study opens an avenue for evaluating plant AS regulatory networks, and the obtained global landscape of alternative splicing provides valuable insights into the AS-mediated bicarbonate-responsive mechanisms in plant species.
Affiliation(s)
- Xin Liu
  - Key Laboratory of Agricultural Biological Functional Genes, Northeast Agricultural University, Harbin 150030, China; Key Laboratory of Soybean Biology of Chinese Education Ministry, Northeast Agricultural University, Harbin 150030, China
- Minglong Li
  - Key Laboratory of Agricultural Biological Functional Genes, Northeast Agricultural University, Harbin 150030, China; Key Laboratory of Soybean Biology of Chinese Education Ministry, Northeast Agricultural University, Harbin 150030, China
- Tong Chen
  - Key Laboratory of Agricultural Biological Functional Genes, Northeast Agricultural University, Harbin 150030, China
- Rui Zhang
  - Key Laboratory of Agricultural Biological Functional Genes, Northeast Agricultural University, Harbin 150030, China
- Yuye Wang
  - Key Laboratory of Agricultural Biological Functional Genes, Northeast Agricultural University, Harbin 150030, China
- Jialei Xiao
  - Key Laboratory of Agricultural Biological Functional Genes, Northeast Agricultural University, Harbin 150030, China; Key Laboratory of Soybean Biology of Chinese Education Ministry, Northeast Agricultural University, Harbin 150030, China
- Xiaodong Ding
  - Key Laboratory of Agricultural Biological Functional Genes, Northeast Agricultural University, Harbin 150030, China; Key Laboratory of Soybean Biology of Chinese Education Ministry, Northeast Agricultural University, Harbin 150030, China
- Shuzhen Zhang
  - Key Laboratory of Soybean Biology of Chinese Education Ministry, Northeast Agricultural University, Harbin 150030, China; Department of Plant Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Qiang Li
  - Key Laboratory of Agricultural Biological Functional Genes, Northeast Agricultural University, Harbin 150030, China; Key Laboratory of Soybean Biology of Chinese Education Ministry, Northeast Agricultural University, Harbin 150030, China

2
Abulfaraj AA, Alshareef SA. Concordant Gene Expression and Alternative Splicing Regulation under Abiotic Stresses in Arabidopsis. Genes (Basel) 2024; 15:675. [PMID: 38927612 PMCID: PMC11202685 DOI: 10.3390/genes15060675] [Received: 04/16/2024] [Revised: 05/19/2024] [Accepted: 05/20/2024] [Indexed: 06/28/2024]
Abstract
The current investigation endeavors to identify differentially expressed alternatively spliced (DAS) genes that exhibit concordant expression with splicing factors (SFs) under diverse multifactorial abiotic stress combinations in Arabidopsis seedlings. SFs serve as the post-transcriptional mechanism governing the spatiotemporal dynamics of gene expression. The different stresses encompass variations in salt concentration, heat, intensive light, and their combinations. Clusters demonstrating consistent expression profiles were surveyed to pinpoint DAS/SF gene pairs exhibiting concordant expression. Through rigorous selection criteria, which incorporate alignment with documented gene functionalities and expression patterns observed in this study, four members of the serine/arginine-rich (SR) gene family were delineated as SFs concordantly expressed with six DAS genes. These regulated SF genes encompass cactin, SR1-like, SR30, and SC35-like. The identified concordantly expressed DAS genes encode diverse proteins such as the 26.5 kDa heat shock protein, chaperone protein DnaJ, potassium channel GORK, calcium-binding EF hand family protein, DEAD-box RNA helicase, and 1-aminocyclopropane-1-carboxylate synthase 6. Among the concordantly expressed DAS/SF gene pairs, SR30/DEAD-box RNA helicase, and SC35-like/1-aminocyclopropane-1-carboxylate synthase 6 emerge as promising candidates, necessitating further examinations to ascertain whether these SFs orchestrate splicing of the respective DAS genes. This study contributes to a deeper comprehension of the varied responses of the splicing machinery to abiotic stresses. Leveraging these DAS/SF associations shows promise for elucidating avenues for augmenting breeding programs aimed at fortifying cultivated plants against heat and intensive light stresses.
Affiliation(s)
- Aala A. Abulfaraj
  - Biological Sciences Department, College of Science & Arts, King Abdulaziz University, Rabigh 21911, Saudi Arabia
- Sahar A. Alshareef
  - Department of Biology, College of Science and Arts at Khulis, University of Jeddah, Jeddah 21921, Saudi Arabia

3
Lin CY, Zhang YM, Li BZ, Shu MA, Xu WB. Identification and characterization of mitogen-activated protein kinase kinase 4 (MKK4) from the mud crab Scylla paramamosain in response to Vibrio alginolyticus and White Spot Syndrome Virus (WSSV). Dev Comp Immunol 2023; 147:104755. [PMID: 37295629 DOI: 10.1016/j.dci.2023.104755] [Received: 03/07/2023] [Revised: 06/03/2023] [Accepted: 06/07/2023] [Indexed: 06/12/2023]
Abstract
Mitogen-activated protein kinase kinase 4 (MKK4) serves as a critical component of the mitogen-activated protein kinase signaling pathway, facilitating the direct phosphorylation and activation of the c-Jun N-terminal kinase (JNK) and p38 families of MAP kinases in response to environmental stresses. In the current research, we identified two MKK4 subtypes, namely SpMKK4-1 and SpMKK4-2, from Scylla paramamosain, and analyzed their molecular characteristics and tissue distributions. The expression of SpMKK4s was induced upon WSSV and Vibrio alginolyticus challenges, and both the bacterial clearance capacity and the expression of antimicrobial peptide (AMP) genes upon bacterial infection were significantly decreased after knocking down SpMKK4s. Additionally, the overexpression of both SpMKK4s remarkably activated an NF-κB reporter plasmid in HEK293T cells, suggesting the activation of the NF-κB signaling pathway. These results indicate the participation of SpMKK4s in the innate immunity of crabs and shed light on the mechanisms through which MKK4s regulate innate immunity.
Affiliation(s)
- Chen-Yang Lin
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Yan-Mei Zhang
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Bang-Ze Li
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Miao-An Shu
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Wen-Bin Xu
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China

4
Mann JT, Riley BA, Baker SF. All differential on the splicing front: Host alternative splicing alters the landscape of virus-host conflict. Semin Cell Dev Biol 2023; 146:40-56. [PMID: 36737258 DOI: 10.1016/j.semcdb.2023.01.013] [Received: 10/21/2022] [Revised: 01/24/2023] [Accepted: 01/25/2023] [Indexed: 02/05/2023]
Abstract
Alternative RNA splicing is a co-transcriptional process that richly increases proteome diversity, and is dynamically regulated based on cell species, lineage, and activation state. Virus infection in vertebrate hosts results in rapid host transcriptome-wide changes, and regulation of alternative splicing can direct a combinatorial effect on the host transcriptome. There has been a recent increase in genome-wide studies evaluating host alternative splicing during viral infection, which integrates well with prior knowledge on viral interactions with host splicing proteins. A critical challenge remains in linking how these individual events direct global changes, and whether alternative splicing is an overall favorable pathway for fending off or supporting viral infection. Here, we introduce the process of alternative splicing, discuss how to analyze splice regulation, and detail studies on genome-wide and splice factor changes during viral infection. We seek to highlight where the field can focus on moving forward, and how incorporation of a virus-host co-evolutionary perspective can benefit this burgeoning subject.
Affiliation(s)
- Joshua T Mann
  - Infectious Disease Program, Lovelace Biomedical Research Institute, Albuquerque, NM, USA
- Brent A Riley
  - Infectious Disease Program, Lovelace Biomedical Research Institute, Albuquerque, NM, USA
- Steven F Baker
  - Infectious Disease Program, Lovelace Biomedical Research Institute, Albuquerque, NM, USA

5
Jorgensen K, Garcia OA, Kiyamu M, Brutsaert TD, Bigham AW. Genetic adaptations to potato starch digestion in the Peruvian Andes. Am J Biol Anthropol 2023; 180:162-172. [PMID: 39882941 DOI: 10.1002/ajpa.24656] [Received: 07/25/2022] [Revised: 10/18/2022] [Accepted: 10/25/2022] [Indexed: 01/31/2025]
Abstract
OBJECTIVES: Potatoes are an important staple crop across the world and particularly in the Andes, where they were cultivated as early as 10,000 years ago. Ancient Andean populations that relied upon this high-starch food to survive could possess genetic adaptation(s) to digest potato starch more efficiently. Here, we analyzed genomic data to identify whether this putative adaptation is still present in their modern-day descendants, namely Peruvians of Indigenous American ancestry.
MATERIALS AND METHODS: We applied several tests to detect signatures of natural selection in genes associated with starch digestion (AMY1, AMY2, SI, and MGAM) in Peruvians. These were compared to two populations who only recently incorporated potatoes into their diets, Han Chinese and West Africans.
RESULTS: Overlapping statistical results identified a regional haplotype in MGAM that is unique to Peruvians. The age of this haplotype was estimated at around 9547 years.
DISCUSSION: The MGAM haplotype in Peruvians lies within a region of high transcriptional activity associated with the REST protein. The timing of this haplotype suggests that it arose in response to increased potato cultivation and attendant consumption. For Peruvian populations that relied upon the high-starch potato as a major source of nutrition, natural selection likely favored these MGAM variant(s) that led to more efficient digestion and increased glucose production. This research provides further support that subtle shifts in human diet can be a major driver of human evolutionary change, as these results indicate that there is global variation in the human ability to digest high-starch foods.
Affiliation(s)
- Kelsey Jorgensen
  - Department of Anthropology, University of California, Los Angeles, California, USA
  - Department of Anthropology, Wayne State University, Detroit, Michigan, USA
- Obed A Garcia
  - Department of Biomedical Data Science, Stanford University, Stanford, California, USA
- Melisa Kiyamu
  - Departamento de Ciencias Biológicas y Fisiológicas, Universidad Peruana Cayetano Heredia, Lima, Peru
- Tom D Brutsaert
  - Department of Exercise Science, Syracuse University, Syracuse, New York, USA
- Abigail W Bigham
  - Department of Anthropology, University of California, Los Angeles, California, USA

6
Liu Y, Li HD, Xu Y, Liu YW, Peng X, Wang J. IsoCell: An Approach to Enhance Single Cell Clustering by Integrating Isoform-Level Expression Through Orthogonal Projection. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:465-475. [PMID: 35100120 DOI: 10.1109/tcbb.2022.3147193] [Indexed: 05/04/2023]
Abstract
Single cell RNA sequencing (scRNA-seq) provides a powerful approach for profiling transcriptomes at single cell resolution. An essential application of scRNA-seq is the discovery of cell types with the aid of clustering analysis. Existing single cell clustering methods are exclusively based on gene-level expression data, without considering alternative splicing information. It has been shown that alternative splicing has an important influence on biological processes such as cell differentiation and the cell cycle. We therefore hypothesize that adding information about alternative splicing may help enhance single cell clustering. This motivates us to develop a way to integrate isoform-level expression and gene-level expression. We report an approach to enhance single cell clustering by integrating isoform-level expression through orthogonal projection. First, we construct an orthogonal projection matrix based on gene expression data. Second, isoforms are projected to the gene space to remove the redundant information between them. Third, isoform selection is performed based on the residual of the projected expression and the selected isoforms are combined with gene expression data for subsequent clustering. We applied our method to sixteen scRNA-seq datasets. We find that alternative splicing contains differential information among cell types and can be integrated to enhance single cell clustering. Compared with using only gene-level expression data, the integration of isoform-level expression leads to better clustering performance for most of the datasets. The integration of isoform-level expression also has potential in the detection of novel cell subgroups. Our study shows that integrating isoform and gene-level expression is a promising way to improve single cell clustering. The IsoCell R package is freely available at both GitHub (https://github.com/genemine/IsoCell) and Zenodo (https://zenodo.org/record/4395707).
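The three steps named in this abstract (build a projection from gene expression, project isoforms onto the gene space, select isoforms by residual) can be illustrated with a minimal sketch. This is not the IsoCell R package's code: it is written in Python with NumPy, the function names `isoform_residuals` and `select_isoforms` are hypothetical, and ranking isoforms by the norm of their residual and keeping the top `top_k` is an assumed selection rule, since the abstract does not state the exact criterion.

```python
import numpy as np


def isoform_residuals(gene_expr, iso_expr):
    """Return the part of each isoform profile not explained by gene expression.

    gene_expr: array of shape (cells, genes)
    iso_expr:  array of shape (cells, isoforms)
    """
    # Orthonormal basis Q for the column space of the gene matrix (via SVD).
    u, s, _ = np.linalg.svd(gene_expr, full_matrices=False)
    rank = int(np.sum(s > 1e-10 * s.max()))
    q = u[:, :rank]
    # Q @ Q.T is the orthogonal projector onto the gene space; apply it
    # to the isoform profiles without forming the full projector matrix.
    projected = q @ (q.T @ iso_expr)
    return iso_expr - projected


def select_isoforms(gene_expr, iso_expr, top_k):
    """Keep the top_k isoforms with the largest residual norm (assumed rule)
    and append them to the gene matrix for downstream clustering."""
    residual = isoform_residuals(gene_expr, iso_expr)
    scores = np.linalg.norm(residual, axis=0)
    keep = np.sort(np.argsort(scores)[::-1][:top_k])
    combined = np.hstack([gene_expr, iso_expr[:, keep]])
    return keep, combined


if __name__ == "__main__":
    # Toy data: 6 cells, 2 genes, 2 isoforms.
    genes = np.array([[1, 0], [0, 1], [1, 1], [2, 0], [0, 2], [1, 2]], dtype=float)
    redundant = genes @ np.array([0.5, 2.0])  # fully explained by the genes
    informative = np.array([1, -1, 0, 3, 0, -2], dtype=float)
    isoforms = np.column_stack([redundant, informative])
    kept, features = select_isoforms(genes, isoforms, top_k=1)
    print("kept isoform indices:", kept.tolist())
    print("combined feature matrix shape:", features.shape)
```

In the toy data, the first isoform is an exact linear combination of the gene columns, so its residual vanishes and it carries no information beyond gene-level expression; the second isoform is the one retained.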

7
Raina P, Guinea R, Chatsirisupachai K, Lopes I, Farooq Z, Guinea C, Solyom CA, de Magalhães JP. GeneFriends: gene co-expression databases and tools for humans and model organisms. Nucleic Acids Res 2022; 51:D145-D158. [PMID: 36454018 PMCID: PMC9825523 DOI: 10.1093/nar/gkac1031] [Received: 08/09/2022] [Revised: 10/17/2022] [Accepted: 10/21/2022] [Indexed: 12/05/2022]
Abstract
Gene co-expression analysis has emerged as a powerful method to provide insights into gene function and regulation. The rapid growth of publicly available RNA-sequencing (RNA-seq) data has created opportunities for researchers to employ these abundant data to help decipher the complexity and biology of genomes. Co-expression networks have proven effective for inferring relationships between genes, for gene prioritization and for assigning function to poorly annotated genes based on their co-expressed partners. To facilitate such analyses, we previously created an online co-expression tool for humans and mice entitled GeneFriends. To continue providing a valuable tool to the scientific community, we have now updated the GeneFriends database and website. Here, we present the new version of GeneFriends, which includes gene and transcript co-expression networks based on RNA-seq data from 46 475 human and 34 322 mouse samples. The new database also encompasses tissue-specific gene co-expression networks for 20 human and 21 mouse tissues, dataset-specific gene co-expression maps based on the TCGA and GTEx projects, and gene co-expression networks for seven additional model organisms (fruit fly, zebrafish, worm, rat, yeast, cow and chicken). GeneFriends is freely available at http://www.genefriends.org/.
Affiliation(s)
- Priyanka Raina
  - Integrative Genomics of Ageing Group, Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L7 8TX, UK
- Rodrigo Guinea
  - Integrative Genomics of Ageing Group, Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L7 8TX, UK
- Kasit Chatsirisupachai
  - Integrative Genomics of Ageing Group, Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L7 8TX, UK
- Inês Lopes
  - Integrative Genomics of Ageing Group, Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L7 8TX, UK
- Zoya Farooq
  - Integrative Genomics of Ageing Group, Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L7 8TX, UK
- Cristina Guinea
  - UCAL - Universidad de Ciencias y Artes de América Latina, Faculty of Design, Lima 15026, Perú
- Csaba-Attila Solyom
  - Integrative Genomics of Ageing Group, Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L7 8TX, UK

8
Castaldi PJ, Abood A, Farber CR, Sheynkman GM. Bridging the splicing gap in human genetics with long-read RNA sequencing: finding the protein isoform drivers of disease. Hum Mol Genet 2022; 31:R123-R136. [PMID: 35960994 PMCID: PMC9585682 DOI: 10.1093/hmg/ddac196] [Received: 07/07/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 02/04/2023]
Abstract
Aberrant splicing underlies many human diseases, including cancer, cardiovascular diseases and neurological disorders. Genome-wide mapping of splicing quantitative trait loci (sQTLs) has shown that genetic regulation of alternative splicing is widespread. However, identification of the corresponding isoform or protein products associated with disease-associated sQTLs is challenging with short-read RNA-seq, which cannot precisely characterize full-length transcript isoforms. Furthermore, contemporary sQTL interpretation often relies on reference transcript annotations, which are incomplete. Solutions to these issues may be found through integration of newly emerging long-read sequencing technologies. Long-read sequencing offers the capability to sequence full-length mRNA transcripts and, in some cases, to link sQTLs to transcript isoforms containing disease-relevant protein alterations. Here, we provide an overview of sQTL mapping approaches, the use of long-read sequencing to characterize sQTL effects on isoforms, the linkage of RNA isoforms to protein-level functions and comment on future directions in the field. Based on recent progress, long-read RNA sequencing promises to be part of the human disease genetics toolkit to discover and treat protein isoforms causing rare and complex diseases.
Affiliation(s)
- Peter J Castaldi
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
  - Division of General Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Abdullah Abood
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Charles R Farber
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Gloria M Sheynkman
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22903, USA
  - UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA 22903, USA

9
Qiu S, Yu G, Lu X, Domeniconi C, Guo M. Isoform function prediction by Gene Ontology embedding. Bioinformatics 2022; 38:4581-4588. [PMID: 35997558 DOI: 10.1093/bioinformatics/btac576] [Received: 03/29/2022] [Revised: 07/13/2022] [Accepted: 08/22/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION: High-resolution annotation of gene functions is a central task in functional genomics. Multiple proteoforms translated from alternatively spliced isoforms of a single gene are the actual function performers and greatly increase functional diversity. The specific functions of different isoforms can help decipher the molecular basis of various complex diseases at a finer granularity. Multi-instance learning (MIL)-based solutions have been developed to distribute gene (bag)-level Gene Ontology (GO) annotations to isoforms (instances), but they simply presume that a particular annotation of the gene is attributable to only one isoform, neglect the hierarchical structures and semantics of massive GO terms (labels), or can only handle dozens of terms.
RESULTS: We propose an effective approach, IsofunGO, to differentiate massive functions of isoforms by GO embedding. In particular, IsofunGO first introduces an attributed hierarchical network to model massive GO terms, and a GO network embedding strategy to learn compact representations of GO terms and project GO annotations of genes into compressed ones. This strategy not only explores and preserves the hierarchy between GO terms but also greatly reduces the prediction load. Next, it develops an attention-based MIL network to fuse genomics and transcriptomics data of isoforms and predict isoform functions by referring to the compressed annotations. Extensive experiments on benchmark datasets demonstrate the efficacy of IsofunGO. Both the GO embedding and the attention mechanism can boost performance and interpretability.
AVAILABILITY AND IMPLEMENTATION: The code of IsofunGO is available at http://www.sdu-idea.cn/codes.php?name=IsofunGO.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sichao Qiu
  - School of Software, Shandong University, Jinan, Shandong 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, Shandong 250101, China
- Guoxian Yu
  - School of Software, Shandong University, Jinan, Shandong 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, Shandong 250101, China
- Xudong Lu
  - School of Software, Shandong University, Jinan, Shandong 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, Shandong 250101, China
- Maozu Guo
  - College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China

10
Yu G, Huang Q, Zhang X, Guo M, Wang J. Tissue Specificity Based Isoform Function Prediction. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3048-3059. [PMID: 34185647 DOI: 10.1109/tcbb.2021.3093167] [Indexed: 06/13/2023]
Abstract
Alternative splicing enables a gene to be spliced into different isoforms and hence protein variants. Identifying the individual functions of these isoforms helps decipher the functional diversity of proteins. Although many efforts have been made toward automatic gene function prediction, few have addressed computational isoform function prediction, mainly due to the unavailable (or scanty) functional annotations of isoforms. Existing efforts directly combine multiple RNA-seq datasets without accounting for the important tissue specificity of alternative splicing. To bridge this gap, we introduce a novel approach called TS-Isofun to predict the functions of isoforms by integrating multiple functional association networks with respect to tissue specificity. TS-Isofun first constructs tissue-specific isoform functional association networks using multiple RNA-seq datasets grouped by tissue. Next, TS-Isofun assigns weights to these networks and models the tissue specificity by selectively integrating them with adaptive weights. It then introduces a joint matrix factorization-based data fusion model to leverage the integrated network, gene-level data and functional annotations of genes to infer the functions of isoforms. To achieve coherent weight assignment and isoform function prediction, TS-Isofun jointly optimizes the weights of individual networks and the isoform function prediction in a unified objective function. Experimental results show that TS-Isofun significantly outperforms state-of-the-art methods and that accounting for tissue specificity contributes to more accurate isoform function prediction.

11
Dominant transcript expression profiles of human protein-coding genes interrogated with GTEx dataset. Sci Rep 2022; 12:6969. [PMID: 35484179 PMCID: PMC9050722 DOI: 10.1038/s41598-022-10619-9] [Received: 10/14/2021] [Accepted: 04/11/2022] [Indexed: 12/27/2022]
Abstract
The discovery and quantification of mRNA transcripts using short-read next-generation sequencing (NGS) data is a complicated task. There are far more alternative mRNA transcripts expressed by human genes than can be identified from NGS transcriptome data and various bioinformatic pipelines, while the number of annotated human protein-coding genes has gradually declined in recent years. It is essential to learn more about the tissue expression profiles of alternative transcripts in order to understand their molecular modulation and actual functional significance. In this report, we present a bioinformatic database for interrogating the representative tissue of human protein-coding transcripts. The database allows researchers to visually explore the top-ranked transcript expression profiles in particular tissue types. Most transcripts of protein-coding genes were found to have certain tissue expression patterns. This observation demonstrates that many alternative transcripts are specifically modulated in different cell types. This user-friendly tool visually represents transcript expression profiles in a tissue-specific manner. Identification of tissue-specific protein-coding genes and transcripts is a substantial advance towards interpreting their biological functions and enabling further functional genomics studies.

12
Li H, Eksi R, Yi D, Godfrey B, Mathew LR, O’Connor CL, Bitzer M, Kretzler M, Menon R, Guan Y. Micro-dissection and integration of long and short reads to create a robust catalog of kidney compartment-specific isoforms. PLoS Comput Biol 2022; 18:e1010040. [PMID: 35468141 PMCID: PMC9037928 DOI: 10.1371/journal.pcbi.1010040] [Received: 06/20/2021] [Accepted: 03/19/2022] [Indexed: 11/19/2022]
Abstract
Studying isoform expression at the microscopic level has always been a challenging task. A classical example is the kidney, where glomerular and tubulo-interstitial compartments carry out drastically different physiological functions and thus presumably differ in their isoform expression. We aimed to develop an experimental and computational pipeline for identifying isoforms at the level of microscopic structures. We microdissected glomerular and tubulo-interstitial compartments from healthy human kidney tissues from two cohorts. The two compartments were separately sequenced with the PacBio RS II platform. These transcripts were then validated against transcripts of the same samples obtained by the traditional Illumina RNA-Seq protocol, distinct Illumina RNA-Seq short reads from European Renal cDNA Bank (ERCB) samples, and the annotated GENCODE transcript list, thus identifying novel transcripts. We identified 14,739 and 14,259 annotated transcripts, and 17,268 and 13,118 potentially novel transcripts in the glomerular and tubulo-interstitial compartments, respectively. Of note, relying solely on either short or long reads would have resulted in many erroneous identifications. We identified distinct pathways involved in glomerular and tubulo-interstitial compartments at the isoform level, creating an important experimental and computational resource for the kidney research community.
Collapse
Affiliation(s)
- Hongyang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Ridvan Eksi
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Daiyao Yi
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Bradley Godfrey
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Lisa R. Mathew
- Harvard College, Cambridge, Massachusetts, United States of America
| | - Christopher L. O’Connor
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Markus Bitzer
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Matthias Kretzler
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Rajasree Menon
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- * E-mail: (RM); (YG)
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- * E-mail: (RM); (YG)
|
13
|
Lin CX, Li HD, Deng C, Liu W, Erhardt S, Wu FX, Zhao XM, Guan Y, Wang J, Wang D, Hu B, Wang J. An integrated brain-specific network identifies genes associated with neuropathologic and clinical traits of Alzheimer's disease. Brief Bioinform 2022; 23:bbab522. [PMID: 34953465 PMCID: PMC8769916 DOI: 10.1093/bib/bbab522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Revised: 10/26/2021] [Accepted: 11/13/2021] [Indexed: 09/24/2024]
Abstract
Alzheimer's disease (AD) has a strong genetic predisposition. However, its risk genes remain incompletely identified. We developed an Alzheimer's brain gene network-based approach to predict AD-associated genes by leveraging the functional pattern of known AD-associated genes. Our constructed network outperformed existing networks in predicting AD genes. We then systematically validated the predictions using independent genetic, transcriptomic, proteomic, neuropathological and clinical data. First, top-ranked genes were enriched in AD-associated pathways. Second, using external gene expression data from the Mount Sinai Brain Bank study, we found that the top-ranked genes were significantly associated with neuropathological and clinical traits, including the Consortium to Establish a Registry for Alzheimer's Disease score, Braak stage score and clinical dementia rating. The analysis of Alzheimer's brain single-cell RNA-seq data revealed cell-type-specific association of predicted genes with early pathology of AD. Third, by interrogating proteomic data in the Religious Orders Study and Memory and Aging Project and Baltimore Longitudinal Study of Aging studies, we observed a significant association of protein expression level with cognitive function and AD clinical severity. The network, method and predictions could become a valuable resource to advance the identification of risk genes for AD.
Affiliation(s)
- Cui-Xiang Lin
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P. R. China
- Hunan Provincial Key Lab of Bioinformatics, Central South University, Changsha, Hunan 410083, P. R. China
| | - Hong-Dong Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P. R. China
- Hunan Provincial Key Lab of Bioinformatics, Central South University, Changsha, Hunan 410083, P. R. China
| | - Chao Deng
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P. R. China
- Hunan Provincial Key Lab of Bioinformatics, Central South University, Changsha, Hunan 410083, P. R. China
| | - Weisheng Liu
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P. R. China
- Hunan Provincial Key Lab of Bioinformatics, Central South University, Changsha, Hunan 410083, P. R. China
| | - Shannon Erhardt
- Department of Pediatrics, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
| | - Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States
| | - Jun Wang
- Department of Pediatrics, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Daifeng Wang
- Department of Biostatistics and Medical Informatics and Waisman Center, University of Wisconsin-Madison, Madison, Wisconsin, USA
| | - Bin Hu
- Institute of Engineering Medicine, Beijing Institute of Technology, Beijing, China
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P. R. China
- Hunan Provincial Key Lab of Bioinformatics, Central South University, Changsha, Hunan 410083, P. R. China
|
14
|
Yu G, Zhou G, Zhang X, Domeniconi C, Guo M. DMIL-IsoFun: predicting isoform function using deep multi-instance learning. Bioinformatics 2021; 37:4818-4825. [PMID: 34282449 DOI: 10.1093/bioinformatics/btab532] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/13/2021] [Revised: 06/20/2021] [Accepted: 07/16/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION Alternative splicing creates considerable proteomic diversity and complexity from a relatively limited genome. Proteoforms translated from alternatively spliced isoforms of a gene actually execute the biological functions of this gene, which reflect the functional knowledge of genes at a finer granular level. Recently, some computational approaches have been proposed to differentiate isoform functions using sequence and expression data. However, their performance is far from desirable, mainly due to the imbalance and lack of annotations at the isoform level, and the difficulty of modeling gene-isoform relations. RESULTS We propose a deep multi-instance learning based framework (DMIL-IsoFun) to differentiate the functions of isoforms. DMIL-IsoFun first introduces a multi-instance learning convolutional neural network trained with isoform sequences and gene-level annotations to extract the feature vectors and initialize the annotations of isoforms, and then uses a class-imbalance Graph Convolution Network to refine the annotations of individual isoforms based on the isoform co-expression network and extracted features. Extensive experimental results show that DMIL-IsoFun improves the Smin and Fmax of state-of-the-art solutions by at least 29.6% and 40.8%. The effectiveness of DMIL-IsoFun is further confirmed on a testbed of human multiple-isoform genes, and maize isoforms related to photosynthesis. AVAILABILITY The code and data are available at http://www.sdu-idea.cn/codes.php?name=DMIL-Isofun. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Guoxian Yu
- School of Software, Shandong University, Jinan, 250101, China; College of Computer and Information Sciences, Southwest University, Chongqing, 400715, China; Computer, Electrical, and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, SA
| | - Guangjie Zhou
- School of Software, Shandong University, Jinan, 250101, China; College of Computer and Information Sciences, Southwest University, Chongqing, 400715, China
| | - Xiangliang Zhang
- Computer, Electrical, and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, SA
| | - Carlotta Domeniconi
- Department of Computer Science, George Mason University, Fairfax, 22030, USA
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
|
15
|
Li HD, Xu Y, Zhu X, Liu Q, Omenn GS, Wang J. ClusterMine: A knowledge-integrated clustering approach based on expression profiles of gene sets. J Bioinform Comput Biol 2021; 18:2040009. [PMID: 32698720 DOI: 10.1142/s0219720020400090] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]
Abstract
Clustering analysis of gene expression data is essential for understanding complex biological data, and is widely used in important biological applications such as the identification of cell subpopulations and disease subtypes. In commonly used methods such as hierarchical clustering (HC) and consensus clustering (CC), holistic expression profiles of all genes are often used to assess the similarity between samples for clustering. While these methods have proven successful in identifying sample clusters in many areas, they do not provide information about which gene sets (functions) contribute most to the clustering, thus limiting the interpretability of the resulting clusters. We hypothesize that integrating prior knowledge of annotated gene sets would not only achieve satisfactory clustering performance but also, more importantly, enable potential biological interpretation of clusters. Here we report ClusterMine, an approach that identifies clusters by assessing functional similarity between samples through integrating known annotated gene sets in functional annotation databases such as Gene Ontology. In addition to the cluster membership of each sample as provided by conventional approaches, it also outputs gene sets that most likely contribute to the clustering, thus facilitating biological interpretation. We compare ClusterMine with conventional approaches on nine real-world experimental datasets that represent different application scenarios in biology. We find that ClusterMine achieves better performance and that the gene sets prioritized by our method are biologically meaningful. ClusterMine is implemented as an R package and is freely available at: www.genemine.org/clustermine.php.
Affiliation(s)
- Hong-Dong Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 400083, P. R. China
| | - Yunpei Xu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 400083, P. R. China
| | - Xiaoshu Zhu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 400083, P. R. China; School of Computer Science and Engineering, Yulin Normal University, Yulin, Guangxi, P. R. China
| | - Quan Liu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 400083, P. R. China
| | - Gilbert S Omenn
- Departments of Computational Medicine and Bioinformatics, Internal Medicine, Human Genetics and School of Public Health, University of Michigan, Ann Arbor, MI 48109-2218, USA
| | - Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 400083, P. R. China
|
16
|
Li H, Funk CC, McFarland K, Dammer EB, Allen M, Carrasquillo MM, Levites Y, Chakrabarty P, Burgess JD, Wang X, Dickson D, Seyfried NT, Duong DM, Lah JJ, Younkin SG, Levey AI, Omenn GS, Ertekin‐Taner N, Golde TE, Price ND. Integrative functional genomic analysis of intron retention in human and mouse brain with Alzheimer's disease. Alzheimers Dement 2021; 17:984-1004. [PMID: 33480174 PMCID: PMC8248162 DOI: 10.1002/alz.12254] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 11/14/2019] [Revised: 10/08/2020] [Accepted: 10/17/2020] [Indexed: 12/21/2022]
Abstract
Intron retention (IR) has been implicated in the pathogenesis of complex diseases such as cancers; its association with Alzheimer's disease (AD) remains unexplored. We performed genome-wide analysis of IR through integrating genetic, transcriptomic, and proteomic data of AD subjects and mouse models from the Accelerating Medicines Partnership-Alzheimer's Disease project. We identified 4535 and 4086 IR events in 2173 human and 1736 mouse genes, respectively. Quantitation of IR enabled the identification of differentially expressed genes that conventional exon-level approaches did not reveal. There were significant correlations of intron expression within innate immune genes, like HMBOX1, with AD in humans. Peptides with a high probability of translation from intron-retained mRNAs were identified using mass spectrometry. Further, we established AD-specific intron expression Quantitative Trait Loci, and identified splicing-related genes that may regulate IR. Our analysis provides a novel resource for the search for new AD biomarkers and pathological mechanisms.
Affiliation(s)
- Hong‐Dong Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan, P.R. China
- Institute for Systems Biology, Seattle, Washington, USA
| | - Cory C. Funk
- Institute for Systems Biology, Seattle, Washington, USA
| | - Karen McFarland
- Department of Neuroscience and Neurology, Center for Translational Research in Neurodegenerative Disease, and McKnight Brain Institute, University of Florida, Gainesville, Florida, USA
| | - Eric B. Dammer
- Department of Biochemistry, Emory University, Atlanta, Georgia, USA
| | - Mariet Allen
- Mayo Clinic, Department of Neuroscience, Jacksonville, Florida, USA
| | | | - Yona Levites
- Department of Neuroscience and Neurology, Center for Translational Research in Neurodegenerative Disease, and McKnight Brain Institute, University of Florida, Gainesville, Florida, USA
| | - Paramita Chakrabarty
- Department of Neuroscience and Neurology, Center for Translational Research in Neurodegenerative Disease, and McKnight Brain Institute, University of Florida, Gainesville, Florida, USA
| | | | - Xue Wang
- Mayo Clinic, Department of Health Sciences Research, Jacksonville, Florida, USA
| | - Dennis Dickson
- Mayo Clinic, Department of Neuroscience, Jacksonville, Florida, USA
| | - Nicholas T. Seyfried
- Department of Biochemistry, Emory University, Atlanta, Georgia, USA
- Department of Neurology, Emory University, Atlanta, Georgia, USA
| | - Duc M. Duong
- Department of Biochemistry, Emory University, Atlanta, Georgia, USA
| | - James J. Lah
- Department of Neurology, Emory University, Atlanta, Georgia, USA
| | | | - Allan I. Levey
- Department of Neurology, Emory University, Atlanta, Georgia, USA
| | - Gilbert S. Omenn
- Institute for Systems Biology, Seattle, Washington, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
| | - Nilüfer Ertekin‐Taner
- Mayo Clinic, Department of Neuroscience, Jacksonville, Florida, USA
- Mayo Clinic, Department of Neurology, Jacksonville, Florida, USA
| | - Todd E. Golde
- Department of Neuroscience and Neurology, Center for Translational Research in Neurodegenerative Disease, and McKnight Brain Institute, University of Florida, Gainesville, Florida, USA
|
17
|
McDougall LI, Powell RM, Ratajska M, Lynch-Sutherland CF, Hossain SM, Wiggins GAR, Harazin-Lechowska A, Cybulska-Stopa B, Motwani J, Macaulay EC, Reid G, Walker LC, Ryś J, Eccles MR. Differential Expression of BARD1 Isoforms in Melanoma. Genes (Basel) 2021; 12:320. [PMID: 33672422 PMCID: PMC7927127 DOI: 10.3390/genes12020320] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/24/2020] [Revised: 02/12/2021] [Accepted: 02/20/2021] [Indexed: 12/11/2022]
Abstract
Melanoma comprises <5% of cutaneous malignancies, yet it causes a significant proportion of skin cancer-related deaths worldwide. While new therapies for melanoma have been developed, not all patients respond well. Thus, further research is required to better predict patient outcomes. Using long-range nanopore sequencing, RT-qPCR, and RNA sequencing analyses, we examined the transcription of BARD1 splice isoforms in melanoma cell lines and patient tissue samples. Seventy-six BARD1 mRNA variants were identified in total, with several previously characterised isoforms (γ, φ, δ, ε, and η) contributing to a large proportion of the expressed transcripts. In addition, we identified four novel splice events, namely, Δ(E3_E9), ▼(i8), IVS10+131▼46, and IVS10▼176, occurring in various combinations in multiple transcripts. We found that short-read RNA-Seq analyses were limited in their ability to predict isoforms containing multiple non-contiguous splicing events, as compared to long-range nanopore sequencing. These studies suggest that further investigations into the functional significance of the identified BARD1 splice variants in melanoma are warranted.
Affiliation(s)
- Lorissa I. McDougall
- Department of Pathology, Otago Medical School, Dunedin Campus, University of Otago, Dunedin 9010, New Zealand
| | - Ryan M. Powell
- Department of Pathology, Otago Medical School, Dunedin Campus, University of Otago, Dunedin 9010, New Zealand
| | - Magdalena Ratajska
- Department of Pathology, Otago Medical School, Dunedin Campus, University of Otago, Dunedin 9010, New Zealand
- Department of Biology and Medical Genetics, Medical University of Gdansk, 80-211 Gdansk, Poland
| | - Chi F. Lynch-Sutherland
- Department of Pathology, Otago Medical School, Dunedin Campus, University of Otago, Dunedin 9010, New Zealand
| | - Sultana Mehbuba Hossain
- Department of Pathology, Otago Medical School, Dunedin Campus, University of Otago, Dunedin 9010, New Zealand
| | - George A. R. Wiggins
- Department of Pathology and Biomedical Science, University of Otago, Christchurch 8011, New Zealand
| | - Agnieszka Harazin-Lechowska
- Department of Tumour Pathology, Maria Sklodowska-Curie National Research Institute of Oncology, Cracow Branch, 8011 Cracow, Poland
| | - Bożena Cybulska-Stopa
- Department of Clinical Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, Cracow Branch, 8011 Cracow, Poland
| | - Jyoti Motwani
- Department of Pathology, Otago Medical School, Dunedin Campus, University of Otago, Dunedin 9010, New Zealand
| | - Erin C. Macaulay
- Department of Pathology, Otago Medical School, Dunedin Campus, University of Otago, Dunedin 9010, New Zealand
| | - Glen Reid
- Department of Pathology, Otago Medical School, Dunedin Campus, University of Otago, Dunedin 9010, New Zealand
| | - Logan C. Walker
- Department of Pathology and Biomedical Science, University of Otago, Christchurch 8011, New Zealand
| | - Janusz Ryś
- Department of Tumour Pathology, Maria Sklodowska-Curie National Research Institute of Oncology, Cracow Branch, 8011 Cracow, Poland
| | - Michael R. Eccles
- Department of Pathology, Otago Medical School, Dunedin Campus, University of Otago, Dunedin 9010, New Zealand
- Maurice Wilkins Centre for Molecular Biodiscovery, Auckland 1010, New Zealand
|
18
|
Khan AH, Lin A, Wang RT, Bloom JS, Lange K, Smith DJ. Pooled analysis of radiation hybrids identifies loci for growth and drug action in mammalian cells. Genome Res 2020; 30:1458-1467. [PMID: 32878976 PMCID: PMC7605260 DOI: 10.1101/gr.262204.120] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/10/2020] [Accepted: 08/26/2020] [Indexed: 12/16/2022]
Abstract
Genetic screens in mammalian cells commonly focus on loss-of-function approaches. To evaluate the phenotypic consequences of extra gene copies, we used bulk segregant analysis (BSA) of radiation hybrid (RH) cells. We constructed six pools of RH cells, each consisting of ∼2500 independent clones, and placed the pools under selection in media with or without paclitaxel. Low pass sequencing identified 859 growth loci, 38 paclitaxel loci, 62 interaction loci, and three loci for mitochondrial abundance at genome-wide significance. Resolution was measured as ∼30 kb, close to single-gene. Divergent properties were displayed by the RH-BSA growth genes compared to those from loss-of-function screens, refuting the balance hypothesis. In addition, enhanced retention of human centromeres in the RH pools suggests a new approach to functional dissection of these chromosomal elements. Pooled analysis of RH cells showed high power and resolution and should be a useful addition to the mammalian genetic toolkit.
Affiliation(s)
- Arshad H Khan
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-1735, USA
| | - Andy Lin
- Office of Information Technology, UCLA, Los Angeles, California 90095-1557, USA
| | - Richard T Wang
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA
| | - Joshua S Bloom
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA
- Howard Hughes Medical Institute, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA
| | - Kenneth Lange
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA
| | - Desmond J Smith
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-1735, USA
|
19
|
Tung KF, Pan CY, Chen CH, Lin WC. Top-ranked expressed gene transcripts of human protein-coding genes investigated with GTEx dataset. Sci Rep 2020; 10:16245. [PMID: 33004865 PMCID: PMC7530651 DOI: 10.1038/s41598-020-73081-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 06/17/2020] [Accepted: 09/07/2020] [Indexed: 12/13/2022]
Abstract
With the considerable accumulation of RNA-Seq transcriptome data, we have extended our understanding of protein-coding gene transcript compositions. However, alternatively compounded patterns of human protein-coding gene transcripts would complicate gene expression data processing and interpretation. It is essential to exhaustively interrogate complex mRNA isoforms of protein-coding genes with a unified data resource. In order to investigate representative mRNA transcript isoforms to be utilized as transcriptome analysis references, we utilized GTEx data to establish a top-ranked transcript isoform expression data resource for human protein-coding genes. Distinctive tissue-specific expression profiles and modulations could be observed for individual top-ranked transcripts of protein-coding genes. Protein-coding transcripts or genes occupy a much higher expression fraction in transcriptome data. In addition, top-ranked transcripts are the dominantly expressed ones in various normal tissues. Intriguingly, some of the top-ranked transcripts are noncoding splicing isoforms, which implies diverse gene regulation mechanisms. Comprehensive investigation of the tissue expression patterns of top-ranked transcript isoforms is crucial. Thus, we established a web tool to examine top-ranked transcript isoforms in various normal human tissue types, which provides concise transcript information and easy-to-use graphical user interfaces. Investigation of top-ranked transcript isoforms would contribute to understanding the functional significance of distinctive alternatively spliced transcript isoforms.
Affiliation(s)
- Kuo-Feng Tung
- Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan, ROC
| | - Chao-Yu Pan
- Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan, ROC; Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan, ROC
| | - Chao-Hsin Chen
- Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan, ROC
| | - Wen-Chang Lin
- Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan, ROC; Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan, ROC
|
20
|
Ping L, Kundinger SR, Duong DM, Yin L, Gearing M, Lah JJ, Levey AI, Seyfried NT. Global quantitative analysis of the human brain proteome and phosphoproteome in Alzheimer's disease. Sci Data 2020; 7:315. [PMID: 32985496 PMCID: PMC7522715 DOI: 10.1038/s41597-020-00650-8] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Received: 06/03/2020] [Accepted: 08/18/2020] [Indexed: 12/27/2022]
Abstract
Alzheimer's disease (AD) is characterized by an early, asymptomatic phase (AsymAD) in which individuals exhibit amyloid-beta (Aβ) plaque accumulation in the absence of clinically detectable cognitive decline. Here we report an unbiased multiplex quantitative proteomic and phosphoproteomic analysis using tandem mass tag (TMT) isobaric labeling of human post-mortem cortex (n = 27) across pathology-free controls, AsymAD and symptomatic AD individuals. With off-line high-pH fractionation and liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) on an Orbitrap Lumos mass spectrometer, we identified 11,378 protein groups across three TMT 11-plex batches. Immobilized metal affinity chromatography (IMAC) was used to enrich for phosphopeptides from the same TMT-labeled cases and 51,736 phosphopeptides were identified. Of these, 48,992 were quantified by TMT reporter ions representing 33,652 unique phosphosites. Two reference standards in each TMT 11-plex were included to assess intra- and inter-batch variance at the protein and peptide level. This comprehensive human brain proteome and phosphoproteome dataset will serve as a valuable resource for the identification of biochemical, cellular and signaling pathways altered during AD progression.
Affiliation(s)
- Lingyan Ping
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
- Center for Neurodegenerative Diseases, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Sean R Kundinger
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
- Center for Neurodegenerative Diseases, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Duc M Duong
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
- Center for Neurodegenerative Diseases, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Luming Yin
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
- Center for Neurodegenerative Diseases, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Marla Gearing
- Department of Pathology and Laboratory Medicine, Emory University School of Medicine, Atlanta, GA 30322, USA
- Center for Neurodegenerative Diseases, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - James J Lah
- Department of Neurology, Emory University School of Medicine, Atlanta, GA 30322, USA
- Center for Neurodegenerative Diseases, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Allan I Levey
- Department of Neurology, Emory University School of Medicine, Atlanta, GA 30322, USA
- Center for Neurodegenerative Diseases, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Nicholas T Seyfried
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
- Department of Neurology, Emory University School of Medicine, Atlanta, GA 30322, USA
- Center for Neurodegenerative Diseases, Emory University School of Medicine, Atlanta, GA 30322, USA
|
21
|
Yu G, Wang K, Domeniconi C, Guo M, Wang J. Isoform function prediction based on bi-random walks on a heterogeneous network. Bioinformatics 2020; 36:303-310. [PMID: 31250882 DOI: 10.1093/bioinformatics/btz535] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 09/12/2018] [Revised: 06/21/2019] [Accepted: 06/26/2019] [Indexed: 01/29/2023]
Abstract
MOTIVATION Alternative splicing contributes to the functional diversity of protein species and the proteoforms translated from alternatively spliced isoforms of a gene actually execute the biological functions. Computationally predicting the functions of genes has been studied for decades. However, how to distinguish the functional annotations of isoforms, whose annotations are essential for understanding developmental abnormalities and cancers, is rarely explored. The main bottleneck is that functional annotations of isoforms are generally unavailable and functional genomic databases universally store the functional annotations at the gene level. RESULTS We propose IsoFun to accomplish Isoform Function prediction based on bi-random walks on a heterogeneous network. IsoFun firstly constructs an isoform functional association network based on the expression profiles of isoforms derived from multiple RNA-seq datasets. Next, IsoFun uses the available Gene Ontology annotations of genes, gene-gene interactions and the relations between genes and isoforms to construct a heterogeneous network. After this, IsoFun performs a tailored bi-random walk on the heterogeneous network to predict the association between GO terms and isoforms, thus accomplishing the prediction of GO annotations of isoforms. Experimental results show that IsoFun significantly outperforms the state-of-the-art algorithms and improves the area under the receiver-operating curve (AUROC) and the area under the precision-recall curve (AUPRC) by 17% and 44% at the gene-level, respectively. We further validated the performance of IsoFun on the genes ADAM15 and BCL2L1. IsoFun accurately differentiates the functions of respective isoforms of these two genes. AVAILABILITY AND IMPLEMENTATION The code of IsoFun is available at http://mlda.swu.edu.cn/codes.php?name=IsoFun. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Guoxian Yu
- College of Computer and Information Science, Southwest University, Chongqing, China
| | - Keyao Wang
- College of Computer and Information Science, Southwest University, Chongqing, China
| | - Carlotta Domeniconi
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China.,Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, China
| | - Jun Wang
- College of Computer and Information Science, Southwest University, Chongqing, China
|
22
|
Sulakhe D, D'Souza M, Wang S, Balasubramanian S, Athri P, Xie B, Canzar S, Agam G, Gilliam TC, Maltsev N. Exploring the functional impact of alternative splicing on human protein isoforms using available annotation sources. Brief Bioinform 2020; 20:1754-1768. [PMID: 29931155 DOI: 10.1093/bib/bby047] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/19/2018] [Revised: 05/02/2018] [Indexed: 12/30/2022] Open
Abstract
In recent years, the emphasis of scientific inquiry has shifted from whole-genome analyses to an understanding of cellular responses specific to tissue, developmental stage or environmental conditions. One of the central mechanisms underlying the diversity and adaptability of the contextual responses is alternative splicing (AS). It enables a single gene to encode multiple isoforms with distinct biological functions. However, to date, the functions of the vast majority of differentially spliced protein isoforms are not known. Integration of genomic, proteomic, functional, phenotypic and contextual information is essential for supporting isoform-based modeling and analysis. Such integrative proteogenomics approaches promise to provide insights into the functions of the alternatively spliced protein isoforms and provide high-confidence hypotheses to be validated experimentally. This manuscript provides a survey of the public databases supporting isoform-based biology. It also presents an overview of the potential global impact of AS on the human canonical gene functions, molecular interactions and cellular pathways.
Affiliation(s)
- Dinanath Sulakhe
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA.,Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
| | - Mark D'Souza
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
| | - Sheng Wang
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA.,Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, USA
| | - Sandhya Balasubramanian
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA.,Genentech, Inc. 1 DNA Way, Mail Stop: 35-6J, South San Francisco, CA, USA
| | - Prashanth Athri
- Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, Kasavanahalli, Carmelaram P.O., Bengaluru, Karnataka, India
| | - Bingqing Xie
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA.,Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
| | - Stefan Canzar
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, USA.,Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Gady Agam
- Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
| | - T Conrad Gilliam
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA.,Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
| | - Natalia Maltsev
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA.,Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
|
23
|
Zhao Y, Wang J, Chen J, Zhang X, Guo M, Yu G. A Literature Review of Gene Function Prediction by Modeling Gene Ontology. Front Genet 2020; 11:400. [PMID: 32391061 PMCID: PMC7193026 DOI: 10.3389/fgene.2020.00400] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 01/02/2020] [Accepted: 03/30/2020] [Indexed: 12/14/2022] Open
Abstract
Annotating the functional properties of gene products, i.e., RNAs and proteins, is a fundamental task in biology. The Gene Ontology database (GO) was developed to systematically describe the functional properties of gene products across species, and to facilitate the computational prediction of gene function. As GO is routinely updated, it serves as the gold standard and main knowledge source in functional genomics. Many gene function prediction methods making use of GO have been proposed. But no literature review has summarized these methods and the possibilities for future efforts from the perspective of GO. To bridge this gap, we review the existing methods with an emphasis on recent solutions. First, we introduce the conventions of GO and the widely adopted evaluation metrics for gene function prediction. Next, we summarize current methods of gene function prediction that apply GO in different ways, such as using hierarchical or flat inter-relationships between GO terms, compressing massive GO terms and quantifying semantic similarities. Although many efforts have improved performance by harnessing GO, we conclude that there remain many largely overlooked but important topics for future research.
Affiliation(s)
- Yingwen Zhao
- College of Computer and Information Science, Southwest University, Chongqing, China
| | - Jun Wang
- College of Computer and Information Science, Southwest University, Chongqing, China
| | - Jian Chen
- State Key Laboratory of Agrobiotechnology and National Maize Improvement Center, China Agricultural University, Beijing, China
| | - Xiangliang Zhang
- CBRC, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
| | - Guoxian Yu
- College of Computer and Information Science, Southwest University, Chongqing, China
- CBRC, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
|
24
|
Isoform-Disease Association Prediction by Data Fusion. BIOINFORMATICS RESEARCH AND APPLICATIONS 2020. [DOI: 10.1007/978-3-030-57821-3_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/22/2022]
|
25
|
Abstract
Alternative splicing produces multiple mRNA isoforms of genes, which have important and diverse roles in processes such as regulation of gene expression, human heritable diseases, and response to environmental stresses. However, little has been done to assign functions at the mRNA isoform level. Functional networks, in which interactions are quantified by the probability that two members are involved in the same biological process, are typically generated at the gene level. We use a diverse array of tissue-specific RNA-seq datasets and sequence information to train random forest models that predict the functional networks. Since there is no mRNA isoform-level gold standard, we use single-isoform genes co-annotated to Gene Ontology biological process annotations, Kyoto Encyclopedia of Genes and Genomes pathways, BioCyc pathways and protein-protein interactions as functionally related (positive pairs). To generate the non-functional pairs (negative pairs), we use the Gene Ontology annotations tagged with the "NOT" qualifier. We describe 17 Tissue-spEcific mrNa iSoform functIOnal Networks (TENSION) following a leave-one-tissue-out strategy, in addition to an organism-level reference functional network for mouse. We validate our predictions by comparing their performance with previous methods, randomized positive and negative class labels, updated Gene Ontology annotations, and literature evidence. We demonstrate the ability of our networks to reveal tissue-specific functional differences among the isoforms of the same genes. All scripts and data from TENSION are available at: https://doi.org/10.25380/iastate.c.4275191.
Affiliation(s)
- Gaurav Kandoi
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA
- Department of Electrical and Computer Engineering, Iowa State University, Ames, IA, USA
| | - Julie A Dickerson
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA.
- Department of Electrical and Computer Engineering, Iowa State University, Ames, IA, USA.
|
26
|
Tang Z, Chen T, Ren X, Zhang Z. Identification of transcriptional isoforms associated with survival in cancer patient. J Genet Genomics 2019; 46:413-421. [PMID: 31630971 DOI: 10.1016/j.jgg.2019.08.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/14/2019] [Revised: 08/04/2019] [Accepted: 08/21/2019] [Indexed: 11/26/2022]
Abstract
The Cancer Genome Atlas (TCGA) project produced RNA-Seq data for tens of thousands of cancer and non-cancer samples with clinical survival information, providing an unprecedented opportunity for analyzing prognostic genes and their isoforms. In this study, we performed the first large-scale identification of transcriptional isoforms that are specifically associated with patient prognosis, even without gene-level association. These specific isoforms are defined as Transcripts Associated with Patient Prognosis (TAPPs). Although one group of TAPPs are the principal isoforms of their genes with intact functional protein domains, another group of TAPPs lack important protein domains found in their canonical gene isoforms. This dichotomy in the distribution of protein domains may indicate different patterns of TAPP association with cancer. TAPPs in protein-coding genes, especially those with altered protein domains, are enriched in known cancer driver genes. We further identified multiple types of cancer-recurrent TAPPs, such as DCAF17-201, providing a new approach for the detection of cancer-associated events. To enable the wider research community to study prognostic isoforms, we developed a portal named GESUR (http://gesur.cancer-pku.cn/), which illustrates the detailed prognostic characteristics of TAPPs and other isoforms. Overall, our integrated analysis of gene expression and clinical parameters provides a new perspective for understanding the roles of different gene isoforms in tumor progression.
Affiliation(s)
- Zefang Tang
- School of Life Sciences and BIOPIC, Peking University, Beijing, 100871, China.
| | - Tianxiang Chen
- School of Life Sciences and BIOPIC, Peking University, Beijing, 100871, China
| | - Xianwen Ren
- School of Life Sciences and BIOPIC, Peking University, Beijing, 100871, China
| | - Zemin Zhang
- School of Life Sciences and BIOPIC, Peking University, Beijing, 100871, China; Beijing Advanced Innovation Centre for Genomics, Peking-Tsinghua Centre for Life Sciences, Peking University, Beijing, 100871, China.
|
27
|
Chen H, Shaw D, Zeng J, Bu D, Jiang T. DIFFUSE: predicting isoform functions from sequences and expression profiles via deep learning. Bioinformatics 2019; 35:i284-i294. [PMID: 31510699 PMCID: PMC6612874 DOI: 10.1093/bioinformatics/btz367] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Indexed: 01/25/2023] Open
Abstract
MOTIVATION Alternative splicing generates multiple isoforms from a single gene, greatly increasing the functional diversity of a genome. Although gene functions have been well studied, little is known about the specific functions of isoforms, making accurate prediction of isoform functions highly desirable. However, the existing approaches to predicting isoform functions are far from satisfactory for at least two reasons: (i) unlike gene-level annotations, isoform-level functional annotations are scarce; and (ii) the information on isoform functions is concealed in various types of data, including isoform sequences and co-expression relationships among isoforms. RESULTS In this study, we present a novel approach, DIFFUSE (Deep learning-based prediction of IsoForm FUnctions from Sequences and Expression), to predict isoform functions. To integrate various types of data, our approach adopts a hybrid framework that first uses a deep neural network (DNN) to predict the functions of isoforms from their genomic sequences and then refines the prediction using a conditional random field (CRF) based on co-expression relationships. To overcome the lack of isoform-level ground truth labels, we further propose an iterative semi-supervised learning algorithm to train both the DNN and CRF together. Our extensive computational experiments demonstrate that DIFFUSE can effectively predict the functions of isoforms and genes. It achieves an average area under the receiver operating characteristic curve of 0.840 and an area under the precision-recall curve of 0.581 over 4184 GO functional categories, which are significantly higher than those of the state-of-the-art methods. We further validate the prediction results by analyzing the correlation between functional similarity, sequence similarity, expression similarity and structural similarity, as well as the consistency between the predicted functions and some well-studied functional features of isoform sequences.
AVAILABILITY AND IMPLEMENTATION https://github.com/haochenucr/DIFFUSE. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
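The two evaluation metrics quoted in this abstract, area under the ROC curve and area under the precision-recall curve, can be computed for a single GO category along the following lines. This is a minimal sketch and not the evaluation code of DIFFUSE: the function names and toy labels are hypothetical, the PR area is approximated by average precision, ties between scores are not treated specially in the ranking, and each category is assumed to have at least one positive and one negative example.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision values at each positive's rank."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy example (hypothetical): 1 = isoform annotated with the GO term, 0 = not.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auroc(labels, scores))              # 0.75
print(average_precision(labels, scores))  # about 0.833
```

Averaging such per-category values over all GO categories gives summary figures of the kind reported in the abstract.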
Affiliation(s)
- Hao Chen
- Department of Computer Science and Engineering, University of California, Riverside, CA, USA
| | - Dipan Shaw
- Department of Computer Science and Engineering, University of California, Riverside, CA, USA
| | - Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Dongbo Bu
- Key Lab of Intelligent Information Process, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Tao Jiang
- Department of Computer Science and Engineering, University of California, Riverside, CA, USA
- Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing, China
|
28
|
Pacini C, Koziol MJ. Bioinformatics challenges and perspectives when studying the effect of epigenetic modifications on alternative splicing. Philos Trans R Soc Lond B Biol Sci 2019; 373:rstb.2017.0073. [PMID: 29685977 PMCID: PMC5915717 DOI: 10.1098/rstb.2017.0073] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Accepted: 09/14/2017] [Indexed: 02/07/2023] Open
Abstract
It is widely known that epigenetic modifications are important in regulating transcription, but several have also been reported in alternative splicing. The regulation of pre-mRNA splicing is important to explain proteomic diversity and the misregulation of splicing has been implicated in many diseases. Here, we give a brief overview of the role of epigenetics in alternative splicing and disease. We then discuss the bioinformatics methods that can be used to model interactions between epigenetic marks and regulators of splicing. These models can be used to identify alternative splicing and epigenetic changes across different phenotypes. This article is part of a discussion meeting issue ‘Frontiers in epigenetic chemical biology’.
Affiliation(s)
- Clare Pacini
- Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK.,Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
| | - Magdalena J Koziol
- Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK.,Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
|
29
|
Pandeswari PB, Sabareesh V. Middle-down approach: a choice to sequence and characterize proteins/proteomes by mass spectrometry. RSC Adv 2018; 9:313-344. [PMID: 35521579 PMCID: PMC9059502 DOI: 10.1039/c8ra07200k] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Received: 08/29/2018] [Accepted: 12/11/2018] [Indexed: 12/27/2022] Open
Abstract
Owing to rapid growth in the elucidation of genome sequences of various organisms, deducing proteome sequences has become imperative for an improved understanding of biological processes. Since the traditional Edman method was unsuitable for high-throughput sequencing and for N-terminus modified proteins, mass spectrometry (MS) based methods, mainly those based on the soft ionization modes electrospray ionization and matrix-assisted laser desorption/ionization, began to gain significance. MS based methods were adaptable for high-throughput studies and also applicable to sequencing N-terminus blocked proteins/peptides. Consequently, over the last decade a new discipline called 'proteomics' has emerged, which encompasses the attributes necessary for high-throughput identification of proteins. 'Proteomics' may also be regarded as an offshoot of the classic field, 'biochemistry'. Many protein sequencing and proteomic investigations were successfully accomplished through MS dependent sequence elucidation of short proteolytic peptides (typically 7-20 amino acid residues), which is called the 'shotgun' or 'bottom-up (BU)' approach. While the BU approach continues as a workhorse for proteomics/protein sequencing, attempts to sequence intact proteins without proteolysis, called the 'top-down (TD)' approach, started because of ambiguities in the BU approach, e.g., the protein inference problem, identification of proteoforms and the discovery of posttranslational modifications (PTMs). The high-throughput TD approach (TD proteomics) is still in its infancy. Nevertheless, TD characterization of purified intact proteins has been useful for detecting PTMs. With the hope of overcoming the pitfalls of the BU and TD strategies, another concept called the 'middle-down (MD)' approach was put forward.
Similar to BU, the MD approach also involves proteolysis, but in a restricted manner, to produce 'longer' proteolytic peptides than the ones usually obtained in BU studies, thereby providing better sequence coverage. In this regard, special proteases (OmpT, Sap9, IdeS) have been used, which can cleave proteins to produce longer proteolytic peptides. By reviewing the ample evidence currently existing in the literature, which is predominantly on PTM characterization of histones and antibodies, we highlight salient features of the MD approach. Consequently, we are inclined to claim that the MD concept might in future have widespread applications in various research areas, such as clinical research and biopharmaceuticals (including PTM analysis), and even in general/routine characterization of proteins, including therapeutic proteins, rather than being limited to the analysis of histones or antibodies.
Affiliation(s)
- P Boomathi Pandeswari
- Advanced Centre for Bio Separation Technology (CBST), Vellore Institute of Technology (VIT) Vellore Tamil Nadu 632014 India
| | - Varatharajan Sabareesh
- Advanced Centre for Bio Separation Technology (CBST), Vellore Institute of Technology (VIT) Vellore Tamil Nadu 632014 India
|
30
|
Ashraf U, Benoit-Pilven C, Lacroix V, Navratil V, Naffakh N. Advances in Analyzing Virus-Induced Alterations of Host Cell Splicing. Trends Microbiol 2018; 27:268-281. [PMID: 30577974 DOI: 10.1016/j.tim.2018.11.004] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 07/17/2018] [Revised: 10/19/2018] [Accepted: 11/09/2018] [Indexed: 12/14/2022]
Abstract
Alteration of host cell splicing is a common feature of many viral infections which is underappreciated because of the complexity and technical difficulty of studying alternative splicing (AS) regulation. Recent advances in RNA sequencing technologies revealed that up to several hundreds of host genes can show altered mRNA splicing upon viral infection. The observed changes in AS events can be either a direct consequence of viral manipulation of the host splicing machinery or result indirectly from the virus-induced innate immune response or cellular damage. Analysis at a higher resolution with single-cell RNAseq, and at a higher scale with the integration of multiple omics data sets in a systems biology perspective, will be needed to further comprehend this complex facet of virus-host interactions.
Affiliation(s)
- Usama Ashraf
- Institut Pasteur, Unité de Génétique Moléculaire des Virus à ARN, Département de Virologie, F-75015 Paris, France; CNRS UMR3569, F-75015 Paris, France; Université Paris Diderot, Sorbonne Paris Cité EA302, F-75015 Paris, France
| | - Clara Benoit-Pilven
- INSERM U1028; CNRS UMR5292, Lyon Neuroscience Research Center, Genetic of Neuro-development Anomalies Team, F-69000 Lyon, France; Université Claude Bernard Lyon 1, CNRS UMR5558, Laboratoire de Biométrie et Biologie Evolutive, F-69622 Villeurbanne, France; EPI ERABLE, INRIA Grenoble Rhône-Alpes, F-38330 Montbonnot Saint-Martin, France
| | - Vincent Lacroix
- Université Claude Bernard Lyon 1, CNRS UMR5558, Laboratoire de Biométrie et Biologie Evolutive, F-69622 Villeurbanne, France; EPI ERABLE, INRIA Grenoble Rhône-Alpes, F-38330 Montbonnot Saint-Martin, France
| | - Vincent Navratil
- PRABI, Rhône Alpes Bioinformatics Center, UCBL, Université Claude Bernard Lyon 1, F-69000 Lyon, France; European Virus Bioinformatics Center, Leutragraben 1, D-07743 Jena, Germany
| | - Nadia Naffakh
- Institut Pasteur, Unité de Génétique Moléculaire des Virus à ARN, Département de Virologie, F-75015 Paris, France; CNRS UMR3569, F-75015 Paris, France; Université Paris Diderot, Sorbonne Paris Cité EA302, F-75015 Paris, France.
|
31
|
Zhang X, Yu S, Yang Q, Wang K, Zhang S, Pan C, Yan H, Dang R, Lei C, Chen H, Lan X. Goat Boule: Isoforms identification, mRNA expression in testis and functional study and promoter methylation profiles. Theriogenology 2018; 116:53-63. [PMID: 29778921 DOI: 10.1016/j.theriogenology.2018.05.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 09/29/2017] [Revised: 05/03/2018] [Accepted: 05/04/2018] [Indexed: 10/16/2022]
Abstract
The Boule gene is a conserved gene involved in meiosis and spermatogenesis. The deletion of this gene in males blocks meiosis and results in infertility. Alternative splicing variants of the Boule gene have been identified in humans, bovines, and bats, but whether they exist in dairy goats remains unknown. This study therefore aimed to detect splicing variants of the goat Boule gene and explore their potential roles in meiosis. Three isoforms, denoted as Boule-a, Boule-b, and Boule-c, were identified in the testes of goats using real-time PCR (RT-PCR) and cloning sequencing. Compared to the normal Boule gene, Boule-a was found to lack exons 7 and 8, which corresponds to a predicted variant, X4, in the NCBI database. Boule-b lacked exon 8, and Boule-c retained only exons 1 and 2. Of these three variants, two were novel isoforms of the Boule gene. Quantitative RT-PCR (qRT-PCR) showed that the Boule-a and Boule-b expression patterns were significantly different between the adult goat testes and the postnatal testes at 42 and 56 days. Overexpression of Boule-a and Boule-c in mouse GC-1 spg cells significantly repressed CDC2 expression. Bisulfite sequencing PCR (BSP) results showed that the promoter region of the Boule gene was hypermethylated in goat testes. A negative correlation between the methylation levels of the Boule gene promoter and total mRNA expression of its transcripts was found. Our data showed alternative splicing and promoter methylation in the goat Boule gene, suggesting that these mechanisms may play an important role in the regulation of Boule expression and in meiosis.
Affiliation(s)
- Xiaoyan Zhang
- College of Animal Science and Technology, Northwest A&F University, Shaanxi Key Laboratory of Molecular Biology for Agriculture, Yangling, Shaanxi 712100, PR China.
| | - Shuai Yu
- College of Veterinary Medicine, Northwest A&F University, Shaanxi Key Laboratory of Molecular Biology for Agriculture, Yangling, Shaanxi 712100, PR China
| | - Qing Yang
- College of Animal Science and Technology, Northwest A&F University, Shaanxi Key Laboratory of Molecular Biology for Agriculture, Yangling, Shaanxi 712100, PR China
| | - Ke Wang
- College of Animal Science and Technology, Northwest A&F University, Shaanxi Key Laboratory of Molecular Biology for Agriculture, Yangling, Shaanxi 712100, PR China
| | - Sihuan Zhang
- College of Animal Science and Technology, Northwest A&F University, Shaanxi Key Laboratory of Molecular Biology for Agriculture, Yangling, Shaanxi 712100, PR China
| | - Chuanying Pan
- College of Animal Science and Technology, Northwest A&F University, Shaanxi Key Laboratory of Molecular Biology for Agriculture, Yangling, Shaanxi 712100, PR China
| | - Hailong Yan
- College of Animal Science and Technology, Northwest A&F University, Shaanxi Key Laboratory of Molecular Biology for Agriculture, Yangling, Shaanxi 712100, PR China; Shaanxi Provincial Engineering and Technology Research Center of Cashmere Goats, Yulin University, Yulin 719000, China
| | - Ruihua Dang
- College of Animal Science and Technology, Northwest A&F University, Shaanxi Key Laboratory of Molecular Biology for Agriculture, Yangling, Shaanxi 712100, PR China
| | - Chuzhao Lei
- College of Animal Science and Technology, Northwest A&F University, Shaanxi Key Laboratory of Molecular Biology for Agriculture, Yangling, Shaanxi 712100, PR China
| | - Hong Chen
- College of Animal Science and Technology, Northwest A&F University, Shaanxi Key Laboratory of Molecular Biology for Agriculture, Yangling, Shaanxi 712100, PR China
| | - Xianyong Lan
- College of Animal Science and Technology, Northwest A&F University, Shaanxi Key Laboratory of Molecular Biology for Agriculture, Yangling, Shaanxi 712100, PR China.
|
32
|
Yu KH, Lee TLM, Wang CS, Chen YJ, Ré C, Kou SC, Chiang JH, Kohane IS, Snyder M. Systematic Protein Prioritization for Targeted Proteomics Studies through Literature Mining. J Proteome Res 2018; 17:1383-1396. [PMID: 29505266 DOI: 10.1021/acs.jproteome.7b00772] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/15/2022]
Abstract
There are more than 3.7 million published articles on the biological functions or disease implications of proteins, constituting an important resource of proteomics knowledge. However, it is difficult to summarize the millions of proteomics findings in the literature manually and quantify their relevance to the biology and diseases of interest. We developed a fully automated bioinformatics framework to identify and prioritize proteins associated with any biological entity. We used the 22 targeted areas of the Biology/Disease-driven (B/D)-Human Proteome Project (HPP) as examples, prioritized the relevant proteins through their Protein Universal Reference Publication-Originated Search Engine (PURPOSE) scores, validated the relevance of the score by comparing the protein prioritization results with a curated database, computed the scores of proteins across the topics of B/D-HPP, and characterized the top proteins in the common model organisms. We further extended the bioinformatics workflow to identify the relevant proteins in all organ systems and human diseases and deployed a cloud-based tool to prioritize proteins related to any custom search terms in real time. Our tool can facilitate the prioritization of proteins for any organ system or disease of interest and can contribute to the development of targeted proteomic studies for precision medicine.
Affiliation(s)
- Kun-Hsing Yu
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, United States
- Department of Statistics, Harvard University, Cambridge, Massachusetts 02138, United States
| | - Tsung-Lu Michael Lee
- Department of Information Engineering, Kun Shan University, Tainan City 710-03, Taiwan
| | - Chi-Shiang Wang
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan City 701-01, Taiwan
| | - Yu-Ju Chen
- Institute of Chemistry, Academia Sinica, Taipei 115-29, Taiwan
| | - Christopher Ré
- Department of Computer Science, Stanford University, Stanford, California 94305, United States
| | - Samuel C. Kou
- Department of Statistics, Harvard University, Cambridge, Massachusetts 02138, United States
| | - Jung-Hsien Chiang
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan City 701-01, Taiwan
| | - Isaac S. Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, United States
| | - Michael Snyder
- Department of Genetics, Stanford University, Stanford, California 94305, United States
|
33
|
Jyotsana N, Heuser M. Exploiting differential RNA splicing patterns: a potential new group of therapeutic targets in cancer. Expert Opin Ther Targets 2017; 22:107-121. [PMID: 29235382 DOI: 10.1080/14728222.2018.1417390] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 12/19/2022]
Abstract
INTRODUCTION Mutations in genes associated with splicing have been found in hematologic malignancies, but also in solid cancers. Aberrant cancer specific RNA splicing either results from mutations or misexpression of the spliceosome genes directly, or from mutations in splice sites of oncogenes or tumor suppressors. Areas covered: In this review, we present molecular targets of aberrant splicing in various malignancies, information on existing and emerging therapeutics against such targets, and strategies for future drug development. Expert opinion: Alternative splicing is an important mechanism that controls gene expression, and hence pharmacologic and genetic control of aberrant alternative RNA splicing has been proposed as a potential therapy in cancer. To identify and validate aberrant RNA splicing patterns as therapeutic targets we need to (1) characterize the most common genetic aberrations of the spliceosome and of splice sites, (2) understand the dysregulated downstream pathways and (3) exploit in-vivo disease models of aberrant splicing. Antisense oligonucleotides show promising activity, but will benefit from improved delivery tools. Inhibitors of mutated splicing factors require improved specificity, as alternative and aberrant splicing are often intertwined like two sides of the same coin. In summary, targeting aberrant splicing is an early but emerging field in cancer treatment.
Affiliation(s)
- Nidhi Jyotsana
- Department of Hematology, Hemostasis, Oncology and Stem Cell Transplantation, Hannover Medical School, Hannover, Germany
| | - Michael Heuser
- a Department of Hematology, Hemostasis, Oncology and Stem Cell Transplantation , Hannover Medical School , Hannover , Germany
| |
|
34
|
Framework and resource for more than 11,000 gene-transcript-protein-reaction associations in human metabolism. Proc Natl Acad Sci U S A 2017; 114:E9740-E9749. [PMID: 29078384 DOI: 10.1073/pnas.1713050114] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 12/22/2022] Open
Abstract
Alternative splicing plays important roles in generating different transcripts from one gene, and consequently various protein isoforms. However, there has been no systematic approach that facilitates characterizing functional roles of protein isoforms in the context of the entire human metabolism. Here, we present a systematic framework for the generation of gene-transcript-protein-reaction associations (GeTPRA) in the human metabolism. The framework in this study generated 11,415 GeTPRA corresponding to 1,106 metabolic genes for both principal and nonprincipal transcripts (PTs and NPTs) of metabolic genes. The framework further evaluates GeTPRA, using a human genome-scale metabolic model (GEM) that is biochemically consistent and transcript-level data compatible, and subsequently updates the human GEM. A generic human GEM, Recon 2M.1, was developed for this purpose, and subsequently updated to Recon 2M.2 through the framework. Both PTs and NPTs of metabolic genes were considered in the framework based on prior analyses of 446 personal RNA-Seq data and 1,784 personal GEMs reconstructed using Recon 2M.1. The framework and the GeTPRA will contribute to better understanding human metabolism at the systems level and enable further medical applications.
|
35
|
Yang S, Shao F, Duan W, Zhao Y, Chen F. Variance component testing for identifying differentially expressed genes in RNA-seq data. PeerJ 2017; 5:e3797. [PMID: 28929020 PMCID: PMC5592911 DOI: 10.7717/peerj.3797] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/13/2017] [Accepted: 08/21/2017] [Indexed: 01/28/2023] Open
Abstract
RNA sequencing (RNA-Seq) enables the measurement and comparison of gene expression with isoform-level quantification. Differences in the effect of each isoform may make traditional methods, which aggregate isoforms, ineffective. Here, we introduce a variance component-based test that can jointly test multiple isoforms of one gene to identify differentially expressed (DE) genes, especially those with isoforms that have differential effects. We model isoform-level expression data from RNA-Seq using a negative binomial distribution and consider the baseline abundance of isoforms and their effects as two random terms. Our approach tests the global null hypothesis of no difference in any of the isoforms. The null distribution of the derived score statistic is investigated using empirical and theoretical methods. The results of simulations suggest that the performance of the proposed set test is superior to that of traditional algorithms and almost reaches optimal power when the variance of covariates is large. This method is also applied to analyze real data. Our algorithm, as a supplement to traditional algorithms, is superior at selecting DE genes with sparse or opposite effects for isoforms.
Affiliation(s)
- Sheng Yang
- Department of Biostatistics, School of Public Health, Nanjing Medical University, China
| | - Fang Shao
- Department of Biostatistics, School of Public Health, Nanjing Medical University, China
| | - Weiwei Duan
- Department of Biostatistics, School of Public Health, Nanjing Medical University, China
| | - Yang Zhao
- Department of Biostatistics, School of Public Health, Nanjing Medical University, China
| | - Feng Chen
- Department of Biostatistics, School of Public Health, Nanjing Medical University, China
| |
|
36
|
Yu G, Fu G, Lu C, Ren Y, Wang J. BRWLDA: bi-random walks for predicting lncRNA-disease associations. Oncotarget 2017; 8:60429-60446. [PMID: 28947982 PMCID: PMC5601150 DOI: 10.18632/oncotarget.19588] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 05/25/2017] [Accepted: 06/19/2017] [Indexed: 12/20/2022] Open
Abstract
Increasing efforts have been made to elucidate the association between lncRNAs and complex diseases. Many computational models combine lncRNA similarity networks and disease similarity networks with known lncRNA-disease associations to infer novel associations. However, most of them neglect the structural difference between the lncRNA network and the disease network, the hierarchical relationships between diseases, and the pattern of newly discovered associations. In this study, we developed a model that performs Bi-Random Walks to predict novel LncRNA-Disease Associations (BRWLDA for short). This model utilizes multiple heterogeneous data to construct the lncRNA functional similarity network, and Disease Ontology to construct a disease network. It then constructs a directed bi-relational network based on these two networks and available lncRNA-disease associations. Next, it applies bi-random walks on the network to predict potential associations. BRWLDA achieves reliable and better performance than other competing methods, not only on experimentally verified associations but also in simulated experiments with masked associations. Case studies further demonstrate the feasibility of BRWLDA in identifying new lncRNA-disease associations.
Affiliation(s)
- Guoxian Yu
- College of Computer and Information Sciences, Southwest University, Chongqing, China
| | - Guangyuan Fu
- College of Computer and Information Sciences, Southwest University, Chongqing, China
| | - Chang Lu
- College of Computer and Information Sciences, Southwest University, Chongqing, China
| | - Yazhou Ren
- Big Data Research Center, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
| | - Jun Wang
- College of Computer and Information Sciences, Southwest University, Chongqing, China
| |
|
37
|
Pantazatos SP, Huang YY, Rosoklija GB, Dwork AJ, Arango V, Mann JJ. Whole-transcriptome brain expression and exon-usage profiling in major depression and suicide: evidence for altered glial, endothelial and ATPase activity. Mol Psychiatry 2017; 22:760-773. [PMID: 27528462 PMCID: PMC5313378 DOI: 10.1038/mp.2016.130] [Citation(s) in RCA: 156] [Impact Index Per Article: 19.5] [Received: 05/11/2015] [Revised: 06/04/2016] [Accepted: 06/07/2016] [Indexed: 12/30/2022]
Abstract
Brain gene expression profiling studies of suicide and depression using oligonucleotide microarrays have often failed to distinguish these two phenotypes. Moreover, next generation sequencing approaches are more accurate in quantifying gene expression and can detect alternative splicing. Using RNA-seq, we examined whole-exome gene and exon expression in non-psychiatric controls (CON, N=29), DSM-IV major depressive disorder suicides (MDD-S, N=21) and MDD non-suicides (MDD, N=9) in the dorsal lateral prefrontal cortex (Brodmann Area 9) of sudden death medication-free individuals post mortem. Using small RNA-seq, we also examined miRNA expression (nine samples per group). DESeq2 identified 35 genes differentially expressed between groups and surviving adjustment for false discovery rate (adjusted P<0.1). In depression, altered genes include humanin-like-8 (MTRNRL8), interleukin-8 (IL8), and serpin peptidase inhibitor, clade H (SERPINH1) and chemokine ligand 4 (CCL4), while exploratory gene ontology (GO) analyses revealed lower expression of immune-related pathways such as chemokine receptor activity, chemotaxis and cytokine biosynthesis, and angiogenesis and vascular development in (adjusted P<0.1). Hypothesis-driven GO analysis suggests lower expression of genes involved in oligodendrocyte differentiation, regulation of glutamatergic neurotransmission, and oxytocin receptor expression in both suicide and depression, and provisional evidence for altered DNA-dependent ATPase expression in suicide only. DEXSeq analysis identified differential exon usage in ATPase, class II, type 9B (adjusted P<0.1) in depression. Differences in miRNA expression or structural gene variants were not detected. Results lend further support for models in which deficits in microglial, endothelial (blood-brain barrier), ATPase activity and astrocytic cell functions contribute to MDD and suicide, and identify putative pathways and mechanisms for further study in these disorders.
Affiliation(s)
- Spiro P. Pantazatos
- Molecular Imaging and Neuropathology Division, New York State Psychiatric Institute, New York, NY, Department of Psychiatry, New York, NY
| | - Yung-yu Huang
- Molecular Imaging and Neuropathology Division, New York State Psychiatric Institute, New York, NY, Department of Psychiatry, New York, NY
| | - Gorazd B. Rosoklija
- Molecular Imaging and Neuropathology Division, New York State Psychiatric Institute, New York, NY, Department of Psychiatry, New York, NY
| | | | - Victoria Arango
- Molecular Imaging and Neuropathology Division, New York State Psychiatric Institute, New York, NY, Department of Psychiatry, New York, NY
| | - J. John Mann
- Molecular Imaging and Neuropathology Division, New York State Psychiatric Institute, New York, NY, Department of Psychiatry, New York, NY
| |
|
38
|
Tranchevent LC, Aubé F, Dulaurier L, Benoit-Pilven C, Rey A, Poret A, Chautard E, Mortada H, Desmet FO, Chakrama FZ, Moreno-Garcia MA, Goillot E, Janczarski S, Mortreux F, Bourgeois CF, Auboeuf D. Identification of protein features encoded by alternative exons using Exon Ontology. Genome Res 2017; 27:1087-1097. [PMID: 28420690 PMCID: PMC5453322 DOI: 10.1101/gr.212696.116] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 07/13/2016] [Accepted: 03/28/2017] [Indexed: 12/16/2022]
Abstract
Transcriptomic genome-wide analyses demonstrate massive variation of alternative splicing in many physiological and pathological situations. One major challenge is now to establish the biological contribution of alternative splicing variation in physiological- or pathological-associated cellular phenotypes. Toward this end, we developed a computational approach, named “Exon Ontology,” based on terms corresponding to well-characterized protein features organized in an ontology tree. Exon Ontology is conceptually similar to Gene Ontology-based approaches but focuses on exon-encoded protein features instead of gene level functional annotations. Exon Ontology describes the protein features encoded by a selected list of exons and looks for potential Exon Ontology term enrichment. By applying this strategy to exons that are differentially spliced between epithelial and mesenchymal cells and after extensive experimental validation, we demonstrate that Exon Ontology provides support to discover specific protein features regulated by alternative splicing. We also show that Exon Ontology helps to unravel biological processes that depend on suites of coregulated alternative exons, as we uncovered a role of epithelial cell-enriched splicing factors in the AKT signaling pathway and of mesenchymal cell-enriched splicing factors in driving splicing events impacting on autophagy. Freely available on the web, Exon Ontology is the first computational resource that allows getting a quick insight into the protein features encoded by alternative exons and investigating whether coregulated exons contain the same biological information.
Affiliation(s)
- Léon-Charles Tranchevent
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Fabien Aubé
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Louis Dulaurier
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Clara Benoit-Pilven
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Amandine Rey
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Arnaud Poret
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Emilie Chautard
- Laboratoire de Biométrie et Biologie Évolutive, Université Lyon 1, UMR CNRS 5558, INRIA Erable, Villeurbanne, F-69622, France
| | - Hussein Mortada
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - François-Olivier Desmet
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Fatima Zahra Chakrama
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Maira Alejandra Moreno-Garcia
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Evelyne Goillot
- Institut NeuroMyoGène, CNRS UMR 5310, INSERM U1217, Université Lyon 1, Lyon, F-69007 France
| | - Stéphane Janczarski
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Franck Mortreux
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Cyril F Bourgeois
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Didier Auboeuf
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| |
|
39
|
Transcriptome profile of the human placenta. Funct Integr Genomics 2017; 17:551-563. [PMID: 28251419 PMCID: PMC5561170 DOI: 10.1007/s10142-017-0555-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 11/18/2016] [Revised: 02/09/2017] [Accepted: 02/16/2017] [Indexed: 01/09/2023]
Abstract
The human placenta is a particular organ that inseparably binds the mother and the fetus. The proper development and survival of the conceptus relies on the essential interplay between maternal and fetal factors involved in cooperation within the placenta. In our study, high-throughput sequencing (RNA-seq) was applied to analyze the global transcriptome of the human placenta during uncomplicated pregnancies. The RNA-seq was utilized to identify the global pattern of gene expression in placentas (N = 4) from women in single and twin pregnancies. During analyses, we obtained 228,044 transcripts. More than 91% of them were multi-exon, and among them 134 were potentially unknown protein coding genes. Expression levels (FPKM) were estimated for 38,948 transcriptionally active regions, and more than 3000 genes were expressed with FPKM >20 in each sample. Additionally, all unannotated transcripts with estimated FPKM values were localized on the human genome. Highly covered splice junctions unannotated in the human genome (6497) were identified, and among them 30 were novel. To gain a better understanding of the biological implications, the assembled transcripts were annotated with gene ontology (GO) terms. Single nucleotide variants were predicted for the transcripts assigned to each analyzed GO category. Our results may be useful for establishing a general pattern of gene expression in the human placenta. Characterizing the placental transcriptome, which is crucial for a pregnancy’s outcome, can serve as a basis for identifying the mechanisms underlying physiological pregnancy, and may also be useful for early detection of genomic defects.
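For readers unfamiliar with the FPKM unit used in this abstract (fragments per kilobase of transcript per million mapped fragments), the following is a minimal sketch of the standard definition. The function name and the numbers in the usage line are hypothetical illustrations, not values or code from the cited study.

```python
def fpkm(fragment_count: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    Standard definition: count / (length_in_kb * total_fragments_in_millions),
    which simplifies to count * 1e9 / (length_in_bp * total_fragments).
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2,000 bp transcript
# in a library of 25 million mapped fragments.
example_value = fpkm(500, 2000, 25_000_000)
print(example_value)  # 10.0
```

Under this definition, a threshold such as FPKM >20 corresponds to more than 20 fragments per kilobase of transcript for every million mapped fragments in the sample.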
|
40
|
Guerrero CR, Jagtap PD, Johnson JE, Griffin TJ. Using Galaxy for Proteomics. PROTEOME INFORMATICS 2016. [DOI: 10.1039/9781782626732-00289] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/28/2023]
Abstract
The area of informatics for mass spectrometry (MS)-based proteomics data has steadily grown over the last two decades. Numerous, effective software programs now exist for various aspects of proteomic informatics. However, many researchers still have difficulties in using this software. These difficulties arise from problems with running and integrating disparate software programs, scalability issues when dealing with large data volumes, and lack of ability to share and reproduce workflows composed of different software. The Galaxy framework for bioinformatics provides an attractive option for solving many of these current issues in proteomic informatics. Originally developed as a workbench to enable genomic data analysis, Galaxy is now being used by numerous researchers to implement software for MS-based proteomics applications. Here, we provide an introduction to Galaxy and its features, and describe how software tools are deployed, published and shared via the scalable framework. We also describe some of the existing tools in Galaxy for basic MS-based proteomics data analysis and informatics. Finally, we describe how proteomics tools in Galaxy can be combined with other existing tools for genomic and transcriptomic data analysis to enable powerful multi-omic data analysis applications.
Affiliation(s)
- Candace R. Guerrero
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota 321 Church St SE/6-155 Jackson Hall Minneapolis MN 55455 USA
| | - Pratik D. Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota 321 Church St SE/6-155 Jackson Hall Minneapolis MN 55455 USA
- Center for Mass Spectrometry and Proteomics, University of Minnesota 1479 Gortner Avenue, St. Paul MN 55108 USA
| | - James E. Johnson
- Minnesota Supercomputing Institute, University of Minnesota 512 Walter Library, 117 Pleasant Street SE Minneapolis MN 55455 USA
| | - Timothy J. Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota 321 Church St SE/6-155 Jackson Hall Minneapolis MN 55455 USA
- Center for Mass Spectrometry and Proteomics, University of Minnesota 1479 Gortner Avenue, St. Paul MN 55108 USA
| |
|
41
|
Yu W, Li Y, Wang Z, Liu L, Liu J, Ding F, Zhang X, Cheng Z, Chen P, Dou J. Transcriptomic changes in human renal proximal tubular cells revealed under hypoxic conditions by RNA sequencing. Int J Mol Med 2016; 38:894-902. [PMID: 27432315 DOI: 10.3892/ijmm.2016.2677] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/24/2015] [Accepted: 07/07/2016] [Indexed: 11/05/2022] Open
Abstract
Chronic hypoxia often occurs among patients with chronic kidney disease (CKD). Renal proximal tubular cells may be the primary target of a hypoxic insult. However, the underlying transcriptional mechanisms remain undefined. In this study, we revealed the global changes in gene expression in HK‑2 human renal proximal tubular cells under hypoxic and normoxic conditions. We analyzed the transcriptome of HK‑2 cells exposed to hypoxia for 24 h using RNA sequencing. A total of 279 differentially expressed genes were examined, as these genes could potentially explain the differences in HK‑2 cells between hypoxic and normoxic conditions. Moreover, 17 genes were validated by qPCR, and the results were highly concordant with the RNA sequencing results. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed to better understand the functions of these differentially expressed genes. The upregulated genes appeared to be significantly enriched in the pathway of extracellular matrix (ECM)-receptor interaction, and in particular, the pathway of renal cell carcinoma was upregulated under hypoxic conditions. The downregulated genes were enriched in the signaling pathway related to antigen processing and presentation; however, the pathway of glutathione metabolism was downregulated. Our analysis revealed numerous novel transcripts and alternative splicing events. Simultaneously, we also identified a large number of single nucleotide polymorphisms, which will be a rich resource for future marker development. On the whole, our data indicate that transcriptome analysis provides valuable information for a more in-depth understanding of the molecular mechanisms in CKD and renal cell carcinoma.
Affiliation(s)
- Wenmin Yu
- Medical School of Southeast University, Nanjing, Jiangsu 210009, P.R. China
| | - Yiping Li
- Medical School of Southeast University, Nanjing, Jiangsu 210009, P.R. China
| | - Zhi Wang
- Medical School of Southeast University, Nanjing, Jiangsu 210009, P.R. China
| | - Lei Liu
- Medical School of Southeast University, Nanjing, Jiangsu 210009, P.R. China
| | - Jing Liu
- Medical School of Southeast University, Nanjing, Jiangsu 210009, P.R. China
| | - Fengan Ding
- Medical School of Southeast University, Nanjing, Jiangsu 210009, P.R. China
| | - Xiaoyi Zhang
- Medical School of Southeast University, Nanjing, Jiangsu 210009, P.R. China
| | - Zhengyuan Cheng
- Medical School of Southeast University, Nanjing, Jiangsu 210009, P.R. China
| | - Pingsheng Chen
- Medical School of Southeast University, Nanjing, Jiangsu 210009, P.R. China
| | - Jun Dou
- Medical School of Southeast University, Nanjing, Jiangsu 210009, P.R. China
| |
|
42
|
Panwar B, Menon R, Eksi R, Li HD, Omenn GS, Guan Y. Genome-Wide Functional Annotation of Human Protein-Coding Splice Variants Using Multiple Instance Learning. J Proteome Res 2016; 15:1747-53. [PMID: 27142340 DOI: 10.1021/acs.jproteome.5b00883] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 01/28/2023]
Abstract
The vast majority of human multiexon genes undergo alternative splicing and produce a variety of splice variant transcripts and proteins, which can perform different functions. These protein-coding splice variants (PCSVs) greatly increase the functional diversity of proteins. Most functional annotation algorithms have been developed at the gene level; the lack of isoform-level gold standards is an important intellectual limitation for currently available machine learning algorithms. The accumulation of a large amount of RNA-seq data in the public domain greatly increases our ability to examine the functional annotation of genes at isoform level. In the present study, we used a multiple instance learning (MIL)-based approach for predicting the function of PCSVs. We used transcript-level expression values and gene-level functional associations from the Gene Ontology database. A support vector machine (SVM)-based 5-fold cross-validation technique was applied. Comparatively, genes with multiple PCSVs performed better than single PCSV genes, and performance also improved when more examples were available to train the models. We demonstrated our predictions using literature evidence of ADAM15, LMNA/C, and DMXL2 genes. All predictions have been implemented in a web resource called "IsoFunc", which is freely available for the global scientific community through http://guanlab.ccmb.med.umich.edu/isofunc.
Affiliation(s)
- Bharat Panwar
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, and ∥Department of Electrical Engineering and Computer Science, University of Michigan , Ann Arbor, Michigan 48109, United States
| | - Rajasree Menon
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, and ∥Department of Electrical Engineering and Computer Science, University of Michigan , Ann Arbor, Michigan 48109, United States
| | - Ridvan Eksi
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, and ∥Department of Electrical Engineering and Computer Science, University of Michigan , Ann Arbor, Michigan 48109, United States
| | - Hong-Dong Li
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, and ∥Department of Electrical Engineering and Computer Science, University of Michigan , Ann Arbor, Michigan 48109, United States
| | - Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, and ∥Department of Electrical Engineering and Computer Science, University of Michigan , Ann Arbor, Michigan 48109, United States
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, and ∥Department of Electrical Engineering and Computer Science, University of Michigan , Ann Arbor, Michigan 48109, United States
| |
|
43
|
Guan Y, Martini S, Mariani LH. Genes Caught In Flagranti: Integrating Renal Transcriptional Profiles With Genotypes and Phenotypes. Semin Nephrol 2016. [PMID: 26215861 DOI: 10.1016/j.semnephrol.2015.04.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/27/2022]
Abstract
In the past decade, population genetics has gained tremendous success in identifying genetic variations that are statistically relevant to renal diseases and kidney function. However, it is challenging to interpret the functional relevance of the genetic variations found by population genetics studies. In this review, we discuss studies that integrate multiple levels of data, especially transcriptome profiles and phenotype data, to assign functional roles of genetic variations involved in kidney function. Furthermore, we introduce state-of-the-art machine learning algorithms, Bayesian networks, support vector machines, and Gaussian process regression, which have been applied successfully to integrating genetic, regulatory, and clinical information to predict clinical outcomes. These methods are likely to be deployed successfully in the nephrology field in the near future.
Affiliation(s)
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI; Department of Internal Medicine, University of Michigan, Ann Arbor, MI; Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI
| | - Sebastian Martini
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI; Nephrologisches Zentrum, Medizinische Klinik und Poliklinik IV, Klinikum der Universität München, Ludwig-Maximilians-University Munich, Munich, Germany
| | - Laura H Mariani
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI
| |
|
44
|
Abstract
The laboratory mouse is the primary mammalian species used for studying alternative splicing events. Recent studies have generated computational models to predict functions for splice isoforms in the mouse. However, the functional relationship network, describing the probability of splice isoforms participating in the same biological process or pathway, has not yet been studied in the mouse. Here we describe a rich genome-wide resource of mouse networks at the isoform level, which was generated using a unique framework that was originally developed to infer isoform functions. This network was built through integrating heterogeneous genomic and protein data, including RNA-seq, exon array, protein docking and pseudo-amino acid composition. Through simulation and cross-validation studies, we demonstrated the accuracy of the algorithm in predicting isoform-level functional relationships. We showed that this network enables the users to reveal functional differences of the isoforms of the same gene, as illustrated by literature evidence with Anxa6 (annexin a6) as an example. We expect this work will become a useful resource for the mouse genetics community to understand gene functions. The network is publicly available at: http://guanlab.ccmb.med.umich.edu/isoformnetwork.
|
45
|
Hsu MK, Pan CL, Chen FC. Functional divergence and convergence between the transcript network and gene network in lung adenocarcinoma. Onco Targets Ther 2016; 9:335-47. [PMID: 26834492 PMCID: PMC4716766 DOI: 10.2147/ott.s94897] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/08/2023] Open
Abstract
INTRODUCTION Alternative RNA splicing is a critical regulatory mechanism during tumorigenesis. However, previous oncological studies mainly focused on the splicing of individual genes. Whether and how transcript isoforms are coordinated to affect cellular functions remain underexplored. Also of great interest is how the splicing regulome cooperates with the transcription regulome to facilitate tumorigenesis. The answers to these questions are of fundamental importance to cancer biology. RESULTS Here, we report a comparative study between the transcript-based network (TN) and the gene-based network (GN) derived from the transcriptomes of paired tumor-normal tissues from 77 lung adenocarcinoma patients. We demonstrate that the two networks differ significantly from each other in terms of patient clustering and the number and functions of network modules. Interestingly, the majority (89.5%) of multi-transcript genes have their transcript isoforms distributed in at least two TN modules, suggesting regulatory and functional divergences between transcript isoforms. Furthermore, TN and GN modules share only ~50%-60% of their biological functions. TN thus appears to constitute a regulatory layer separate from GN. Nevertheless, our results indicate that functional convergence and divergence both occur between TN and GN, implying complex interactions between the two regulatory layers. Finally, we report that the expression profiles of module members in both TN and GN shift dramatically yet concordantly during tumorigenesis. The mechanisms underlying this coordinated shifting remain unclear yet are worth further explorations. CONCLUSION We show that in lung adenocarcinoma, transcript isoforms per se are coordinately regulated to conduct biological functions not conveyed by the network of genes. However, the two networks may interact closely with each other by sharing the same or related biological functions. Unraveling the effects and mechanisms of such interactions will significantly advance our understanding of this deadly disease.
Affiliation(s)
- Min-Kung Hsu
- Department of Biological Science and Technology, National Chiao-Tung University, Hsinchu, Taiwan
- Chia-Lin Pan
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan
- Feng-Chi Chen
- Department of Biological Science and Technology, National Chiao-Tung University, Hsinchu, Taiwan; Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan; School of Dentistry, China Medical University, Taichung, Taiwan
|
46
|
Li HD, Omenn GS, Guan Y. A proteogenomic approach to understand splice isoform functions through sequence and expression-based computational modeling. Brief Bioinform 2016; 17:1024-1031. [PMID: 26740460 DOI: 10.1093/bib/bbv109] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/27/2015] [Revised: 11/03/2015] [Indexed: 01/23/2023]
Abstract
The products of multi-exon genes are a mixture of alternatively spliced isoforms, from which the translated proteins can have similar, different or even opposing functions. It is therefore essential to differentiate and annotate functions for individual isoforms. Computational approaches provide an efficient complement to expensive and time-consuming experimental studies. The input data of these methods range from DNA sequence, to RNA selection pressure, to expressed sequence tags, to full-length complementary DNA, to exon array, to RNA-seq expression, to proteomic data. Notably, RNA-seq technology generates quantitative profiling of transcript expression at the genome scale, with an unprecedented amount of expression data available for developing isoform function prediction methods. Integrative analysis of these data at different molecular levels enables a proteogenomic approach to systematically interrogate isoform functions. Here, we briefly review the state-of-the-art methods according to their input data sources, discuss their advantages and limitations and point out potential ways to improve prediction accuracies.
|
47
|
Schweizer RM, Robinson J, Harrigan R, Silva P, Galaverni M, Musiani M, Green RE, Novembre J, Wayne RK. Targeted capture and resequencing of 1040 genes reveal environmentally driven functional variation in grey wolves. Mol Ecol 2015; 25:357-79. [DOI: 10.1111/mec.13467] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 05/15/2015] [Revised: 11/04/2015] [Accepted: 11/06/2015] [Indexed: 12/29/2022]
Affiliation(s)
- Rena M. Schweizer
- Department of Ecology and Evolutionary Biology University of California, Los Angeles 610 Charles E Young Dr East Los Angeles CA 90095 USA
- Jacqueline Robinson
- Department of Ecology and Evolutionary Biology University of California, Los Angeles 610 Charles E Young Dr East Los Angeles CA 90095 USA
- Ryan Harrigan
- Center for Tropical Research Institute of the Environment and Sustainability University of California 619 Charles E. Young Drive East Los Angeles CA 90095 USA
- Pedro Silva
- CIBIO/InBio – Centro de Investigação em Biodiversidade e Recursos Genéticos Universidade do Porto Campus Agrário de Vairão 4485‐661 Vairão Portugal
- Departamento de Biologia Faculdade de Ciências Universidade do Porto Rua do Campo Alegre s/n. 4169‐007 Porto Portugal
- Marco Galaverni
- Laboratory of Genetics ISPRA (Istituto Superiore per la Protezione e Ricerca Ambientale) Via Cà Fornacetta 9 40064 Ozzano dell'Emilia BO Italy
- Marco Musiani
- Faculties of Environmental Design and Veterinary Medicine (Joint Appointment) EVDS University of Calgary 2500 University Dr NW Calgary Alberta Canada T2N 1N4
- Richard E. Green
- Department of Biomolecular Engineering University of California Santa Cruz CA 95060 USA
- John Novembre
- Department of Human Genetics University of Chicago 920 E. 58th Street Chicago IL 60637 USA
- Robert K. Wayne
- Department of Ecology and Evolutionary Biology University of California, Los Angeles 610 Charles E Young Dr East Los Angeles CA 90095 USA
|
48
|
Zhao S, Xi L, Zhang B. Union Exon Based Approach for RNA-Seq Gene Quantification: To Be or Not to Be? PLoS One 2015; 10:e0141910. [PMID: 26559532 PMCID: PMC4641603 DOI: 10.1371/journal.pone.0141910] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 08/26/2015] [Accepted: 10/14/2015] [Indexed: 11/24/2022]
Abstract
In recent years, RNA-seq has emerged as a powerful technology for estimating gene and/or transcript expression, and RPKM (Reads Per Kilobase per Million reads) is widely used to represent the relative abundance of mRNAs for a gene. In general, the methods for gene quantification can be largely divided into two categories: the transcript-based approach and the ‘union exon’-based approach. The transcript-based approach is intrinsically more difficult because different isoforms of a gene typically have a high proportion of genomic overlap. On the other hand, the ‘union exon’-based approach is much simpler and thus widely used in RNA-seq gene quantification. Biologically, a gene is expressed in one or more transcript isoforms. Therefore, the transcript-based approach is logically more meaningful than the ‘union exon’-based approach. Although gene quantification is a fundamental task in most RNA-seq studies, it remains unclear whether the ‘union exon’-based approach is good practice. In this paper, we carried out a side-by-side comparison of the ‘union exon’-based approach and the transcript-based method in RNA-seq gene quantification. We found that gene expression levels are significantly underestimated by the ‘union exon’-based approach, and the average RPKM from the ‘union exon’-based method is less than 50% of the mean expression obtained from the transcript-based approach. The difference between the two approaches is primarily affected by the number of transcripts in a gene. We performed differential analysis at both the gene and transcript levels and found that more insights, such as isoform switches, are gained from isoform differential analysis. The accuracy of isoform quantification would improve if the read coverage pattern and exon-exon spanning reads were taken into account and incorporated into the EM (Expectation Maximization) algorithm. Our investigation discourages the use of the ‘union exon’-based approach in gene quantification despite its simplicity.
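The contrast between the two quantification strategies in this abstract can be illustrated with a minimal sketch. It assumes the standard RPKM definition (reads × 10^9 / (length in bp × total mapped reads)). The gene model, read counts, and the helper names `rpkm` and `union_length` are hypothetical and are not taken from the paper. The sketch also assumes that reads have already been assigned to isoforms, which in practice requires an estimation step such as EM.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads (standard definition)."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def union_length(exons):
    """Total length covered by the union of half-open [start, end) exon intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval, if any, and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent exon: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical gene with two isoforms sharing the first exon (illustrative only).
isoforms = {
    "tx1": [(0, 500), (1000, 1500)],  # 1000 bp
    "tx2": [(0, 500), (2000, 3000)],  # 1500 bp
}
reads = {"tx1": 900, "tx2": 100}      # assumed per-isoform read assignments
total_mapped = 1_000_000              # assumed library size

# 'Union exon' approach: all reads divided by the merged exon length.
union_len = union_length([exon for exons in isoforms.values() for exon in exons])
gene_rpkm_union = rpkm(sum(reads.values()), union_len, total_mapped)

# Transcript-based approach: sum of per-isoform RPKM values.
gene_rpkm_tx = sum(
    rpkm(reads[tx], sum(end - start for start, end in exons), total_mapped)
    for tx, exons in isoforms.items()
)

print(union_len, gene_rpkm_union, round(gene_rpkm_tx, 2))
```

In this toy case the union of exons is longer than either isoform, so dividing the same reads by the union length gives a lower gene-level value than summing per-isoform RPKMs. That is the direction of bias the abstract describes. The size of the gap here is an artifact of the made-up numbers.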
Affiliation(s)
- Shanrong Zhao
- Clinical Genetics and Bioinformatics, Pfizer Worldwide Research & Development, Cambridge, Massachusetts, 02139, United States of America
- Li Xi
- Clinical Genetics and Bioinformatics, Pfizer Worldwide Research & Development, Cambridge, Massachusetts, 02139, United States of America
- Baohong Zhang
- Clinical Genetics and Bioinformatics, Pfizer Worldwide Research & Development, Cambridge, Massachusetts, 02139, United States of America
|
49
|
AlFadhli S, Ghanem AAM, Nizam R. Genome-wide peripheral blood transcriptome analysis of Arab female lupus and lupus nephritis. Gene 2015; 570:230-8. [PMID: 26072163 DOI: 10.1016/j.gene.2015.06.020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/09/2015] [Revised: 05/16/2015] [Accepted: 06/07/2015] [Indexed: 01/11/2023]
Abstract
Systemic lupus erythematosus (lupus) is a genetically heterogeneous autoimmune disorder with an obscure etiology. With 92-94% of human genes exhibiting alternative splicing, gaining insights into such events may lead to better diagnostics. Herein, we explored the genome-wide peripheral blood transcriptome of lupus and its severe form, lupus nephritis (LN), compared to healthy controls (HC). Age-, gender- and ethnicity-matched Arab females were tested using high-density arrays, and statistical analysis was carried out using appropriate software. Analysis revealed 15 splice variants that are differentially expressed between lupus/HC and 99 variants between LN/HC (p ≤ 0.05, SI> or ≤ 0.5, Benjamini-Hochberg false discovery rate correction). Comparison between LN/lupus revealed 7 variants that significantly differed in expression. Pathway analysis of differentially spliced genes postulated 11 significant pathways in lupus and 12 in LN (p<0.05). Analysis of the peripheral blood transcriptome possibly revealed signature causative genes that are alternatively spliced, signifying their clinical relevance. The present study is the first to reveal the significance of alternative variants in lupus and LN.
Affiliation(s)
- Suad AlFadhli
- Department of Medical Laboratory Sciences, Faculty of Allied Health Sciences, Kuwait University, Kuwait
- Rasheeba Nizam
- Department of Medical Laboratory Sciences, Faculty of Allied Health Sciences, Kuwait University, Kuwait
|
50
|
Li HD, Menon R, Govindarajoo B, Panwar B, Zhang Y, Omenn GS, Guan Y. Functional Networks of Highest-Connected Splice Isoforms: From The Chromosome 17 Human Proteome Project. J Proteome Res 2015. [PMID: 26216192 DOI: 10.1021/acs.jproteome.5b00494] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 01/08/2023]
Abstract
Alternative splicing allows a single gene to produce multiple transcript-level splice isoforms from which the translated proteins may show differences in their expression and function. Identifying the major functional or canonical isoform is important for understanding gene and protein functions. Identification and characterization of splice isoforms is a stated goal of the HUPO Human Proteome Project and of neXtProt. Multiple efforts have catalogued splice isoforms as "dominant", "principal", or "major" isoforms based on expression or evolutionary traits. In contrast, we recently proposed highest connected isoforms (HCIs) as a new class of canonical isoforms that have the strongest interactions in a functional network and revealed their significantly higher (differential) transcript-level expression compared to nonhighest connected isoforms (NCIs) regardless of tissues/cell lines in the mouse. HCIs and their expression behavior in the human remain unexplored. Here we identified HCIs for 6157 multi-isoform genes using a human isoform network that we constructed by integrating a large compendium of heterogeneous genomic data. We present examples for pairs of transcript isoforms of ABCC3, RBM34, ERBB2, and ANXA7. We found that functional networks of isoforms of the same gene can show large differences. Interestingly, differential expression between HCIs and NCIs was also observed in the human on an independent set of 940 RNA-seq samples across multiple tissues, including heart, kidney, and liver. Using proteomic data from normal human retina and placenta, we showed that HCIs are a promising indicator of expressed protein isoforms exemplified by NDUFB6 and M6PR. Furthermore, we found that a significant percentage (20%, p = 0.0003) of human and mouse HCIs are homologues, suggesting their conservation between species. Our identified HCIs expand the repertoire of canonical isoforms and are expected to facilitate studying main protein products, understanding gene regulation, and possibly evolution. The network is available through our web server as a rich resource for investigating isoform functional relationships (http://guanlab.ccmb.med.umich.edu/hisonet). All MS/MS data were available at ProteomeXchange Web site (http://www.proteomexchange.org) through their identifiers (retina: PXD001242, placenta: PXD000754).
Affiliation(s)
- Hong-Dong Li
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, ∥Department of Electrical Engineering and Computer Science University of Michigan, Ann Arbor, Michigan 48109, United States
- Rajasree Menon
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, ∥Department of Electrical Engineering and Computer Science University of Michigan, Ann Arbor, Michigan 48109, United States
- Brandon Govindarajoo
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, ∥Department of Electrical Engineering and Computer Science University of Michigan, Ann Arbor, Michigan 48109, United States
- Bharat Panwar
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, ∥Department of Electrical Engineering and Computer Science University of Michigan, Ann Arbor, Michigan 48109, United States
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, ∥Department of Electrical Engineering and Computer Science University of Michigan, Ann Arbor, Michigan 48109, United States
- Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, ∥Department of Electrical Engineering and Computer Science University of Michigan, Ann Arbor, Michigan 48109, United States
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, ∥Department of Electrical Engineering and Computer Science University of Michigan, Ann Arbor, Michigan 48109, United States
|