1
Liu Y, Li HD, Wang J. CrossIsoFun: predicting isoform functions using the integration of multi-omics data. Bioinformatics 2024; 41:btae742. [PMID: 39680906 PMCID: PMC11706537 DOI: 10.1093/bioinformatics/btae742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Revised: 11/16/2024] [Accepted: 12/13/2024] [Indexed: 12/18/2024] Open
Abstract
MOTIVATION Isoforms spliced from the same gene may carry distinct biological functions. Therefore, annotating functions at the isoform level provides valuable insights into the functional diversity of genomes. Since experimental approaches for determining isoform functions are time- and cost-demanding, computational methods have been proposed. In this case, multi-omics data integration helps enhance the model performance, providing complementary insights for isoform functions. However, current methods underperform in leveraging diverse omics data, primarily due to the limited power to integrate the heterogeneous feature domains. Besides, among the multi-omics data, isoform-isoform interactions (IIIs) are a key data source, as isoforms interact with each other to perform functions. Unfortunately, IIIs remain largely underutilized in isoform function predictions until now.
RESULTS We introduce CrossIsoFun, a multi-omics data analysis framework for isoform function prediction. CrossIsoFun combines omics-specific and cross-omics learning for data integration and function prediction. In detail, CrossIsoFun uses a graph convolutional network (GCN) as the omics-specific classifier for each data source. The initial label predictions from GCNs are forwarded to the View Correlation Discovery Network (VCDN) and processed as a cross-omics integrative representation. The representation is then used to produce final predictions of isoform functions. In addition, an autoencoder within a cycle-consistency generative adversarial network (cycleGAN) is designed to generate IIIs from PPIs and thereby enrich the interactomics data. Our method outperforms the state-of-the-art methods on three tissue-naive datasets and 15 tissue-specific datasets with mRNA expression, sequence, and PPI data. The prediction of CrossIsoFun is further validated by its consistency with subcellular localization and isoform-level annotations with literature support.
AVAILABILITY AND IMPLEMENTATION CrossIsoFun is freely available at https://github.com/genemine/CrossIsoFun.
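The cross-omics step described in the abstract can be illustrated with a small sketch. It assumes that the input to the VCDN is built the way earlier view-correlation work builds it: by taking the outer product of the per-omics label probability vectors for one isoform and flattening the result. The abstract does not state this detail, and the function name and the example probabilities below are hypothetical.

```python
from itertools import product


def cross_view_tensor(view_probs):
    """Flatten the outer product of per-view label probability vectors.

    view_probs: one probability vector per omics view, all for the same isoform.
    Assumption: this mirrors a generic cross-view correlation construction,
    not necessarily the exact CrossIsoFun implementation.
    """
    flat = []
    for combo in product(*view_probs):
        value = 1.0
        for p in combo:
            value *= p
        flat.append(value)
    return flat


# Hypothetical initial predictions (negative, positive) from three
# omics-specific classifiers for a single isoform and a single function.
expression = [0.8, 0.2]
sequence = [0.6, 0.4]
interaction = [0.9, 0.1]

features = cross_view_tensor([expression, sequence, interaction])
print(len(features))  # 2 * 2 * 2 entries, one per combination of labels
```

Each entry of `features` is the joint score for one combination of per-view labels, so a downstream classifier can pick up agreement or disagreement between views rather than averaging them away.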
Affiliation(s)
- Yiwei Liu
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Hong-Dong Li
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
2
Santucci K, Cheng Y, Xu SM, Janitz M. Enhancing novel isoform discovery: leveraging nanopore long-read sequencing and machine learning approaches. Brief Funct Genomics 2024; 23:683-694. [PMID: 39158328 DOI: 10.1093/bfgp/elae031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2024] [Revised: 07/29/2024] [Accepted: 07/31/2024] [Indexed: 08/20/2024] Open
Abstract
Long-read sequencing technologies can capture entire RNA transcripts in a single sequencing read, reducing the ambiguity in constructing and quantifying transcript models in comparison to more common and earlier methods, such as short-read sequencing. Recent improvements in the accuracy of long-read sequencing technologies have expanded the scope for novel splice isoform detection and have also enabled a far more accurate reconstruction of complex splicing patterns and transcriptomes. Additionally, the incorporation and advancements of machine learning and deep learning algorithms in bioinformatic software have significantly improved the reliability of long-read sequencing transcriptomic studies. However, there is a lack of consensus on what bioinformatic tools and pipelines produce the most precise and consistent results. Thus, this review aims to discuss and compare the performance of available methods for novel isoform discovery with long-read sequencing technologies, with 25 tools being presented. Furthermore, this review intends to demonstrate the need for developing standard analytical pipelines, tools, and transcript model conventions for novel isoform discovery and transcriptomic studies.
Affiliation(s)
- Kristina Santucci
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Yuning Cheng
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Si-Mei Xu
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Michael Janitz
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
3
Sun Y, Pang Y, Wu X, Zhu R, Wang L, Tian M, He X, Liu D, Yang X. Landscape of alternative splicing and polyadenylation during growth and development of muscles in pigs. Commun Biol 2024; 7:1607. [PMID: 39627472 PMCID: PMC11614907 DOI: 10.1038/s42003-024-07332-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Accepted: 11/28/2024] [Indexed: 12/06/2024] Open
Abstract
Alternative polyadenylation (APA) is emerging as a post-transcriptional regulatory mechanism, similar to alternative splicing (AS), and plays a prominent role in regulating gene expression and increasing the complexity of the transcriptome and proteome. We use polyadenylation-selected long-read isoform sequencing to obtain full-length transcript sequences in porcine muscles at five developmental stages. We identify numerous novel transcripts unannotated in the existing pig genome, including transcripts mapping to known and unknown gene loci, and widespread transcript diversity in porcine muscles. The top 100 most isoformic genes are mainly enriched in Gene Ontology terms related to muscle growth and development. Analysis of changes in AS and polyadenylation site (PAS) usage during muscle development reveals that intron retention/exon inclusion and the usage of distal PASs are associated with ageing. We also identify developmental changes in major transcripts and major PASs. Furthermore, genes/transcripts important for muscle development are identified. The results confirm the importance of AS and APA in pig muscles, substantially increasing transcriptional diversity and showing an important mechanism underlying gene regulation in muscles.
Affiliation(s)
- Yuanlu Sun
  - College of Animal Science and Technology, Northeast Agricultural University, Harbin, 150030, China
- Yu Pang
  - College of Animal Science and Technology, Northeast Agricultural University, Harbin, 150030, China
- Xiaoxu Wu
  - College of Animal Science and Technology, Northeast Agricultural University, Harbin, 150030, China
- Rongru Zhu
  - College of Animal Science and Technology, Northeast Agricultural University, Harbin, 150030, China
- Liang Wang
  - Institute of Animal Husbandry, Heilongjiang Academy of Agricultural Sciences, Harbin, 150086, China
- Ming Tian
  - Institute of Animal Husbandry, Heilongjiang Academy of Agricultural Sciences, Harbin, 150086, China
- Xinmiao He
  - Institute of Animal Husbandry, Heilongjiang Academy of Agricultural Sciences, Harbin, 150086, China
- Di Liu
  - Institute of Animal Husbandry, Heilongjiang Academy of Agricultural Sciences, Harbin, 150086, China
- Xiuqin Yang
  - College of Animal Science and Technology, Northeast Agricultural University, Harbin, 150030, China
4
Kiseleva OI, Arzumanian VA, Kurbatov IY, Poverennaya EV. In silico and in cellulo approaches for functional annotation of human protein splice variants. Biomeditsinskaia Khimiia 2024; 70:315-328. [PMID: 39324196 DOI: 10.18097/pbmc20247005315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/27/2024]
Abstract
The elegance of pre-mRNA splicing mechanisms continues to interest scientists more than half a century after the discovery that coding regions in genes are interrupted by non-coding sequences. The vast majority of human genes have several mRNA variants, coding structurally and functionally different protein isoforms in a tissue-specific manner and with a linkage to specific developmental stages of the organism. Alteration of splicing patterns shifts the balance of functionally distinct proteins in living systems, distorts normal molecular pathways, and may trigger the onset and progression of various pathologies. Over the past two decades, numerous studies have been conducted in various life sciences disciplines to deepen our understanding of splicing mechanisms and the extent of their impact on the functioning of living systems. This review aims to summarize experimental and computational approaches used to elucidate the functions of splice variants of a single gene, based on our experience accumulated in the laboratory of interactomics of proteoforms at the Institute of Biomedical Chemistry (IBMC) and on best global practices.
Affiliation(s)
- O I Kiseleva
  - Institute of Biomedical Chemistry, Moscow, Russia
5
Abulfaraj AA, Alshareef SA. Concordant Gene Expression and Alternative Splicing Regulation under Abiotic Stresses in Arabidopsis. Genes (Basel) 2024; 15:675. [PMID: 38927612 PMCID: PMC11202685 DOI: 10.3390/genes15060675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 05/19/2024] [Accepted: 05/20/2024] [Indexed: 06/28/2024] Open
Abstract
The current investigation endeavors to identify differentially expressed alternatively spliced (DAS) genes that exhibit concordant expression with splicing factors (SFs) under diverse multifactorial abiotic stress combinations in Arabidopsis seedlings. SFs serve as the post-transcriptional mechanism governing the spatiotemporal dynamics of gene expression. The different stresses encompass variations in salt concentration, heat, intensive light, and their combinations. Clusters demonstrating consistent expression profiles were surveyed to pinpoint DAS/SF gene pairs exhibiting concordant expression. Through rigorous selection criteria, which incorporate alignment with documented gene functionalities and expression patterns observed in this study, four members of the serine/arginine-rich (SR) gene family were delineated as SFs concordantly expressed with six DAS genes. These regulated SF genes encompass cactin, SR1-like, SR30, and SC35-like. The identified concordantly expressed DAS genes encode diverse proteins such as the 26.5 kDa heat shock protein, chaperone protein DnaJ, potassium channel GORK, calcium-binding EF hand family protein, DEAD-box RNA helicase, and 1-aminocyclopropane-1-carboxylate synthase 6. Among the concordantly expressed DAS/SF gene pairs, SR30/DEAD-box RNA helicase, and SC35-like/1-aminocyclopropane-1-carboxylate synthase 6 emerge as promising candidates, necessitating further examinations to ascertain whether these SFs orchestrate splicing of the respective DAS genes. This study contributes to a deeper comprehension of the varied responses of the splicing machinery to abiotic stresses. Leveraging these DAS/SF associations shows promise for elucidating avenues for augmenting breeding programs aimed at fortifying cultivated plants against heat and intensive light stresses.
Affiliation(s)
- Aala A. Abulfaraj
  - Biological Sciences Department, College of Science & Arts, King Abdulaziz University, Rabigh 21911, Saudi Arabia
- Sahar A. Alshareef
  - Department of Biology, College of Science and Arts at Khulis, University of Jeddah, Jeddah 21921, Saudi Arabia
6
Liu Y, Yang C, Li HD, Wang J. IsoFrog: a reversible jump Markov Chain Monte Carlo feature selection-based method for predicting isoform functions. Bioinformatics 2023; 39:btad530. [PMID: 37647643 PMCID: PMC10491952 DOI: 10.1093/bioinformatics/btad530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 07/21/2023] [Accepted: 08/29/2023] [Indexed: 09/01/2023] Open
Abstract
MOTIVATION A single gene may yield several isoforms with different functions through alternative splicing. Continuous efforts are devoted to developing machine-learning methods to predict isoform functions. However, existing methods do not consider the relevance of each feature to specific functions and ignore the noise caused by the irrelevant features. In this case, we hypothesize that constructing a feature selection framework to extract the function-relevant features might help improve the model accuracy in isoform function prediction.
RESULTS In this article, we present a feature selection-based approach named IsoFrog to predict isoform functions. First, IsoFrog adopts a reversible jump Markov Chain Monte Carlo (RJMCMC)-based feature selection framework to assess the feature importance to gene functions. Second, a sequential feature selection procedure is applied to select a subset of function-relevant features. This strategy screens the relevant features for the specific function while eliminating irrelevant ones, improving the effectiveness of the input features. Then, the selected features are input into our proposed method modified domain-invariant partial least squares, which prioritizes the most likely positive isoform for each positive MIG and utilizes diPLS for isoform function prediction. Tested on three datasets, our method achieves superior performance over six state-of-the-art methods, and the RJMCMC-based feature selection framework outperforms three classic feature selection methods. We expect this proposed methodology will promote the identification of isoform functions and further inspire the development of new methods.
AVAILABILITY AND IMPLEMENTATION IsoFrog is freely available at https://github.com/genemine/IsoFrog.
Affiliation(s)
- Yiwei Liu
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Changhuo Yang
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Hong-Dong Li
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
7
Papastergiou T, Azé J, Bringay S, Louet M, Poncelet P, Rosales-Hurtado M, Vo-Hoang Y, Licznar-Fajardo P, Docquier JD, Gavara L. Discovering NDM-1 inhibitors using molecular substructure embeddings representations. J Integr Bioinform 2023; 0:jib-2022-0050. [PMID: 37498676 PMCID: PMC10389050 DOI: 10.1515/jib-2022-0050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Accepted: 06/12/2023] [Indexed: 07/29/2023] Open
Abstract
NDM-1 (New-Delhi-Metallo-β-lactamase-1) is an enzyme produced by bacteria that is implicated in bacterial resistance to almost all known antibiotics. In this study, we deliver a new, curated NDM-1 bioactivities database, along with a set of unifying rules for managing different activity properties and inconsistencies. We define the activity classification problem in terms of Multiple Instance Learning (MIL), employing embeddings corresponding to molecular substructures, and present an ensemble ranking and classification framework, relying on a k-fold Cross Validation method employing a per-fold hyper-parameter optimization procedure, showing promising generalization ability. The MIL paradigm displayed an improvement of up to 45.7%, in terms of Balanced Accuracy, in comparison to the classical Machine Learning paradigm. Moreover, we investigate different compact molecular representations, based on atomic or bi-atomic substructures. Finally, we scanned the Drugbank for strongly active compounds and we present the top-15 ranked compounds.
Affiliation(s)
- Thomas Papastergiou
  - LIRMM, University of Montpellier, CNRS, 34095 Montpellier, France
  - IBMM, CNRS, University of Montpellier, ENSCM, 34293 Montpellier, France
- Jérôme Azé
  - LIRMM, University of Montpellier, CNRS, 34095 Montpellier, France
- Sandra Bringay
  - LIRMM, University of Montpellier, CNRS, 34095 Montpellier, France
  - AMIS, Paul Valery University, 34199 Montpellier, France
- Maxime Louet
  - IBMM, CNRS, University of Montpellier, ENSCM, 34293 Montpellier, France
- Pascal Poncelet
  - LIRMM, University of Montpellier, CNRS, 34095 Montpellier, France
- Yen Vo-Hoang
  - IBMM, CNRS, University of Montpellier, ENSCM, 34293 Montpellier, France
- Jean-Denis Docquier
  - Department of Medical Biotechnologies, University of Siena, I-53100 Siena, Italy
- Laurent Gavara
  - IBMM, CNRS, University of Montpellier, ENSCM, 34293 Montpellier, France
8
Castaldi PJ, Abood A, Farber CR, Sheynkman GM. Bridging the splicing gap in human genetics with long-read RNA sequencing: finding the protein isoform drivers of disease. Hum Mol Genet 2022; 31:R123-R136. [PMID: 35960994 PMCID: PMC9585682 DOI: 10.1093/hmg/ddac196] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/07/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 02/04/2023] Open
Abstract
Aberrant splicing underlies many human diseases, including cancer, cardiovascular diseases and neurological disorders. Genome-wide mapping of splicing quantitative trait loci (sQTLs) has shown that genetic regulation of alternative splicing is widespread. However, identification of the corresponding isoform or protein products associated with disease-associated sQTLs is challenging with short-read RNA-seq, which cannot precisely characterize full-length transcript isoforms. Furthermore, contemporary sQTL interpretation often relies on reference transcript annotations, which are incomplete. Solutions to these issues may be found through integration of newly emerging long-read sequencing technologies. Long-read sequencing offers the capability to sequence full-length mRNA transcripts and, in some cases, to link sQTLs to transcript isoforms containing disease-relevant protein alterations. Here, we provide an overview of sQTL mapping approaches, the use of long-read sequencing to characterize sQTL effects on isoforms, the linkage of RNA isoforms to protein-level functions and comment on future directions in the field. Based on recent progress, long-read RNA sequencing promises to be part of the human disease genetics toolkit to discover and treat protein isoforms causing rare and complex diseases.
Affiliation(s)
- Peter J Castaldi
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
  - Division of General Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Abdullah Abood
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Charles R Farber
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Gloria M Sheynkman
  - Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22903, USA
  - UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA 22903, USA
9
Qiu S, Yu G, Lu X, Domeniconi C, Guo M. Isoform function prediction by Gene Ontology embedding. Bioinformatics 2022; 38:4581-4588. [PMID: 35997558 DOI: 10.1093/bioinformatics/btac576] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/29/2022] [Revised: 07/13/2022] [Accepted: 08/22/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION High-resolution annotation of gene functions is a central task in functional genomics. Multiple proteoforms translated from alternatively spliced isoforms of a single gene are the actual function performers and greatly increase the functional diversity. The specific functions of different isoforms can decipher the molecular basis of various complex diseases at a finer granularity. Multi-instance learning (MIL)-based solutions have been developed to distribute gene (bag)-level Gene Ontology (GO) annotations to isoforms (instances), but they simply presume that a particular annotation of the gene is attributable to only one isoform, neglect the hierarchical structures and semantics of massive GO terms (labels), or can only handle dozens of terms.
RESULTS We propose an effective approach, IsofunGO, to differentiate massive functions of isoforms by GO embedding. Particularly, IsofunGO first introduces an attributed hierarchical network to model massive GO terms, and a GO network embedding strategy to learn compact representations of GO terms and project GO annotations of genes into compressed ones; this strategy not only explores and preserves the hierarchy between GO terms but also greatly reduces the prediction load. Next, it develops an attention-based MIL network to fuse genomics and transcriptomics data of isoforms and predict isoform functions by referring to compressed annotations. Extensive experiments on benchmark datasets demonstrate the efficacy of IsofunGO. Both the GO embedding and attention mechanism can boost the performance and interpretability.
AVAILABILITY AND IMPLEMENTATION The code of IsofunGO is available at http://www.sdu-idea.cn/codes.php?name=IsofunGO.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
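To make the attention-based MIL idea concrete, here is a minimal sketch of attention pooling over the isoforms (instances) of one gene (bag). It is a generic illustration rather than the IsofunGO architecture, which the abstract does not describe in detail; the feature vectors and attention scores are hypothetical, and in a trained model the scores would come from learned parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instances, scores):
    """Return (weights, bag_vector): the bag vector is the attention-weighted
    sum of the instance feature vectors."""
    weights = softmax(scores)
    dim = len(instances[0])
    bag = [sum(w * inst[d] for w, inst in zip(weights, instances))
           for d in range(dim)]
    return weights, bag


# Hypothetical gene with three isoforms, each described by two features.
isoforms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
raw_scores = [2.0, 0.0, 0.0]  # placeholder scores, not learned values

weights, bag_vector = attention_pool(isoforms, raw_scores)
print(weights)
print(bag_vector)
```

Because the weights are not forced to be one-hot, several isoforms can share responsibility for a gene-level annotation, which is the limitation of earlier MIL solutions that the abstract points out. The weights can also be read as a rough indication of which isoforms drive a prediction.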
Affiliation(s)
- Sichao Qiu
  - School of Software, Shandong University, Jinan, Shandong 250101, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, Shandong 250101, China
- Guoxian Yu
  - School of Software, Shandong University, Jinan, Shandong 250101, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, Shandong 250101, China
- Xudong Lu
  - School of Software, Shandong University, Jinan, Shandong 250101, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, Shandong 250101, China
- Maozu Guo
  - College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
10
Yu G, Huang Q, Zhang X, Guo M, Wang J. Tissue Specificity Based Isoform Function Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3048-3059. [PMID: 34185647 DOI: 10.1109/tcbb.2021.3093167] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
Alternative splicing enables a gene to be spliced into different isoforms and hence protein variants. Identifying the individual functions of these isoforms helps decipher the functional diversity of proteins. Although much effort has been devoted to automatic gene function prediction, few efforts have been directed toward computational isoform function prediction, mainly due to the unavailable (or scanty) functional annotations of isoforms. Existing efforts directly combine multiple RNA-seq datasets without accounting for the important tissue specificity of alternative splicing. To bridge this gap, we introduce a novel approach called TS-Isofun to predict the functions of isoforms by integrating multiple functional association networks with respect to tissue specificity. TS-Isofun first constructs tissue-specific isoform functional association networks using multiple RNA-seq datasets in a tissue-wise manner. Next, TS-Isofun assigns weights to these networks and models the tissue specificity by selectively integrating them with adaptive weights. It then introduces a joint matrix factorization-based data fusion model to leverage the integrated network, gene-level data and functional annotations of genes to infer the functions of isoforms. To achieve coherent weight assignment and isoform function prediction, TS-Isofun jointly optimizes the weights of individual networks and the isoform function prediction in a unified objective function. Experimental results show that TS-Isofun significantly outperforms state-of-the-art methods and that accounting for tissue specificity contributes to more accurate isoform function prediction.
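The network integration step can be pictured as a weighted combination of tissue-specific association matrices. The sketch below is only an illustration under simplifying assumptions: the weights are fixed by hand, whereas the abstract says TS-Isofun learns them jointly with the prediction objective, and the tissue names, matrices and function name are hypothetical.

```python
def integrate_networks(networks, weights):
    """Combine same-sized association matrices with normalized weights."""
    total = sum(weights)
    norm = [w / total for w in weights]
    size = len(networks[0])
    return [
        [sum(w * net[i][j] for w, net in zip(norm, networks))
         for j in range(size)]
        for i in range(size)
    ]


# Hypothetical association strengths between two isoforms in two tissues.
brain_net = [[0.0, 0.8], [0.8, 0.0]]
liver_net = [[0.0, 0.2], [0.2, 0.0]]

# Hand-picked weights for illustration; a real method would adapt them.
fused = integrate_networks([brain_net, liver_net], weights=[3.0, 1.0])
print(fused[0][1])  # 0.75 * 0.8 + 0.25 * 0.2
```

Giving a tissue a larger weight lets its association pattern dominate the integrated network, which is how tissue specificity can influence the downstream prediction.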
11
Wang J, Zhang L, Zeng A, Xia D, Yu J, Yu G. DeepIII: Predicting Isoform-Isoform Interactions by Deep Neural Networks and Data Fusion. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2177-2187. [PMID: 33764878 DOI: 10.1109/tcbb.2021.3068875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Alternative splicing enables a gene to give rise to different isoforms and the corresponding proteoforms, which actually accomplish various biological functions of a living body. Isoform-isoform interactions (IIIs) provide a higher resolution interactome to explore the cellular processes and disease mechanisms than the canonically studied protein-protein interactions (PPIs), which are often recorded at the coarse gene level. The knowledge of IIIs is critical to map pathways and to understand protein complexity and functional diversity, but the known IIIs are very scarce. In this paper, we propose a deep learning-based method called DeepIII to systematically predict genome-wide IIIs by integrating diverse data sources, including RNA-seq datasets of different human tissues, exon array data, domain-domain interactions (DDIs) of proteins, nucleotide sequences and amino acid sequences. Particularly, DeepIII fuses these data to learn the representation of isoform pairs with a four-layer deep neural network, and then performs binary classification on the learnt representation to achieve the prediction of IIIs. Experimental results show that DeepIII achieves a superior prediction performance to the state-of-the-art solutions and the III network constructed by DeepIII gives more accurate isoform function prediction. Case studies further confirm that DeepIII can differentiate the individual interaction partners of different isoforms spliced from the same gene. The code and datasets of DeepIII are available at http://mlda.swu.edu.cn/codes.php?name=DeepIII.
12
Yu G, Yang Y, Yan Y, Guo M, Zhang X, Wang J. DeepIDA: Predicting Isoform-Disease Associations by Data Fusion and Deep Neural Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2166-2176. [PMID: 33571094 DOI: 10.1109/tcbb.2021.3058801] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/12/2023]
Abstract
Alternative splicing produces different isoforms from the same gene locus and is an important mechanism for regulating gene expression and proteome diversity. Although the prediction of gene(ncRNA)-disease associations has been extensively studied, few (or no) computational solutions have been proposed for the prediction of isoform-disease associations (IDAs) at a large scale, mainly due to the lack of disease annotations of isoforms. However, increasing evidence confirms the associations between diseases and isoforms, which can more precisely uncover the pathology of complex diseases. Therefore, it is highly desirable to predict IDAs. To bridge this gap, we propose a deep neural network-based solution (DeepIDA) to fuse multi-type genomics and transcriptomics data to predict IDAs. Particularly, DeepIDA uses gene-isoform relations to dispatch gene-disease associations to isoforms. In addition, it utilizes two DNN sub-networks with different structures to capture nucleotide and expression features of isoforms, Gene Ontology data and miRNA target data, respectively. After that, these two sub-networks are merged in a dense layer to predict IDAs. The experimental results on public datasets show that DeepIDA can effectively predict IDAs with AUPRC (area under the precision-recall curve) of 0.9141, macro F-measure of 0.9155, G-mean of 0.9278 and balanced accuracy of 0.9303 across 732 diseases, which are much higher than those of competitive methods. Further study on sixteen isoform-disease association cases again corroborates the superiority of DeepIDA. The code of DeepIDA is available at http://mlda.swu.edu.cn/codes.php?name=DeepIDA.
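Two of the reported metrics, G-mean and balanced accuracy, are less familiar than AUPRC, so a short sketch of their usual binary definitions may help. It assumes the standard definitions based on sensitivity and specificity; the abstract does not say how the values were aggregated across the 732 diseases, and the labels below are made up for illustration.

```python
import math


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred)
    return (sens + spec) / 2


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred)
    return math.sqrt(sens * spec)


# Made-up labels for one disease: 4 associated isoforms, 6 unassociated.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

ba = balanced_accuracy(y_true, y_pred)
gm = g_mean(y_true, y_pred)
print(round(ba, 4), round(gm, 4))
```

Both metrics weight the positive and negative classes equally, which matters when associated isoforms are far rarer than unassociated ones.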
13
Withanage MHH, Liang H, Zeng E. RNA-Seq Experiment and Data Analysis. Methods in Molecular Biology (Clifton, N.J.) 2022; 2418:405-424. [PMID: 35119677 DOI: 10.1007/978-1-0716-1920-9_22] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/24/2022]
Abstract
With the ability to obtain several millions of reads per sample, high-throughput RNA sequencing (RNA-Seq) enables investigation of any transcriptome at a fine resolution. Not just the messenger RNA (mRNA), but a wide variety of different RNA populations (e.g., total RNA, microRNA, long ncRNA, pre-mRNA) can also be investigated using RNA-Seq. While facilitating accurate quantification of gene expression, RNA-Seq offers the opportunity to estimate abundance of isoforms and find novel transcripts and allele-specific transcripts. In this chapter, we describe a protocol to construct an RNA-Seq library for sequencing on Illumina NGS platforms and a computational pipeline to perform RNA-Seq data analysis. The protocols described in this chapter can be applied to the analysis of differential gene expression in control versus 17β-estradiol treatment of in vivo or in vitro systems.
Affiliation(s)
- Hanquan Liang
- McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Erliang Zeng
- Division of Biostatistics and Computational Biology, College of Dentistry, University of Iowa, Iowa City, IA, USA; Department of Preventive & Community Dentistry, College of Dentistry, University of Iowa, Iowa City, IA, USA; Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, IA, USA; Department of Biomedical Engineering, College of Engineering, University of Iowa, Iowa City, IA, USA
|
14
|
Leung SK, Jeffries AR, Castanho I, Jordan BT, Moore K, Davies JP, Dempster EL, Bray NJ, O'Neill P, Tseng E, Ahmed Z, Collier DA, Jeffery ED, Prabhakar S, Schalkwyk L, Jops C, Gandal MJ, Sheynkman GM, Hannon E, Mill J. Full-length transcript sequencing of human and mouse cerebral cortex identifies widespread isoform diversity and alternative splicing. Cell Rep 2021; 37:110022. [PMID: 34788620 PMCID: PMC8609283 DOI: 10.1016/j.celrep.2021.110022] [Citation(s) in RCA: 84] [Impact Index Per Article: 21.0] [Received: 10/14/2020] [Revised: 07/30/2021] [Accepted: 10/28/2021] [Indexed: 12/05/2022] Open
Abstract
Alternative splicing is a post-transcriptional regulatory mechanism producing distinct mRNA molecules from a single pre-mRNA with a prominent role in the development and function of the central nervous system. We used long-read isoform sequencing to generate full-length transcript sequences in the human and mouse cortex. We identify novel transcripts not present in existing genome annotations, including transcripts mapping to putative novel (unannotated) genes and fusion transcripts incorporating exons from multiple genes. Global patterns of transcript diversity are similar between human and mouse cortex, although certain genes are characterized by striking differences between species. We also identify developmental changes in alternative splicing, with differential transcript usage between human fetal and adult cortex. Our data confirm the importance of alternative splicing in the cortex, dramatically increasing transcriptional diversity and representing an important mechanism underpinning gene regulation in the brain. We provide transcript-level data for human and mouse cortex as a resource to the scientific community.
- There is widespread transcript diversity in the cortex and many novel transcripts
- Some genes display big differences in isoform number between human and mouse cortex
- There is evidence of differential transcript usage between human fetal and adult cortex
- There are many novel isoforms of genes associated with human brain disease
Key Words
- isoform, transcript, expression, brain, cortex, mouse, human, adult, fetal, long-read sequencing, alternative splicing
Affiliation(s)
- Isabel Castanho
- University of Exeter, Exeter, UK; Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Ben T Jordan
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Erin D Jeffery
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Shyam Prabhakar
- Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Connor Jops
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
- Michael J Gandal
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
- Gloria M Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA; Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA; UVA Cancer Center, University of Virginia, Charlottesville, VA, USA
|
15
|
Jacobs A, Elmer KR. Alternative splicing and gene expression play contrasting roles in the parallel phenotypic evolution of a salmonid fish. Mol Ecol 2021; 30:4955-4969. [PMID: 33502030 PMCID: PMC8653899 DOI: 10.1111/mec.15817] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 11/20/2020] [Revised: 01/06/2021] [Accepted: 01/18/2021] [Indexed: 12/25/2022]
Abstract
Understanding the contribution of different molecular processes to evolution and development is crucial for identifying the mechanisms of adaptation. Here, we used RNA-sequencing data to test the importance of alternative splicing and differential gene expression in a case of parallel adaptive evolution, the replicated postglacial divergence of the salmonid fish Arctic charr (Salvelinus alpinus) into sympatric benthic and pelagic ecotypes across multiple independent lakes. We found that genes differentially spliced between ecotypes were mostly not differentially expressed (<6% overlap) and were involved in different biological processes. Differentially spliced genes were primarily enriched for muscle development and functioning, while differentially expressed genes were involved in metabolism, immunity and growth. Furthermore, alternative splicing and gene expression were mostly controlled by independent cis-regulatory quantitative trait loci (<3.4% overlap). Cis-regulatory regions were associated with the parallel divergence in splicing (16.5% of intron clusters) and expression (6.7%-10.1% of differentially expressed genes), indicating shared regulatory variation across ecotype pairs. Contrary to theoretical expectation, we found that differentially spliced genes tended to be highly central in regulatory networks ("hub genes") and were annotated to significantly more gene ontology terms compared to nondifferentially spliced genes, consistent with a higher level of pleiotropy. Together, our results suggest that the concerted regulation of alternative splicing and differential gene expression through different regulatory regions leads to the divergence of complementary processes important for local adaptation. This provides novel insights into the importance of contrasting but putatively complementary molecular processes in rapid parallel adaptive evolution.
Affiliation(s)
- Arne Jacobs
- Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary & Life Sciences, University of Glasgow, Glasgow, UK
- Department of Natural Resources, Cornell University, Ithaca, NY, USA
- Kathryn R. Elmer
- Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary & Life Sciences, University of Glasgow, Glasgow, UK
|
16
|
Yu G, Zhou G, Zhang X, Domeniconi C, Guo M. DMIL-IsoFun: predicting isoform function using deep multi-instance learning. Bioinformatics 2021; 37:4818-4825. [PMID: 34282449 DOI: 10.1093/bioinformatics/btab532] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/13/2021] [Revised: 06/20/2021] [Accepted: 07/16/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Alternative splicing creates considerable proteomic diversity and complexity from a relatively limited genome. Proteoforms translated from alternatively spliced isoforms of a gene actually execute the biological functions of this gene, and thus reflect the functional knowledge of genes at a finer granularity. Recently, some computational approaches have been proposed to differentiate isoform functions using sequence and expression data. However, their performance is far from desirable, mainly due to the imbalance and lack of annotations at the isoform level and the difficulty of modeling gene-isoform relations. RESULT We propose a deep multi-instance learning based framework (DMIL-IsoFun) to differentiate the functions of isoforms. DMIL-IsoFun first uses a multi-instance learning convolutional neural network, trained with isoform sequences and gene-level annotations, to extract feature vectors and initialize the annotations of isoforms, and then uses a class-imbalance graph convolutional network to refine the annotations of individual isoforms based on the isoform co-expression network and the extracted features. Extensive experimental results show that DMIL-IsoFun improves the Smin and Fmax of state-of-the-art solutions by at least 29.6% and 40.8%. The effectiveness of DMIL-IsoFun is further confirmed on a testbed of human multiple-isoform genes and maize isoforms related to photosynthesis. AVAILABILITY The code and data are available at http://www.sdu-idea.cn/codes.php?name=DMIL-Isofun. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
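Fmax, one of the two metrics reported in this abstract, is commonly defined (for instance in the CAFA function-prediction evaluations) as the maximum, over decision thresholds, of the harmonic mean of averaged precision and recall. The sketch below follows that common definition. It is an assumption that DMIL-IsoFun uses exactly this variant, and the function name `fmax` and the data layout (one set of true terms and one term-to-score dict per isoform) are hypothetical.

```python
def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax.

    y_true:  list of sets of true GO terms, one set per isoform.
    y_score: list of dicts mapping GO term -> predicted score in [0, 1].
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]

    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for truth, scores in zip(y_true, y_score):
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & truth)
            # Precision is averaged only over isoforms with at least one prediction.
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall is averaged over all isoforms that have annotations.
            if truth:
                recalls.append(hits / len(truth))
        if not precisions or not recalls:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best
```

Because the threshold is chosen after the fact, Fmax summarizes the best achievable trade-off and not the performance at any fixed operating point.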
Affiliation(s)
- Guoxian Yu
- School of Software, Shandong University, Jinan, 250101, China; College of Computer and Information Sciences, Southwest University, Chongqing, 400715, China; Computer, Electrical, and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, SA
- Guangjie Zhou
- School of Software, Shandong University, Jinan, 250101, China; College of Computer and Information Sciences, Southwest University, Chongqing, 400715, China
- Xiangliang Zhang
- Computer, Electrical, and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, SA
- Carlotta Domeniconi
- Department of Computer Science, George Mason University, Fairfax, 22030, USA
- Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
|
17
|
Li HD, Xu Y, Zhu X, Liu Q, Omenn GS, Wang J. ClusterMine: A knowledge-integrated clustering approach based on expression profiles of gene sets. J Bioinform Comput Biol 2021; 18:2040009. [PMID: 32698720 DOI: 10.1142/s0219720020400090] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]
Abstract
Clustering analysis of gene expression data is essential for understanding complex biological data, and is widely used in important biological applications such as the identification of cell subpopulations and disease subtypes. In commonly used methods such as hierarchical clustering (HC) and consensus clustering (CC), holistic expression profiles of all genes are often used to assess the similarity between samples for clustering. While these methods have been proven successful in identifying sample clusters in many areas, they do not provide information about which gene sets (functions) contribute most to the clustering, thus limiting the interpretability of the resulting cluster. We hypothesize that integrating prior knowledge of annotated gene sets would not only achieve satisfactory clustering performance but also, more importantly, enable potential biological interpretation of clusters. Here we report ClusterMine, an approach that identifies clusters by assessing functional similarity between samples through integrating known annotated gene sets in functional annotation databases such as Gene Ontology. In addition to the cluster membership of each sample as provided by conventional approaches, it also outputs gene sets that most likely contribute to the clustering, thus facilitating biological interpretation. We compare ClusterMine with conventional approaches on nine real-world experimental datasets that represent different application scenarios in biology. We find that ClusterMine achieves better performances and that the gene sets prioritized by our method are biologically meaningful. ClusterMine is implemented as an R package and is freely available at: www.genemine.org/clustermine.php.
Affiliation(s)
- Hong-Dong Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 400083, P. R. China
- Yunpei Xu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 400083, P. R. China
- Xiaoshu Zhu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 400083, P. R. China; School of Computer Science and Engineering, Yulin Normal University, Yulin, Guangxi, P. R. China
- Quan Liu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 400083, P. R. China
- Gilbert S Omenn
- Departments of Computational Medicine and Bioinformatics, Internal Medicine, Human Genetics and School of Public Health, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 400083, P. R. China
|
18
|
Aledavood E, Forte A, Estarellas C, Javier Luque F. Structural basis of the selective activation of enzyme isoforms: Allosteric response to activators of β1- and β2-containing AMPK complexes. Comput Struct Biotechnol J 2021; 19:3394-3406. [PMID: 34194666 PMCID: PMC8217686 DOI: 10.1016/j.csbj.2021.05.056] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/03/2021] [Revised: 05/30/2021] [Accepted: 05/30/2021] [Indexed: 12/21/2022] Open
Abstract
AMP-activated protein kinase (AMPK) is a key energy sensor regulating the cell metabolism in response to energy supply and demand. The evolutionary adaptation of AMPK to different tissues is accomplished through the expression of distinct isoforms that can form up to 12 complexes, which exhibit notable differences in the sensitivity to allosteric activators. To shed light into the molecular determinants of the allosteric regulation of this energy sensor, we have examined the structural and dynamical properties of β1- and β2-containing AMPK complexes formed with small molecule activators A-769662 and SC4, and dissected the mechanical response leading to active-like enzyme conformations through the analysis of interaction networks between structural domains. The results reveal the mechanical sensitivity of the α2β1 complex, in contrast with a larger resilience of the α2β2 species, especially regarding modulation by A-769662. Furthermore, binding of activators to α2β1 consistently promotes the pre-organization of the ATP-binding site, favoring the adoption of activated states of the enzyme. These findings are discussed in light of the changes in the residue content of β-subunit isoforms, particularly regarding the β1Asn111 → β2Asp111 substitution as a key factor in modulating the mechanical sensitivity of β1- and β2-containing AMPK complexes. Our studies pave the way for the design of activators tailored for improving the therapeutic treatment of tissue-specific metabolic disorders.
Affiliation(s)
- Alessia Forte
- Department of Nutrition, Food Science and Gastronomy, Faculty of Pharmacy and Food Sciences, Institute of Biomedicine (IBUB) and Institute of Theoretical and Computational Chemistry (IQTCUB), University of Barcelona, Av. Prat de la Riba 171, Santa Coloma de Gramenet 08921, Spain
|
19
|
Li HD, Yang C, Zhang Z, Yang M, Wu FX, Omenn GS, Wang J. IsoResolve: predicting splice isoform functions by integrating gene and isoform-level features with domain adaptation. Bioinformatics 2021; 37:522-530. [PMID: 32966552 PMCID: PMC8088322 DOI: 10.1093/bioinformatics/btaa829] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/04/2020] [Revised: 08/12/2020] [Accepted: 09/09/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION High resolution annotation of gene functions is a central goal in functional genomics. A single gene may produce multiple isoforms with different functions through alternative splicing. Conventional approaches, however, consider a gene as a single entity without differentiating these functionally different isoforms. Towards understanding gene functions at higher resolution, recent efforts have focused on predicting the functions of isoforms. However, the performance of existing methods is far from satisfactory mainly because of the lack of isoform-level functional annotation. RESULTS We present IsoResolve, a novel approach for isoform function prediction, which leverages the information from gene function prediction models with domain adaptation (DA). IsoResolve treats gene-level and isoform-level features as source and target domains, respectively. It uses DA to project the two domains into a latent variable space in such a way that the latent variables from the two domains have similar distribution, which enables the gene domain information to be leveraged for isoform function prediction. We systematically evaluated the performance of IsoResolve in predicting functions. Compared with five state-of-the-art methods, IsoResolve achieved significantly better performance. IsoResolve was further validated by case studies of genes with isoform-level functional annotation. AVAILABILITY AND IMPLEMENTATION IsoResolve is freely available at https://github.com/genemine/IsoResolve. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hong-Dong Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering
- Changhuo Yang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering
- Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha, Hunan 410083, China
- Mengyun Yang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering
- Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N5A9, Canada
- Gilbert S Omenn
- Institute for Systems Biology, Seattle, WA 98101, USA; Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering
|
20
|
Yu G, Zeng J, Wang J, Zhang H, Zhang X, Guo M. Imbalance deep multi‐instance learning for predicting isoform–isoform interactions. Int J Intell Syst 2021. [DOI: 10.1002/int.22402] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/15/2022]
Affiliation(s)
- Guoxian Yu
- School of Software Shandong University Jinan China
- College of Computer and Information Science Southwest University Chongqing China
- Joint SDU‐NTU Centre for Artificial Intelligence Research Shandong University Jinan China
- Jie Zeng
- College of Computer and Information Science Southwest University Chongqing China
- Jun Wang
- College of Computer and Information Science Southwest University Chongqing China
- Joint SDU‐NTU Centre for Artificial Intelligence Research Shandong University Jinan China
- Hong Zhang
- College of Computer and Information Science Southwest University Chongqing China
- Xiangliang Zhang
- CEMSE King Abdullah University of Science and Technology Thuwal Saudi Arabia
- Maozu Guo
- School of Electrical and Information Engineering Beijing University of Civil Engineering and Architecture Beijing China
|
21
|
Chen H, Shaw D, Bu D, Jiang T. FINER: enhancing the prediction of tissue-specific functions of isoforms by refining isoform interaction networks. NAR Genom Bioinform 2021; 3:lqab057. [PMID: 34169280 PMCID: PMC8219044 DOI: 10.1093/nargab/lqab057] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/04/2021] [Revised: 05/18/2021] [Accepted: 06/03/2021] [Indexed: 12/24/2022] Open
Abstract
Annotating the functions of gene products is a mainstay in biology. A variety of databases have been established to record functional knowledge at the gene level. However, functional annotations at the isoform resolution are in great demand in many biological applications. Although critical information in biological processes such as protein-protein interactions (PPIs) is often used to study gene functions, it does not directly help differentiate the functions of isoforms, as the 'proteins' in the existing PPIs generally refer to 'genes'. On the other hand, the prediction of isoform functions and prediction of isoform-isoform interactions, though inherently intertwined, have so far been treated as independent computational problems in the literature. Here, we present FINER, a unified framework to jointly predict isoform functions and refine PPIs from the gene level to the isoform level, enabling both tasks to benefit from each other. Extensive computational experiments on human tissue-specific data demonstrate that FINER is able to gain at least 5.16% in AUC and 15.1% in AUPRC for functional prediction across multiple tissues by refining noisy PPIs, resulting in significant improvement over the state-of-the-art methods. Some in-depth analyses reveal consistency between FINER's predictions and the tissue specificity as well as subcellular localization of isoforms.
Affiliation(s)
- Hao Chen
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
- Dipan Shaw
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
- Dongbo Bu
- Key Lab of Intelligent Information Process, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Tao Jiang
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
- Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
|
22
|
Pozo F, Martinez-Gomez L, Walsh TA, Rodriguez JM, Di Domenico T, Abascal F, Vazquez J, Tress ML. Assessing the functional relevance of splice isoforms. NAR Genom Bioinform 2021; 3:lqab044. [PMID: 34046593 PMCID: PMC8140736 DOI: 10.1093/nargab/lqab044] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/27/2020] [Revised: 04/22/2021] [Accepted: 05/17/2021] [Indexed: 12/20/2022] Open
Abstract
Alternative splicing of messenger RNA can generate an array of mature transcripts, but it is not clear how many go on to produce functionally relevant protein isoforms. There is only limited evidence for alternative proteins in proteomics analyses and data from population genetic variation studies indicate that most alternative exons are evolving neutrally. Determining which transcripts produce biologically important isoforms is key to understanding isoform function and to interpreting the real impact of somatic mutations and germline variations. Here we have developed a method, TRIFID, to classify the functional importance of splice isoforms. TRIFID was trained on isoforms detected in large-scale proteomics analyses and distinguishes these biologically important splice isoforms with high confidence. Isoforms predicted as functionally important by the algorithm had measurable cross species conservation and significantly fewer broken functional domains. Additionally, exons that code for these functionally important protein isoforms are under purifying selection, while exons from low scoring transcripts largely appear to be evolving neutrally. TRIFID has been developed for the human genome, but it could in principle be applied to other well-annotated species. We believe that this method will generate valuable insights into the cellular importance of alternative splicing.
Affiliation(s)
- Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Laura Martinez-Gomez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Thomas A Walsh
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- José Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
- Tomas Di Domenico
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Federico Abascal
- Somatic Evolution Group, Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Jesús Vazquez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
|
23
|
Li HD, Zhang W, Luo Y, Wang J. IsoDetect: Detection of Splice Isoforms from Third Generation Long Reads Based on Short Feature Sequences. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200316101205] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Background: Transcriptome annotation is the basis for understanding gene structures and analysing gene expression. The transcriptome annotation of many organisms, such as humans, is far from complete, due partly to the challenge of identifying isoforms that are produced from the same gene through alternative splicing. Third generation sequencing (TGS) reads provide an unprecedented opportunity for detecting isoforms because their length exceeds that of most isoforms. One limitation of current TGS read-based isoform detection methods is that they are based exclusively on sequence reads, without incorporating the sequence information of annotated isoforms.
Objective: We aim to develop a method to detect isoforms by incorporating annotated isoforms.
Methods: Based on annotated isoforms, we propose a splice isoform detection method called IsoDetect. First, the sequences at exon-exon junctions are extracted from annotated isoforms as “short feature sequences”, which are used to distinguish splice isoforms. Second, we align these feature sequences to long reads and partition the long reads into groups that contain the same set of feature sequences, thereby avoiding pair-wise comparison among the large number of long reads. Third, clustering and consensus generation are carried out based on sequence similarity. For the long reads that do not contain any short feature sequence, clustering analysis based on sequence similarity is performed to identify isoforms. Therefore, our method can detect not only known but also novel isoforms.
Result: Tested on two datasets from Calypte anna and Zebra Finch, IsoDetect shows higher speed and good accuracy compared with four existing methods.
Conclusion: IsoDetect may become a promising method for isoform detection.
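The second step of the method, partitioning long reads by the set of junction feature sequences they contain, can be illustrated with a small sketch. This is not the IsoDetect implementation: exact substring matching stands in for the alignment step (real long reads contain errors, so an aligner would be needed), and the function name, feature names, and sequences are hypothetical.

```python
from collections import defaultdict


def group_reads_by_features(reads, features):
    """Group read IDs by the exact set of feature sequences each read contains.

    reads:    dict mapping read ID -> read sequence.
    features: dict mapping feature name -> short junction sequence.
    Returns a dict mapping frozenset of feature names -> list of read IDs.
    """
    groups = defaultdict(list)
    for read_id, seq in reads.items():
        # Exact substring search is a simplification of aligning features to reads.
        present = frozenset(name for name, feat in features.items() if feat in seq)
        groups[present].append(read_id)
    return dict(groups)


if __name__ == "__main__":
    features = {"j1": "ACGT", "j2": "GGCC"}
    reads = {
        "r1": "TTACGTTTGGCCAA",
        "r2": "ACGTAAGGCC",
        "r3": "TTTTACGTTT",
        "r4": "AAAAAA",
    }
    for key, ids in group_reads_by_features(reads, features).items():
        print(sorted(key), ids)
```

Each read is assigned to a group in a single pass over the feature set, so reads are never compared with one another directly. Reads that fall into the empty-set group correspond to those the abstract says are handled by similarity-based clustering.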
Affiliation(s)
- Hong-Dong Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Wenjing Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Yuwen Luo
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
|
24
|
Khan AH, Lin A, Wang RT, Bloom JS, Lange K, Smith DJ. Pooled analysis of radiation hybrids identifies loci for growth and drug action in mammalian cells. Genome Res 2020; 30:1458-1467. [PMID: 32878976 PMCID: PMC7605260 DOI: 10.1101/gr.262204.120] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/10/2020] [Accepted: 08/26/2020] [Indexed: 12/16/2022]
Abstract
Genetic screens in mammalian cells commonly focus on loss-of-function approaches. To evaluate the phenotypic consequences of extra gene copies, we used bulk segregant analysis (BSA) of radiation hybrid (RH) cells. We constructed six pools of RH cells, each consisting of ∼2500 independent clones, and placed the pools under selection in media with or without paclitaxel. Low pass sequencing identified 859 growth loci, 38 paclitaxel loci, 62 interaction loci, and three loci for mitochondrial abundance at genome-wide significance. Resolution was measured as ∼30 kb, close to single-gene. Divergent properties were displayed by the RH-BSA growth genes compared to those from loss-of-function screens, refuting the balance hypothesis. In addition, enhanced retention of human centromeres in the RH pools suggests a new approach to functional dissection of these chromosomal elements. Pooled analysis of RH cells showed high power and resolution and should be a useful addition to the mammalian genetic toolkit.
Affiliation(s)
- Arshad H Khan
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-1735, USA
- Andy Lin
- Office of Information Technology, UCLA, Los Angeles, California 90095-1557, USA
- Richard T Wang
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA
- Joshua S Bloom
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA
- Howard Hughes Medical Institute, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA
- Kenneth Lange
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-7088, USA
- Desmond J Smith
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, UCLA, Los Angeles, California 90095-1735, USA
|
25
|
Yu G, Wang K, Domeniconi C, Guo M, Wang J. Isoform function prediction based on bi-random walks on a heterogeneous network. Bioinformatics 2020; 36:303-310. [PMID: 31250882 DOI: 10.1093/bioinformatics/btz535] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 09/12/2018] [Revised: 06/21/2019] [Accepted: 06/26/2019] [Indexed: 01/29/2023] Open
Abstract
MOTIVATION Alternative splicing contributes to the functional diversity of protein species and the proteoforms translated from alternatively spliced isoforms of a gene actually execute the biological functions. Computationally predicting the functions of genes has been studied for decades. However, how to distinguish the functional annotations of isoforms, whose annotations are essential for understanding developmental abnormalities and cancers, is rarely explored. The main bottleneck is that functional annotations of isoforms are generally unavailable and functional genomic databases universally store the functional annotations at the gene level. RESULTS We propose IsoFun to accomplish Isoform Function prediction based on bi-random walks on a heterogeneous network. IsoFun firstly constructs an isoform functional association network based on the expression profiles of isoforms derived from multiple RNA-seq datasets. Next, IsoFun uses the available Gene Ontology annotations of genes, gene-gene interactions and the relations between genes and isoforms to construct a heterogeneous network. After this, IsoFun performs a tailored bi-random walk on the heterogeneous network to predict the association between GO terms and isoforms, thus accomplishing the prediction of GO annotations of isoforms. Experimental results show that IsoFun significantly outperforms the state-of-the-art algorithms and improves the area under the receiver-operating curve (AUROC) and the area under the precision-recall curve (AUPRC) by 17% and 44% at the gene-level, respectively. We further validated the performance of IsoFun on the genes ADAM15 and BCL2L1. IsoFun accurately differentiates the functions of respective isoforms of these two genes. AVAILABILITY AND IMPLEMENTATION The code of IsoFun is available at http://mlda.swu.edu.cn/codes.php?name=IsoFun. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
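To make the idea of a bi-random walk concrete, the following is a minimal generic sketch, not IsoFun's tailored algorithm, whose details are not given in the abstract. It assumes an isoform-isoform network, a GO term-term network, and an initial isoform-by-term association matrix, and it averages one walk step on each network before restarting toward the initial associations. All names and the parameters `alpha` and `steps` are illustrative assumptions.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def row_normalize(m):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    out = []
    for row in m:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def bi_random_walk(iso_net, term_net, init, alpha=0.5, steps=3):
    """Propagate isoform-by-term scores over both networks.

    iso_net:  isoform x isoform similarity matrix.
    term_net: term x term similarity matrix.
    init:     isoform x term initial association scores.
    alpha:    weight of the propagated scores versus the restart to `init`.
    """
    p_iso = row_normalize(iso_net)
    p_term = row_normalize(term_net)
    scores = [row[:] for row in init]
    for _ in range(steps):
        left = matmul(p_iso, scores)    # one step on the isoform network
        right = matmul(scores, p_term)  # one step on the term network
        scores = [[alpha * 0.5 * (l + r) + (1 - alpha) * s0
                   for l, r, s0 in zip(l_row, r_row, i_row)]
                  for l_row, r_row, i_row in zip(left, right, init)]
    return scores
```

In this toy formulation, an unannotated isoform receives a nonzero score for a term only if it is connected, directly or through intermediates, to an isoform that already carries that term or a related one.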
Affiliation(s)
- Guoxian Yu: College of Computer and Information Science, Southwest University, Chongqing, China
- Keyao Wang: College of Computer and Information Science, Southwest University, Chongqing, China
- Carlotta Domeniconi: Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Maozu Guo: School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China; Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, China
- Jun Wang: College of Computer and Information Science, Southwest University, Chongqing, China
26
Rothzerg E, Ho XD, Xu J, Wood D, Märtson A, Maasalu K, Kõks S. Alternative splicing of leptin receptor overlapping transcript in osteosarcoma. Exp Biol Med (Maywood) 2020; 245:1437-1443. [PMID: 32787464 DOI: 10.1177/1535370220949139] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/18/2022]
Abstract
IMPACT STATEMENT Osteosarcoma (OS, also known as osteogenic sarcoma) is the most common primary malignancy of bone in children and adolescents. The molecular mechanisms of OS are extremely complicated and its molecular mediators remain to be elucidated. We sequenced total RNA from 18 OS bone samples (paired normal-tumor biopsies). We found 26 transcript variants of the LEPROT gene that were differentially expressed between normal and tumor samples at statistical significance (FDR < 0.05). These findings contribute to the understanding of the molecular mechanisms of OS development and encourage further research.
Affiliation(s)
- Emel Rothzerg: School of Biomedical Sciences, The University of Western Australia, Perth, WA 6009, Australia; Perron Institute for Neurological and Translational Science, QEII Medical Centre, Nedlands, WA 6009, Australia
- Xuan D Ho: Department of Oncology, College of Medicine and Pharmacy, Hue University, Hue 53000, Vietnam
- Jiake Xu: School of Biomedical Sciences, The University of Western Australia, Perth, WA 6009, Australia
- David Wood: School of Biomedical Sciences, The University of Western Australia, Perth, WA 6009, Australia
- Aare Märtson: Department of Traumatology and Orthopaedics, University of Tartu, Tartu University Hospital, Tartu 50411, Estonia
- Katre Maasalu: Department of Traumatology and Orthopaedics, University of Tartu, Tartu University Hospital, Tartu 50411, Estonia
- Sulev Kõks: Perron Institute for Neurological and Translational Science, QEII Medical Centre, Nedlands, WA 6009, Australia; Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University, Murdoch, WA 6150, Australia
27
Mishra SK, Muthye V, Kandoi G. Computational Methods for Predicting Functions at the mRNA Isoform Level. Int J Mol Sci 2020; 21:ijms21165686. [PMID: 32784445 PMCID: PMC7460821 DOI: 10.3390/ijms21165686] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/13/2020] [Revised: 08/05/2020] [Accepted: 08/06/2020] [Indexed: 11/16/2022]
Abstract
Multiple mRNA isoforms of the same gene are produced via alternative splicing, a biological mechanism that regulates protein diversity while maintaining genome size. Alternatively spliced mRNA isoforms of the same gene may sometimes have very similar sequences, but they can have significantly diverse effects on cellular function and regulation. The products of alternative splicing have important and diverse functional roles, such as response to environmental stress, regulation of gene expression, and human heritable and plant diseases. The mRNA isoforms of the same gene can have dramatically different functions. Despite the functional importance of mRNA isoforms, very little has been done to annotate their functions. Recent years have, however, seen the development of several computational methods aimed at predicting mRNA isoform level biological functions. These methods use a wide array of proteo-genomic data to develop machine learning-based mRNA isoform function prediction tools. In this review, we discuss the computational methods developed for predicting the biological function at the individual mRNA isoform level.
28
Sulakhe D, D'Souza M, Wang S, Balasubramanian S, Athri P, Xie B, Canzar S, Agam G, Gilliam TC, Maltsev N. Exploring the functional impact of alternative splicing on human protein isoforms using available annotation sources. Brief Bioinform 2020; 20:1754-1768. [PMID: 29931155 DOI: 10.1093/bib/bby047] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/19/2018] [Revised: 05/02/2018] [Indexed: 12/30/2022]
Abstract
In recent years, the emphasis of scientific inquiry has shifted from whole-genome analyses to an understanding of cellular responses specific to tissue, developmental stage or environmental conditions. One of the central mechanisms underlying the diversity and adaptability of the contextual responses is alternative splicing (AS). It enables a single gene to encode multiple isoforms with distinct biological functions. However, to date, the functions of the vast majority of differentially spliced protein isoforms are not known. Integration of genomic, proteomic, functional, phenotypic and contextual information is essential for supporting isoform-based modeling and analysis. Such integrative proteogenomics approaches promise to provide insights into the functions of the alternatively spliced protein isoforms and provide high-confidence hypotheses to be validated experimentally. This manuscript provides a survey of the public databases supporting isoform-based biology. It also presents an overview of the potential global impact of AS on the human canonical gene functions, molecular interactions and cellular pathways.
Affiliation(s)
- Dinanath Sulakhe: Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
- Mark D'Souza: Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
- Sheng Wang: Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, USA
- Sandhya Balasubramanian: Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Genentech, Inc. 1 DNA Way, Mail Stop: 35-6J, South San Francisco, CA, USA
- Prashanth Athri: Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, Kasavanahalli, Carmelaram P.O., Bengaluru, Karnataka, India
- Bingqing Xie: Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
- Stefan Canzar: Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, USA; Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
- Gady Agam: Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
- T Conrad Gilliam: Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
- Natalia Maltsev: Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA; Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
29
Shaw D, Chen H, Jiang T. DeepIsoFun: a deep domain adaptation approach to predict isoform functions. Bioinformatics 2020; 35:2535-2544. [PMID: 30535380 DOI: 10.1093/bioinformatics/bty1017] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/29/2018] [Revised: 11/07/2018] [Accepted: 12/08/2018] [Indexed: 11/14/2022]
Abstract
MOTIVATION Isoforms are mRNAs produced from the same gene locus by alternative splicing and may have different functions. Although gene functions have been studied extensively, little is known about the specific functions of isoforms. Recently, some computational approaches based on multiple instance learning have been proposed to predict isoform functions from annotated gene functions and expression data, but their performance is far from being desirable primarily due to the lack of labeled training data. To improve the performance on this problem, we propose a novel deep learning method, DeepIsoFun, that combines multiple instance learning with domain adaptation. The latter technique helps to transfer the knowledge of gene functions to the prediction of isoform functions and provides additional labeled training data. Our model is trained on a deep neural network architecture so that it can adapt to different expression distributions associated with different gene ontology terms. RESULTS We evaluated the performance of DeepIsoFun on three expression datasets of human and mouse collected from SRA studies at different times. On each dataset, DeepIsoFun performed significantly better than the existing methods. In terms of area under the receiver operating characteristics curve, our method acquired at least 26% improvement and in terms of area under the precision-recall curve, it acquired at least 10% improvement over the state-of-the-art methods. In addition, we also study the divergence of the functions predicted by our method for isoforms from the same gene and the overall correlation between expression similarity and the similarity of predicted functions. AVAILABILITY AND IMPLEMENTATION https://github.com/dls03/DeepIsoFun/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dipan Shaw: Department of Computer Science and Engineering, University of California, Riverside, CA, USA
- Hao Chen: Department of Computer Science and Engineering, University of California, Riverside, CA, USA
- Tao Jiang: Department of Computer Science and Engineering, University of California, Riverside, CA, USA; Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing, China
30
ISOGO: Functional annotation of protein-coding splice variants. Sci Rep 2020; 10:1069. [PMID: 31974522 PMCID: PMC6978412 DOI: 10.1038/s41598-020-57974-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/08/2019] [Accepted: 01/07/2020] [Indexed: 12/25/2022]
Abstract
The advent of RNA-seq technologies has switched the paradigm of genetic analysis from a genome-based to a transcriptome-based perspective. Alternative splicing generates functional diversity in genes, but the precise functions of many individual isoforms are yet to be elucidated. Gene Ontology was developed to annotate gene products according to their biological processes, molecular functions and cellular components. Although a single gene may have several gene products, most annotations are not isoform-specific and do not distinguish the functions of the different proteins originating from a single gene. Several approaches have tried to automatically annotate ontologies at the isoform level, but this has proven to be a daunting task. We have developed ISOGO (ISOform + GO function imputation), a novel algorithm to predict the function of coding isoforms based on their protein domains and their correlation of expression across 11,373 cancer patients. Combining these two sources of information outperforms previous approaches: it provides an area under precision-recall curve (AUPRC) five times larger than previous attempts and the median AUROC of assigned functions to genes is 0.82. We tested ISOGO predictions on some genes with isoform-specific functions (BRCA1, MADD, VAMP7 and ITSN1) and they were coherent with the literature. Besides, we examined whether the main isoform of each gene (as predicted by APPRIS) was the most likely to have the annotated gene functions, which was the case for 99.4% of the genes. We also evaluated the predictions for isoform-specific functions provided by the CAFA3 challenge and results were also convincing. To make these results available to the scientific community, we have deployed a web application to consult ISOGO predictions (https://biotecnun.unav.es/app/isogo). Initial data, website link, isoform-specific GO function predictions and R code are available at https://gitlab.com/icassol/isogo.
31
Nibau C, Dadarou D, Kargios N, Mallioura A, Fernandez-Fuentes N, Cavallari N, Doonan JH. A Functional Kinase Is Necessary for Cyclin-Dependent Kinase G1 (CDKG1) to Maintain Fertility at High Ambient Temperature in Arabidopsis. Frontiers in Plant Science 2020; 11:586870. [PMID: 33240303 PMCID: PMC7683410 DOI: 10.3389/fpls.2020.586870] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/24/2020] [Accepted: 10/15/2020] [Indexed: 05/15/2023]
Abstract
Maintaining fertility in a fluctuating environment is key to the reproductive success of flowering plants. Meiosis and pollen formation are particularly sensitive to changes in growing conditions, especially temperature. We have previously identified cyclin-dependent kinase G1 (CDKG1) as a master regulator of temperature-dependent meiosis and this may involve the regulation of alternative splicing (AS), including of its own transcript. CDKG1 mRNA can undergo several AS events, potentially producing two protein variants: CDKG1L and CDKG1S, differing in their N-terminal domain which may be involved in co-factor interaction. In leaves, both isoforms have distinct temperature-dependent functions on target mRNA processing, but their role in pollen development is unknown. In the present study, we characterize the role of CDKG1L and CDKG1S in maintaining Arabidopsis fertility. We show that the long (L) form is necessary and sufficient to rescue the fertility defects of the cdkg1-1 mutant, while the short (S) form is unable to rescue fertility. On the other hand, an extra copy of CDKG1L reduces fertility. In addition, mutation of the ATP binding pocket of the kinase indicates that kinase activity is necessary for the function of CDKG1. Kinase mutants of CDKG1L and CDKG1S correctly localize to the cell nucleus and nucleus and cytoplasm, respectively, but are unable to rescue either the fertility or the splicing defects of the cdkg1-1 mutant. Furthermore, we show that there is partial functional overlap between CDKG1 and its paralog CDKG2 that could in part be explained by overlapping gene expression.
Affiliation(s)
- Candida Nibau (corresponding author): Institute of Biological Environmental and Rural Sciences (IBERS), Aberystwyth University, Aberystwyth, United Kingdom
- Despoina Dadarou: Institute of Biological Environmental and Rural Sciences (IBERS), Aberystwyth University, Aberystwyth, United Kingdom; School of Life Sciences, University of Warwick, Coventry, United Kingdom
- Nestoras Kargios: Institute of Biological Environmental and Rural Sciences (IBERS), Aberystwyth University, Aberystwyth, United Kingdom
- Areti Mallioura: Institute of Biological Environmental and Rural Sciences (IBERS), Aberystwyth University, Aberystwyth, United Kingdom
- Narcis Fernandez-Fuentes: Institute of Biological Environmental and Rural Sciences (IBERS), Aberystwyth University, Aberystwyth, United Kingdom
- Nicola Cavallari: Institute of Science and Technology Austria, Klosterneuburg, Austria
- John H. Doonan (corresponding author): Institute of Biological Environmental and Rural Sciences (IBERS), Aberystwyth University, Aberystwyth, United Kingdom
32
Isoform-Disease Association Prediction by Data Fusion. Bioinformatics Research and Applications 2020. [DOI: 10.1007/978-3-030-57821-3_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/22/2022]
33
Liu Z, Dong W, Luo W, Jiang W, Li Q, He Z. HLMethy: a machine learning-based model to identify the hidden labels of m6A candidates. Plant Molecular Biology 2019; 101:575-584. [PMID: 31722090 DOI: 10.1007/s11103-019-00930-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/22/2019] [Accepted: 11/01/2019] [Indexed: 06/10/2023]
Abstract
We developed a machine learning-based model to identify the hidden labels of m6A candidates from noisy m6A-seq data. Peak-calling approaches, such as MeRIP-seq or m6A-seq, are commonly used to map m6A modifications. However, these technologies can only map m6A sites with 100-200 nt resolution and cannot reveal the precise location or the number of modified residues in a transcript. To address this challenge, we developed a novel machine learning-based approach, named HLMethy, to assign labels to m6A candidates from noisy m6A-seq data. The multiple instance learning framework was adopted and two different training strategies were used to generate the classification model. To test the performance of our model, the m6A sites with single-base resolution were used and our model achieved comparable performance against existing instance-level predictors, which suggests that our model has the potential to improve the data quality of m6A-seq at reduced costs. Moreover, our generic framework can be extended to other newly found modifications that are found by peak-calling approaches. The source code of HLMethy is available at https://github.com/liuze-nwafu/HLMethy.
Affiliation(s)
- Ze Liu, Wei Dong, WenJie Luo, Wei Jiang, QuanWu Li, ZiLi He: College of Water Resources and Architectural Engineering, Northwest A & F University, Yangling, 712100, Shaanxi, China; Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A & F University, Yangling, 712100, Shaanxi, China
34
Abstract
Alternative splicing produces multiple mRNA isoforms of genes, which have important and diverse roles in the regulation of gene expression, human heritable diseases, and response to environmental stresses. However, little has been done to assign functions at the mRNA isoform level. Functional networks, where the interactions are quantified by their probability of being involved in the same biological process, are typically generated at the gene level. We use a diverse array of tissue-specific RNA-seq datasets and sequence information to train random forest models that predict the functional networks. Since there is no mRNA isoform-level gold standard, we use single isoform genes co-annotated to Gene Ontology biological process annotations, Kyoto Encyclopedia of Genes and Genomes pathways, BioCyc pathways and protein-protein interactions as functionally related (positive pair). To generate the non-functional pairs (negative pair), we use the Gene Ontology annotations tagged with "NOT" qualifier. We describe 17 Tissue-spEcific mrNa iSoform functIOnal Networks (TENSION) following a leave-one-tissue-out strategy in addition to an organism level reference functional network for mouse. We validate our predictions by comparing their performance with previous methods, randomized positive and negative class labels, updated Gene Ontology annotations, and by literature evidence. We demonstrate the ability of our networks to reveal tissue-specific functional differences of the isoforms of the same genes. All scripts and data from TENSION are available at: https://doi.org/10.25380/iastate.c.4275191.
Affiliation(s)
- Gaurav Kandoi, Julie A Dickerson: Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA; Department of Electrical and Computer Engineering, Iowa State University, Ames, IA, USA
35
Chen H, Shaw D, Zeng J, Bu D, Jiang T. DIFFUSE: predicting isoform functions from sequences and expression profiles via deep learning. Bioinformatics 2019; 35:i284-i294. [PMID: 31510699 PMCID: PMC6612874 DOI: 10.1093/bioinformatics/btz367] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Indexed: 01/25/2023]
Abstract
MOTIVATION Alternative splicing generates multiple isoforms from a single gene, greatly increasing the functional diversity of a genome. Although gene functions have been well studied, little is known about the specific functions of isoforms, making accurate prediction of isoform functions highly desirable. However, the existing approaches to predicting isoform functions are far from satisfactory due to at least two reasons: (i) unlike genes, isoform-level functional annotations are scarce. (ii) The information of isoform functions is concealed in various types of data including isoform sequences, co-expression relationship among isoforms, etc. RESULTS In this study, we present a novel approach, DIFFUSE (Deep learning-based prediction of IsoForm FUnctions from Sequences and Expression), to predict isoform functions. To integrate various types of data, our approach adopts a hybrid framework by first using a deep neural network (DNN) to predict the functions of isoforms from their genomic sequences and then refining the prediction using a conditional random field (CRF) based on co-expression relationship. To overcome the lack of isoform-level ground truth labels, we further propose an iterative semi-supervised learning algorithm to train both the DNN and CRF together. Our extensive computational experiments demonstrate that DIFFUSE could effectively predict the functions of isoforms and genes. It achieves an average area under the receiver operating characteristics curve of 0.840 and area under the precision-recall curve of 0.581 over 4184 GO functional categories, which are significantly higher than the state-of-the-art methods. We further validate the prediction results by analyzing the correlation between functional similarity, sequence similarity, expression similarity and structural similarity, as well as the consistency between the predicted functions and some well-studied functional features of isoform sequences. 
AVAILABILITY AND IMPLEMENTATION https://github.com/haochenucr/DIFFUSE. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hao Chen: Department of Computer Science and Engineering, University of California, Riverside, CA, USA
- Dipan Shaw: Department of Computer Science and Engineering, University of California, Riverside, CA, USA
- Jianyang Zeng: Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Dongbo Bu: Key Lab of Intelligent Information Process, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Tao Jiang: Department of Computer Science and Engineering, University of California, Riverside, CA, USA; Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing, China
36
Annotation of Alternatively Spliced Proteins and Transcripts with Protein-Folding Algorithms and Isoform-Level Functional Networks. Methods Mol Biol 2017; 1558:415-436. [PMID: 28150250 DOI: 10.1007/978-1-4939-6783-4_20] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
Abstract
Tens of thousands of splice isoforms of proteins have been catalogued as predicted sequences from transcripts in humans and other species. Relatively few have been characterized biochemically or structurally. With the extensive development of protein bioinformatics, the characterization and modeling of isoform features, isoform functions, and isoform-level networks have advanced notably. Here we present applications of the I-TASSER family of algorithms for folding and functional predictions and the IsoFunc, MIsoMine, and Hisonet data resources for isoform-level analyses of network and pathway-based functional predictions and protein-protein interactions. Hopefully, predictions and insights from protein bioinformatics will stimulate many experimental validation studies.
37
Cozzetto D, Minneci F, Currant H, Jones DT. FFPred 3: feature-based function prediction for all Gene Ontology domains. Sci Rep 2016; 6:31865. [PMID: 27561554 PMCID: PMC4999993 DOI: 10.1038/srep31865] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Received: 05/05/2016] [Accepted: 07/25/2016] [Indexed: 11/09/2022]
Abstract
Predicting protein function has been a major goal of bioinformatics for several decades, and it has gained fresh momentum thanks to recent community-wide blind tests aimed at benchmarking available tools on a genomic scale. Sequence-based predictors, especially those performing homology-based transfers, remain the most popular but increasing understanding of their limitations has stimulated the development of complementary approaches, which mostly exploit machine learning. Here we present FFPred 3, which is intended for assigning Gene Ontology terms to human protein chains, when homology with characterized proteins can provide little aid. Predictions are made by scanning the input sequences against an array of Support Vector Machines (SVMs), each examining the relationship between protein function and biophysical attributes describing secondary structure, transmembrane helices, intrinsically disordered regions, signal peptides and other motifs. This update features a larger SVM library that extends its coverage to the cellular component sub-ontology for the first time, prompted by the establishment of a dedicated evaluation category within the Critical Assessment of Functional Annotation. The effectiveness of this approach is demonstrated through benchmarking experiments, and its usefulness is illustrated by analysing the potential functional consequences of alternative splicing in human and their relationship to patterns of biological features.
Affiliation(s)
- Domenico Cozzetto: Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
- Federico Minneci: Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
- Hannah Currant: Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
- David T Jones: Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
38
Panwar B, Menon R, Eksi R, Li HD, Omenn GS, Guan Y. Genome-Wide Functional Annotation of Human Protein-Coding Splice Variants Using Multiple Instance Learning. J Proteome Res 2016; 15:1747-53. [PMID: 27142340 DOI: 10.1021/acs.jproteome.5b00883] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 01/28/2023]
Abstract
The vast majority of human multiexon genes undergo alternative splicing and produce a variety of splice variant transcripts and proteins, which can perform different functions. These protein-coding splice variants (PCSVs) greatly increase the functional diversity of proteins. Most functional annotation algorithms have been developed at the gene level; the lack of isoform-level gold standards is an important intellectual limitation for currently available machine learning algorithms. The accumulation of a large amount of RNA-seq data in the public domain greatly increases our ability to examine the functional annotation of genes at isoform level. In the present study, we used a multiple instance learning (MIL)-based approach for predicting the function of PCSVs. We used transcript-level expression values and gene-level functional associations from the Gene Ontology database. A support vector machine (SVM)-based 5-fold cross-validation technique was applied. Comparatively, genes with multiple PCSVs performed better than single PCSV genes, and performance also improved when more examples were available to train the models. We demonstrated our predictions using literature evidence of ADAM15, LMNA/C, and DMXL2 genes. All predictions have been implemented in a web resource called "IsoFunc", which is freely available for the global scientific community through http://guanlab.ccmb.med.umich.edu/isofunc.
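As a rough illustration of how multiple instance learning connects gene-level labels to isoform-level scores, the sketch below applies the standard MIL assumption: a gene (the bag) carries a function if at least one of its isoforms (the instances) does, so the gene score is the maximum isoform score. This is a generic aggregation rule, not necessarily the exact procedure of the study above; the function name and the example identifiers are hypothetical.

```python
from typing import Dict, List


def gene_level_scores(isoform_scores: Dict[str, float],
                      gene_to_isoforms: Dict[str, List[str]]) -> Dict[str, float]:
    """Aggregate isoform scores into gene scores using the MIL max rule.

    isoform_scores   : predicted score of each isoform for one GO term
    gene_to_isoforms : mapping from a gene to its isoform identifiers
    """
    gene_scores: Dict[str, float] = {}
    for gene, isoforms in gene_to_isoforms.items():
        # Ignore isoforms for which no prediction is available.
        known = [isoform_scores[i] for i in isoforms if i in isoform_scores]
        if known:
            # The bag is as positive as its most positive instance.
            gene_scores[gene] = max(known)
    return gene_scores


# Hypothetical identifiers, for illustration only.
example = gene_level_scores(
    {"iso_a1": 0.9, "iso_a2": 0.1, "iso_b1": 0.2},
    {"GENE_A": ["iso_a1", "iso_a2"], "GENE_B": ["iso_b1"], "GENE_C": []},
)
```

Under this rule, two isoforms of the same gene can receive very different scores while the gene-level prediction still agrees with the gene-level annotation used for training.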
Collapse
Affiliation(s)
- Bharat Panwar
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Rajasree Menon
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Ridvan Eksi
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Hong-Dong Li
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
| |
|
39
|
Guan Y, Martini S, Mariani LH. Genes Caught In Flagranti: Integrating Renal Transcriptional Profiles With Genotypes and Phenotypes. Semin Nephrol 2016. [PMID: 26215861 DOI: 10.1016/j.semnephrol.2015.04.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/27/2022]
Abstract
In the past decade, population genetics has gained tremendous success in identifying genetic variations that are statistically relevant to renal diseases and kidney function. However, it is challenging to interpret the functional relevance of the genetic variations found by population genetics studies. In this review, we discuss studies that integrate multiple levels of data, especially transcriptome profiles and phenotype data, to assign functional roles of genetic variations involved in kidney function. Furthermore, we introduce state-of-the-art machine learning algorithms, Bayesian networks, support vector machines, and Gaussian process regression, which have been applied successfully to integrating genetic, regulatory, and clinical information to predict clinical outcomes. These methods are likely to be deployed successfully in the nephrology field in the near future.
Affiliation(s)
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI; Department of Internal Medicine, University of Michigan, Ann Arbor, MI; Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI
| | - Sebastian Martini
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI; Nephrologisches Zentrum, Medizinische Klinik und Poliklinik IV, Klinikum der Universität München, Ludwig-Maximilians-University Munich, Munich, Germany
| | - Laura H Mariani
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI
| |
|
40
|
Abstract
The laboratory mouse is the primary mammalian species used for studying alternative splicing events. Recent studies have generated computational models to predict functions for splice isoforms in the mouse. However, the functional relationship network, describing the probability of splice isoforms participating in the same biological process or pathway, has not yet been studied in the mouse. Here we describe a rich genome-wide resource of mouse networks at the isoform level, which was generated using a unique framework that was originally developed to infer isoform functions. This network was built through integrating heterogeneous genomic and protein data, including RNA-seq, exon array, protein docking and pseudo-amino acid composition. Through simulation and cross-validation studies, we demonstrated the accuracy of the algorithm in predicting isoform-level functional relationships. We showed that this network enables the users to reveal functional differences of the isoforms of the same gene, as illustrated by literature evidence with Anxa6 (annexin a6) as an example. We expect this work will become a useful resource for the mouse genetics community to understand gene functions. The network is publicly available at: http://guanlab.ccmb.med.umich.edu/isoformnetwork.
|
41
|
Hsu MK, Pan CL, Chen FC. Functional divergence and convergence between the transcript network and gene network in lung adenocarcinoma. Onco Targets Ther 2016; 9:335-47. [PMID: 26834492 PMCID: PMC4716766 DOI: 10.2147/ott.s94897] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/08/2023] Open
Abstract
INTRODUCTION Alternative RNA splicing is a critical regulatory mechanism during tumorigenesis. However, previous oncological studies mainly focused on the splicing of individual genes. Whether and how transcript isoforms are coordinated to affect cellular functions remain underexplored. Also of great interest is how the splicing regulome cooperates with the transcription regulome to facilitate tumorigenesis. The answers to these questions are of fundamental importance to cancer biology. RESULTS Here, we report a comparative study between the transcript-based network (TN) and the gene-based network (GN) derived from the transcriptomes of paired tumor-normal tissues from 77 lung adenocarcinoma patients. We demonstrate that the two networks differ significantly from each other in terms of patient clustering and the number and functions of network modules. Interestingly, the majority (89.5%) of multi-transcript genes have their transcript isoforms distributed in at least two TN modules, suggesting regulatory and functional divergences between transcript isoforms. Furthermore, TN and GN modules share only ~50%-60% of their biological functions. TN thus appears to constitute a regulatory layer separate from GN. Nevertheless, our results indicate that functional convergence and divergence both occur between TN and GN, implying complex interactions between the two regulatory layers. Finally, we report that the expression profiles of module members in both TN and GN shift dramatically yet concordantly during tumorigenesis. The mechanisms underlying this coordinated shifting remain unclear yet are worth further explorations. CONCLUSION We show that in lung adenocarcinoma, transcript isoforms per se are coordinately regulated to conduct biological functions not conveyed by the network of genes. However, the two networks may interact closely with each other by sharing the same or related biological functions. Unraveling the effects and mechanisms of such interactions will significantly advance our understanding of this deadly disease.
Affiliation(s)
- Min-Kung Hsu
- Department of Biological Science and Technology, National Chiao-Tung University, Hsinchu, Taiwan
| | - Chia-Lin Pan
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan
| | - Feng-Chi Chen
- Department of Biological Science and Technology, National Chiao-Tung University, Hsinchu, Taiwan; Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan; School of Dentistry, China Medical University, Taichung, Taiwan
| |
|
42
|
Li HD, Omenn GS, Guan Y. A proteogenomic approach to understand splice isoform functions through sequence and expression-based computational modeling. Brief Bioinform 2016; 17:1024-1031. [PMID: 26740460 DOI: 10.1093/bib/bbv109] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/27/2015] [Revised: 11/03/2015] [Indexed: 01/23/2023] Open
Abstract
The products of multi-exon genes are a mixture of alternatively spliced isoforms, from which the translated proteins can have similar, different or even opposing functions. It is therefore essential to differentiate and annotate functions for individual isoforms. Computational approaches provide an efficient complement to expensive and time-consuming experimental studies. The input data of these methods range from DNA sequence, to RNA selection pressure, to expressed sequence tags, to full-length complementary DNA, to exon array, to RNA-seq expression, to proteomic data. Notably, RNA-seq technology generates quantitative profiling of transcript expression at the genome scale, with an unprecedented amount of expression data available for developing isoform function prediction methods. Integrative analysis of these data at different molecular levels enables a proteogenomic approach to systematically interrogate isoform functions. Here, we briefly review the state-of-the-art methods according to their input data sources, discuss their advantages and limitations and point out potential ways to improve prediction accuracies.
|
43
|
Abstract
With the ability to obtain tens of millions of reads, high-throughput messenger RNA sequencing (RNA-Seq) data offers the possibility of estimating abundance of isoforms and finding novel transcripts. In this chapter, we describe a protocol to construct an RNA-Seq library for sequencing on Illumina NGS platforms, and a computational pipeline to perform RNA-Seq data analysis. The protocols described in this chapter can be applied to the analysis of differential gene expression in control versus 17β-estradiol treatment of in vivo or in vitro systems.
Affiliation(s)
- Hanquan Liang
- McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
| | - Erliang Zeng
- Department of Biology, University of South Dakota, 414 E. Clark Street, Vermillion, SD, 57069, USA.
- Department of Computer Science, University of South Dakota, 414 E. Clark Street, Vermillion, SD, 57069, USA.
| |
|
44
|
AlFadhli S, Ghanem AAM, Nizam R. Genome-wide peripheral blood transcriptome analysis of Arab female lupus and lupus nephritis. Gene 2015; 570:230-8. [PMID: 26072163 DOI: 10.1016/j.gene.2015.06.020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/09/2015] [Revised: 05/16/2015] [Accepted: 06/07/2015] [Indexed: 01/11/2023]
Abstract
Systemic lupus erythematosus (lupus) is a genetically heterogeneous autoimmune disorder with an obscure etiology. With 92-94% of human genes exhibiting alternative splicing, gaining insights into such events may lead to better diagnostics. Herein, we explored the genome-wide peripheral blood transcriptome of lupus and its severe form lupus-nephritis (LN) compared to healthy controls (HC). Age/gender/ethnically-matched Arab females were tested using high-density arrays and statistical analysis was carried out using appropriate software. Analysis revealed 15 splice variants that are differentially expressed between lupus/HC and 99 variants between LN/HC (p ≤ 0.05, SI> or ≤ 0.5, Benjamini-Hochberg false discovery rate correction). Comparison between LN/lupus revealed 7 variants that significantly differed in expression. Pathway analysis of differentially spliced-genes postulated 11 significant pathways in lupus and 12 in LN (p<0.05). Analysis of peripheral blood transcriptome possibly revealed signature causative genes that are alternatively spliced, signifying their clinical relevance. The present study is the first to reveal the significance of alternative variants in lupus and LN.
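The abstract above reports p-values after a Benjamini-Hochberg false discovery rate correction. As a rough illustration of what that step-up procedure does, here is a minimal sketch in plain Python. The function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the study, which does not describe its software.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Illustrative sketch only; not the pipeline used in the cited study.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four splice variants.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [q <= 0.05 for q in adj]
```

With these made-up inputs only the first variant stays below the 0.05 threshold after adjustment, which shows how the correction can remove calls that look significant on raw p-values alone.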
Affiliation(s)
- Suad AlFadhli
- Department of Medical Laboratory Sciences, Faculty of Allied Health Sciences, Kuwait University, Kuwait.
| | | | - Rasheeba Nizam
- Department of Medical Laboratory Sciences, Faculty of Allied Health Sciences, Kuwait University, Kuwait
| |
|
45
|
Horvatovich P, Lundberg EK, Chen YJ, Sung TY, He F, Nice EC, Goode RJ, Yu S, Ranganathan S, Baker MS, Domont GB, Velasquez E, Li D, Liu S, Wang Q, He QY, Menon R, Guan Y, Corrales FJ, Segura V, Casal JI, Pascual-Montano A, Albar JP, Fuentes M, Gonzalez-Gonzalez M, Diez P, Ibarrola N, Degano RM, Mohammed Y, Borchers CH, Urbani A, Soggiu A, Yamamoto T, Salekdeh GH, Archakov A, Ponomarenko E, Lisitsa A, Lichti CF, Mostovenko E, Kroes RA, Rezeli M, Végvári Á, Fehniger TE, Bischoff R, Vizcaíno JA, Deutsch EW, Lane L, Nilsson CL, Marko-Varga G, Omenn GS, Jeong SK, Lim JS, Paik YK, Hancock WS. Quest for Missing Proteins: Update 2015 on Chromosome-Centric Human Proteome Project. J Proteome Res 2015; 14:3415-3431. [PMID: 26076068 DOI: 10.1021/pr5013009] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Indexed: 12/22/2022]
Abstract
This paper summarizes the recent activities of the Chromosome-Centric Human Proteome Project (C-HPP) consortium, which develops new technologies to identify yet-to-be annotated proteins (termed "missing proteins") in biological samples that lack sufficient experimental evidence at the protein level for confident protein identification. The C-HPP also aims to identify new protein forms that may be caused by genetic variability, post-translational modifications, and alternative splicing. Proteogenomic data integration forms the basis of the C-HPP's activities; therefore, we have summarized some of the key approaches and their roles in the project. We present new analytical technologies that improve the chemical space and lower detection limits coupled to bioinformatics tools and some publicly available resources that can be used to improve data analysis or support the development of analytical assays. Most of this paper's content has been compiled from posters, slides, and discussions presented in the series of C-HPP workshops held during 2014. All data (posters, presentations) used are available at the C-HPP Wiki (http://c-hpp.webhosting.rug.nl/) and in the Supporting Information.
Affiliation(s)
- Péter Horvatovich
- Analytical Biochemistry, Department of Pharmacy, University of Groningen , A. Deusinglaan 1, 9713 AV Groningen, The Netherlands
| | - Emma K Lundberg
- Science for Life Laboratory, KTH - Royal Institute of Technology , SE-171 21 Stockholm, Sweden
| | - Yu-Ju Chen
- Institute of Chemistry, Academia Sinica , 128 Academia Road Sec. 2, Taipei 115, Taiwan
| | - Ting-Yi Sung
- Institute of Information Science, Academia Sinica , 128 Academia Road Sec. 2, Taipei 115, Taiwan
| | - Fuchu He
- The State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine , No. 27 Taiping Road, Haidian District, Beijing 100850, China
| | - Edouard C Nice
- Department of Biochemistry and Molecular Biology, Monash University , Clayton, Victoria 3800, Australia
| | - Robert J Goode
- Department of Biochemistry and Molecular Biology, Monash University , Clayton, Victoria 3800, Australia
| | - Simon Yu
- Department of Biochemistry and Molecular Biology, Monash University , Clayton, Victoria 3800, Australia
| | - Shoba Ranganathan
- Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence in Bioinformatics, Macquarie University , Sydney, New South Wales 2109, Australia
| | - Mark S Baker
- Australian School of Advanced Medicine, Macquarie University , Sydney, NSW 2109, Australia
| | - Gilberto B Domont
- Proteomics Unit, Institute of Chemistry, Federal University of Rio de Janeiro , Cidade Universitária, Av Athos da Silveira Ramos 149, CT-A542, 21941-909 Rio de Janeiro, RJ, Brazil
| | - Erika Velasquez
- Proteomics Unit, Institute of Chemistry, Federal University of Rio de Janeiro , Cidade Universitária, Av Athos da Silveira Ramos 149, CT-A542, 21941-909 Rio de Janeiro, RJ, Brazil
| | - Dong Li
- The State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine , No. 27 Taiping Road, Haidian District, Beijing 100850, China
| | - Siqi Liu
- Beijing Institute of Genomics and BGI Shenzhen , No. 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- BGI Shenzhen , Beishan Road, Yantian District, Shenzhen, 518083, China
| | - Quanhui Wang
- Beijing Institute of Genomics and BGI Shenzhen , No. 1 Beichen West Road, Chaoyang District, Beijing 100101, China
| | - Qing-Yu He
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, College of Life Science and Technology, Jinan University , Guangzhou 510632, China
| | - Rajasree Menon
- Department of Computational Medicine & Bioinformatics, University of Michigan , 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States
| | - Yuanfang Guan
- Departments of Computational Medicine & Bioinformatics and Computer Sciences, University of Michigan , 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States
| | - Fernando J Corrales
- ProteoRed-ISCIII, Biomolecular and Bioinformatics Resources Platform (PRB2), Spanish Consortium of C-HPP (Chr-16), CIMA, University of Navarra, 31008 Pamplona, Spain
- Chr16 SpHPP Consortium , CIMA, University of Navarra, 31008 Pamplona, Spain
| | - Victor Segura
- ProteoRed-ISCIII, Biomolecular and Bioinformatics Resources Platform (PRB2), Spanish Consortium of C-HPP (Chr-16), CIMA, University of Navarra, 31008 Pamplona, Spain
- Chr16 SpHPP Consortium , CIMA, University of Navarra, 31008 Pamplona, Spain
| | - J Ignacio Casal
- Department of Cellular and Molecular Medicine, Centro de Investigaciones Biológicas (CIB-CSIC) , 28040 Madrid, Spain
| | | | - Juan P Albar
- Centro Nacional de Biotecnologia (CNB-CSIC) , Cantoblanco, 28049 Madrid, Spain
| | - Manuel Fuentes
- Cancer Research Center. Proteomics Unit and General Service of Cytometry, Department of Medicine, University of Salamanca-CSIC , IBSAL, Campus Miguel de Unamuno s/n, 37007 Salamanca, Spain
| | - Maria Gonzalez-Gonzalez
- Cancer Research Center. Proteomics Unit and General Service of Cytometry, Department of Medicine, University of Salamanca-CSIC , IBSAL, Campus Miguel de Unamuno s/n, 37007 Salamanca, Spain
| | - Paula Diez
- Cancer Research Center. Proteomics Unit and General Service of Cytometry, Department of Medicine, University of Salamanca-CSIC , IBSAL, Campus Miguel de Unamuno s/n, 37007 Salamanca, Spain
| | - Nieves Ibarrola
- Cancer Research Center. Proteomics Unit and General Service of Cytometry, Department of Medicine, University of Salamanca-CSIC , IBSAL, Campus Miguel de Unamuno s/n, 37007 Salamanca, Spain
| | - Rosa M Degano
- Cancer Research Center. Proteomics Unit and General Service of Cytometry, Department of Medicine, University of Salamanca-CSIC , IBSAL, Campus Miguel de Unamuno s/n, 37007 Salamanca, Spain
| | - Yassene Mohammed
- University of Victoria -Genome British Columbia Proteomics Centre, Vancouver Island Technology Park, #3101-4464 Markham Street, Victoria, British Columbia V8Z 7X8, Canada
- Center for Proteomics and Metabolomics, Leiden University Medical Center , 2333 ZA Leiden, The Netherlands
| | - Christoph H Borchers
- University of Victoria -Genome British Columbia Proteomics Centre, Vancouver Island Technology Park, #3101-4464 Markham Street, Victoria, British Columbia V8Z 7X8, Canada
| | - Andrea Urbani
- Proteomics and Metabonomic, Laboratory, Fondazione Santa Lucia , Rome, Italy
- Department of Experimental Medicine and Surgery, University of Rome "Tor Vergata" , Rome, Italy
| | - Alessio Soggiu
- Department of Veterinary Science and Public Health (DIVET), University of Milano , via Celoria 10, 20133 Milano, Italy
| | - Tadashi Yamamoto
- Institute of Nephrology, Graduate School of Medical and Dental Sciences, Niigata University , Niigata, Japan
| | - Ghasem Hosseini Salekdeh
- Department of Molecular Systems Biology at Cell Science Research Center, Royan Institute for Stem Cell Biology and Technology, ACECR, Tehran, Iran
- Department of Systems Biology, Agricultural Biotechnology Research Institute of Iran, Karaj, Iran
| | | | | | - Andrey Lisitsa
- Orechovich Institute of Biomedical Chemistry , Moscow, Russia
| | - Cheryl F Lichti
- Department of Pharmacology and Toxicology, The University of Texas Medical Branch , Galveston, Texas 77555-0617, United States
| | - Ekaterina Mostovenko
- Department of Pharmacology and Toxicology, The University of Texas Medical Branch , Galveston, Texas 77555-0617, United States
| | - Roger A Kroes
- Falk Center for Molecular Therapeutics, Department of Biomedical Engineering, Northwestern University , 1801 Maple Ave., Suite 4300, Evanston, Illinois 60201, United States
| | - Melinda Rezeli
- Clinical Protein Science & Imaging, Department of Biomedical Engineering, Lund University , BMC D13, 221 84 Lund, Sweden
| | - Ákos Végvári
- Clinical Protein Science & Imaging, Department of Biomedical Engineering, Lund University , BMC D13, 221 84 Lund, Sweden
| | - Thomas E Fehniger
- Clinical Protein Science & Imaging, Department of Biomedical Engineering, Lund University , BMC D13, 221 84 Lund, Sweden
| | - Rainer Bischoff
- Analytical Biochemistry, Department of Pharmacy, University of Groningen , A. Deusinglaan 1, 9713 AV Groningen, The Netherlands
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, CB10 1SD, Hinxton, Cambridge, United Kingdom
| | - Eric W Deutsch
- Institute for Systems Biology , 401 Terry Avenue North, Seattle, Washington 98109, United States
| | - Lydie Lane
- SIB Swiss Institute of Bioinformatics , Geneva, Switzerland
- Department of Human Protein Science, Faculty of Medicine, University of Geneva , Geneva, Switzerland
| | - Carol L Nilsson
- Department of Pharmacology and Toxicology, The University of Texas Medical Branch , Galveston, Texas 77555-0617, United States
| | - György Marko-Varga
- Clinical Protein Science & Imaging, Department of Biomedical Engineering, Lund University , BMC D13, 221 84 Lund, Sweden
| | - Gilbert S Omenn
- Departments of Computational Medicine & Bioinformatics, Internal Medicine, Human Genetics and School of Public Health, University of Michigan , 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States
| | - Seul-Ki Jeong
- Departments of Integrated Omics for Biomedical Science & Biochemistry, College of Life Science and Technology, Yonsei Proteome Research Center, Yonsei University , Seoul, 120-749, Korea
| | - Jong-Sun Lim
- Departments of Integrated Omics for Biomedical Science & Biochemistry, College of Life Science and Technology, Yonsei Proteome Research Center, Yonsei University , Seoul, 120-749, Korea
| | - Young-Ki Paik
- Departments of Integrated Omics for Biomedical Science & Biochemistry, College of Life Science and Technology, Yonsei Proteome Research Center, Yonsei University , Seoul, 120-749, Korea
| | - William S Hancock
- The Barnett Institute of Chemical and Biological Analysis, Northeastern University , 140 The Fenway, Boston, Massachusetts 02115, United States
| |
|
46
|
Guo Z, Tzvetkova B, Bassik JM, Bodziak T, Wojnar BM, Qiao W, Obaida MA, Nelson SB, Hu BH, Yu P. RNASeqMetaDB: a database and web server for navigating metadata of publicly available mouse RNA-Seq datasets. Bioinformatics 2015; 31:4038-40. [PMID: 26323714 DOI: 10.1093/bioinformatics/btv503] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 10/08/2014] [Accepted: 08/23/2015] [Indexed: 01/24/2023] Open
Abstract
Gene targeting is a protocol for introducing a mutation to a specific gene in an organism. Because of the importance of in vivo assessment of gene function and modeling of human diseases, this technique has been widely adopted to generate a large number of mutant mouse models. Due to the recent breakthroughs in high-throughput sequencing technologies, RNA-Seq experiments have been performed on many of these mouse models, leading to hundreds of publicly available datasets. To facilitate the reuse of these datasets, we collected the associated metadata and organized them in a database called RNASeqMetaDB. The metadata were manually curated to ensure annotation consistency. We developed a web server to allow easy database navigation and data querying. Users can search the database using multiple parameters like genes, diseases, tissue types, keywords and associated publications in order to find datasets that match their interests. Summary statistics of the metadata are also presented on the web server showing interesting global patterns of RNA-Seq studies. AVAILABILITY AND IMPLEMENTATION Freely available on the web at http://rnaseqmetadb.ece.tamu.edu.
Affiliation(s)
- Zhengyu Guo
- Department of Electrical and Computer Engineering & TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77843, USA
| | - Boriana Tzvetkova
- Department of Biology & Center for Behavioral Genomics, Brandeis University, Waltham, MA 02454, USA
| | - Jennifer M Bassik
- Department of Communicative Disorders and Sciences & Center for Hearing and Deafness, University at Buffalo, Buffalo, NY 14214, USA
| | - Tara Bodziak
- Department of Communicative Disorders and Sciences & Center for Hearing and Deafness, University at Buffalo, Buffalo, NY 14214, USA
| | - Brianna M Wojnar
- Department of Communicative Disorders and Sciences & Center for Hearing and Deafness, University at Buffalo, Buffalo, NY 14214, USA
| | - Wei Qiao
- Department of Electrical and Computer Engineering &
| | - Md A Obaida
- Department of Electrical and Computer Engineering &
| | - Sacha B Nelson
- Department of Biology & Center for Behavioral Genomics, Brandeis University, Waltham, MA 02454, USA
| | - Bo Hua Hu
- Department of Communicative Disorders and Sciences & Center for Hearing and Deafness, University at Buffalo, Buffalo, NY 14214, USA
| | - Peng Yu
- Department of Electrical and Computer Engineering & TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77843, USA
| |
|
47
|
Li HD, Menon R, Govindarajoo B, Panwar B, Zhang Y, Omenn GS, Guan Y. Functional Networks of Highest-Connected Splice Isoforms: From The Chromosome 17 Human Proteome Project. J Proteome Res 2015. [PMID: 26216192 DOI: 10.1021/acs.jproteome.5b00494] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 01/08/2023]
Abstract
Alternative splicing allows a single gene to produce multiple transcript-level splice isoforms from which the translated proteins may show differences in their expression and function. Identifying the major functional or canonical isoform is important for understanding gene and protein functions. Identification and characterization of splice isoforms is a stated goal of the HUPO Human Proteome Project and of neXtProt. Multiple efforts have catalogued splice isoforms as "dominant", "principal", or "major" isoforms based on expression or evolutionary traits. In contrast, we recently proposed highest connected isoforms (HCIs) as a new class of canonical isoforms that have the strongest interactions in a functional network and revealed their significantly higher (differential) transcript-level expression compared to nonhighest connected isoforms (NCIs) regardless of tissues/cell lines in the mouse. HCIs and their expression behavior in the human remain unexplored. Here we identified HCIs for 6157 multi-isoform genes using a human isoform network that we constructed by integrating a large compendium of heterogeneous genomic data. We present examples for pairs of transcript isoforms of ABCC3, RBM34, ERBB2, and ANXA7. We found that functional networks of isoforms of the same gene can show large differences. Interestingly, differential expression between HCIs and NCIs was also observed in the human on an independent set of 940 RNA-seq samples across multiple tissues, including heart, kidney, and liver. Using proteomic data from normal human retina and placenta, we showed that HCIs are a promising indicator of expressed protein isoforms exemplified by NDUFB6 and M6PR. Furthermore, we found that a significant percentage (20%, p = 0.0003) of human and mouse HCIs are homologues, suggesting their conservation between species. Our identified HCIs expand the repertoire of canonical isoforms and are expected to facilitate studying main protein products, understanding gene regulation, and possibly evolution. The network is available through our web server as a rich resource for investigating isoform functional relationships (http://guanlab.ccmb.med.umich.edu/hisonet). All MS/MS data were available at ProteomeXchange Web site (http://www.proteomexchange.org) through their identifiers (retina: PXD001242, placenta: PXD000754).
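The notion of a highest connected isoform can be made concrete with a small sketch. The abstract says only that HCIs "have the strongest interactions in a functional network"; the code below assumes, purely for illustration, that connectivity means the sum of edge weights incident to an isoform, and that ties go to the isoform listed first. The function name, isoform identifiers, and edge weights are all hypothetical, and the authors' actual scoring may differ.

```python
from collections import defaultdict


def highest_connected_isoforms(edges, isoform_to_gene):
    """Pick, per gene, the isoform with the largest summed edge weight.

    Assumption: connectivity = sum of weights of all incident edges.
    This is an illustrative definition, not the published method.
    """
    connectivity = defaultdict(float)
    for iso_a, iso_b, weight in edges:
        connectivity[iso_a] += weight
        connectivity[iso_b] += weight

    best = {}  # gene -> (isoform, connectivity score)
    for isoform, gene in isoform_to_gene.items():
        score = connectivity.get(isoform, 0.0)
        if gene not in best or score > best[gene][1]:
            best[gene] = (isoform, score)
    return {gene: isoform for gene, (isoform, _) in best.items()}


# Hypothetical toy network: two genes with two isoforms each.
isoform_to_gene = {
    "GENE_A.1": "GENE_A",
    "GENE_A.2": "GENE_A",
    "GENE_B.1": "GENE_B",
    "GENE_B.2": "GENE_B",
}
edges = [
    ("GENE_A.1", "GENE_B.1", 0.9),
    ("GENE_A.1", "GENE_B.2", 0.2),
    ("GENE_A.2", "GENE_B.1", 0.3),
]
hcis = highest_connected_isoforms(edges, isoform_to_gene)
```

In this toy network `GENE_A.1` and `GENE_B.1` carry the largest summed weights for their genes, so they would be labeled HCIs and the remaining isoforms NCIs.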
Affiliation(s)
- Hong-Dong Li
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Rajasree Menon
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Brandon Govindarajoo
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Bharat Panwar
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine, Department of Human Genetics and School of Public Health, and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
| |
|
48
|
Panwar B, Menon R, Eksi R, Omenn GS, Guan Y. MI-PVT: A Tool for Visualizing the Chromosome-Centric Human Proteome. J Proteome Res 2015. [PMID: 26204236 DOI: 10.1021/acs.jproteome.5b00525] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/04/2023]
Abstract
We have developed the web-based Michigan Proteome Visualization Tool (MI-PVT) to visualize and compare protein expression and isoform-level function across human chromosomes and tissues (http://guanlab.ccmb.med.umich.edu/mipvt). As proof of principle, we have populated the tool with Human Proteome Map (HPM) data. We were able to observe many biologically interesting features. From the vantage point of our chromosome 17 team, for example, we found more than 300 proteins from chromosome 17 expressed in each of the 30 tissues and cell types studied, with the highest number of expressed proteins being 685 in testis. Comparisons of expression levels across tissues showed low numbers of proteins expressed in esophagus, but esophagus had 12 cytoskeletal proteins coded on chromosome 17 with very high expression (>1000 spectral counts). This customized MI-PVT should be helpful for biologists to browse and study specific proteins and protein data sets across tissues and chromosomes. Users can upload any data of interest in MI-PVT for visualization. Our aim is to integrate extensive mass-spectrometric proteomic data into the tool to facilitate finding chromosome-centric protein expression and correlation across tissues.
Affiliation(s)
- Bharat Panwar
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, ∥Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Rajasree Menon
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, ∥Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Ridvan Eksi
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, ∥Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, ∥Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, ‡Department of Internal Medicine, §Department of Human Genetics and School of Public Health, ∥Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
49
Menon R, Panwar B, Eksi R, Kleer C, Guan Y, Omenn GS. Computational Inferences of the Functions of Alternative/Noncanonical Splice Isoforms Specific to HER2+/ER-/PR- Breast Cancers, a Chromosome 17 C-HPP Study. J Proteome Res 2015; 14:3519-29. [PMID: 26147891 DOI: 10.1021/acs.jproteome.5b00498] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Abstract
This study was conducted as a part of the Chromosome-Centric Human Proteome Project (C-HPP) of the Human Proteome Organization. The main objective is to identify and evaluate the functionality of a set of specific noncanonical isoforms expressed in HER2-neu positive, estrogen receptor negative (ER-), and progesterone receptor negative (PR-) breast cancers (HER2+/ER-/PR- BC), an aggressive subtype of breast cancer that causes significant morbidity and mortality. We identified 11 alternative splice isoforms that were differentially expressed in HER2+/ER-/PR- BC compared to normal mammary, triple negative breast cancer, and triple positive breast cancer (HER2+/ER+/PR+) tissues. We used a stringent criterion: differentially expressed noncanonical isoforms (adjusted p value < 0.05) had to be expressed in all replicates of HER2+/ER-/PR- BC samples, and the trend in differential expression (up or down) had to be the same in all comparisons. Of the 11 protein isoforms, six were overexpressed in HER2+/ER-/PR- BC. We explored possible functional roles of these six proteins using several complementary computational tools. Biological processes including cell cycle events and glycolysis were linked to four of these proteins. For example, glycolysis was the top-ranking functional process for DMXL2 isoform 3, with a fold change of 27 compared to just two for the canonical protein. No previous reports link DMXL2 with any metabolic processes; the canonical protein is known to participate in signaling pathways. Our results clearly indicate distinct functions for the six overexpressed alternative splice isoforms, and these functions could be specific to HER2+/ER-/PR- tumor progression. Further detailed analysis is warranted, as these proteins could be explored as potential biomarkers and therapeutic targets for HER2+/ER-/PR- BC patients.
Affiliation(s)
- Rajasree Menon
- University of Michigan, 100 Washtenaw Avenue, Room 2044B, Palmer Commons, Ann Arbor, Michigan 48109, United States
- Bharat Panwar
- University of Michigan, 100 Washtenaw Avenue, Room 2044B, Palmer Commons, Ann Arbor, Michigan 48109, United States
- Ridvan Eksi
- University of Michigan, 100 Washtenaw Avenue, Room 2044B, Palmer Commons, Ann Arbor, Michigan 48109, United States
- Celina Kleer
- University of Michigan, 100 Washtenaw Avenue, Room 2044B, Palmer Commons, Ann Arbor, Michigan 48109, United States
- Yuanfang Guan
- University of Michigan, 100 Washtenaw Avenue, Room 2044B, Palmer Commons, Ann Arbor, Michigan 48109, United States
- Gilbert S Omenn
- University of Michigan, 100 Washtenaw Avenue, Room 2044B, Palmer Commons, Ann Arbor, Michigan 48109, United States
50
Zhao H, Zhao F. BreakSeek: a breakpoint-based algorithm for full spectral range INDEL detection. Nucleic Acids Res 2015; 43:6701-13. [PMID: 26117537 PMCID: PMC4538813 DOI: 10.1093/nar/gkv605] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 03/16/2015] [Accepted: 05/28/2015] [Indexed: 11/18/2022] Open
Abstract
Although recently developed algorithms have integrated multiple signals to improve sensitivity for insertion and deletion (INDEL) detection, they are far from perfect and still have great limitations in detecting the full size range of INDELs. Here we present BreakSeek, a novel breakpoint-based algorithm, which can unbiasedly and efficiently detect both homozygous and heterozygous INDELs, ranging from several base pairs to over thousands of base pairs, with accurate breakpoint and heterozygosity rate estimations. Comprehensive evaluations on both simulated and real datasets revealed that BreakSeek outperformed other existing methods on both sensitivity and specificity in detecting both small and large INDELs, and uncovered a significant number of novel INDELs that had previously been missed. In addition, by incorporating sophisticated statistical models, we investigated and demonstrated, for the first time, the importance of handling false and conflicting signals in multi-signal integrated methods.
Affiliation(s)
- Hui Zhao
- Computational Genomics Lab, Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
- Fangqing Zhao
- Computational Genomics Lab, Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China