1
Laine E, Freiberger MI. Toward a comprehensive profiling of alternative splicing proteoform structures, interactions and functions. Curr Opin Struct Biol 2025; 90:102979. [PMID: 39778413 PMCID: PMC7617313 DOI: 10.1016/j.sbi.2024.102979] [Received: 09/15/2024] [Revised: 11/26/2024] [Accepted: 12/18/2024] [Indexed: 01/11/2025]
Abstract
The mRNA splicing machinery has been estimated to generate 100,000 known protein-coding transcripts from 20,000 human genes (Ensembl, Sept. 2024). This set keeps expanding with the massive and rapidly growing data coming from high-throughput technologies, particularly single-cell and long-read sequencing. Yet the implications of splicing complexity at the protein level remain largely uncharted. In this review, we describe current advances toward systematically assessing the contribution of alternative splicing to the diversification of proteome function. We discuss the potential and challenges of using artificial intelligence-based techniques to identify alternative splicing proteoforms and characterise their structures, interactions, and functions.
Affiliation(s)
- Elodie Laine
- Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, 75005 Paris, France; Institut universitaire de France (IUF), France.
- Maria Inés Freiberger
- Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, 75005 Paris, France
2
Liu Y, Li HD, Wang J. CrossIsoFun: predicting isoform functions using the integration of multi-omics data. Bioinformatics 2024; 41:btae742. [PMID: 39680906 PMCID: PMC11706537 DOI: 10.1093/bioinformatics/btae742] [Received: 06/17/2024] [Revised: 11/16/2024] [Accepted: 12/13/2024] [Indexed: 12/18/2024]
Abstract
MOTIVATION Isoforms spliced from the same gene may carry distinct biological functions. Annotating functions at the isoform level therefore provides valuable insights into the functional diversity of genomes. Because experimental approaches for determining isoform functions are time-consuming and costly, computational methods have been proposed, and integrating multi-omics data can improve their performance by providing complementary information on isoform functions. However, current methods make limited use of diverse omics data, primarily because they lack the power to integrate heterogeneous feature domains. Moreover, isoform-isoform interactions (IIIs) are a key data source among multi-omics data, since isoforms interact with each other to perform their functions, yet IIIs have so far remained largely underutilized in isoform function prediction. RESULTS We introduce CrossIsoFun, a multi-omics data analysis framework for isoform function prediction. CrossIsoFun combines omics-specific and cross-omics learning for data integration and function prediction. Specifically, CrossIsoFun uses a graph convolutional network (GCN) as the omics-specific classifier for each data source. The initial label predictions from the GCNs are forwarded to the View Correlation Discovery Network (VCDN) and processed into a cross-omics integrative representation, which is then used to produce the final predictions of isoform functions. In addition, an autoencoder within a cycle-consistency generative adversarial network (cycleGAN) is designed to generate IIIs from protein-protein interactions (PPIs) and thereby enrich the interactomics data. Our method outperforms state-of-the-art methods on three tissue-naive datasets and 15 tissue-specific datasets with mRNA expression, sequence, and PPI data. The predictions of CrossIsoFun are further validated by their consistency with subcellular localization and with literature-supported isoform-level annotations.
AVAILABILITY AND IMPLEMENTATION CrossIsoFun is freely available at https://github.com/genemine/CrossIsoFun.
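The two-stage design described in this abstract (one GCN classifier per omics view, whose label predictions are then fused by VCDN) can be illustrated with a minimal plain-Python sketch. It assumes the standard GCN propagation rule of Kipf and Welling, and it assumes that VCDN builds its cross-omics representation from the outer product of the per-view label distributions, as in the earlier MOGONET framework; neither detail is stated in the abstract. All function names, the toy graph and the numeric values are hypothetical and are not taken from the CrossIsoFun repository, and the cycleGAN step that generates IIIs is not sketched.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the symmetric normalization used by GCNs."""
    n = len(adj)
    # Add self-loops so that each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of x (n x k) and y (k x m)."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(adj_norm, features, weights):
    """One graph-convolution step: H' = ReLU(A_norm @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(adj_norm, features), weights)]


def softmax(z):
    """Numerically stable softmax turning class scores into a label distribution."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_view_tensor(p1, p2, p3):
    """Flattened outer product of three per-view label distributions.

    Entry (a, b, c) is p1[a] * p2[b] * p3[c]; with C classes the vector has C**3
    entries and would be fed to a small fully connected network (assumed VCDN input).
    """
    return [a * b * c for a in p1 for b in p2 for c in p3]


if __name__ == "__main__":
    # Toy interaction graph over three hypothetical isoforms (0-1 and 1-2 interact).
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    norm = normalize_adjacency(adj)
    # Made-up 2-D features per isoform and a 2 x 2 weight matrix (2 classes).
    feats = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    w = [[0.8, -0.2], [-0.1, 0.9]]
    hidden = gcn_layer(norm, feats, w)
    # A real model would train one GCN per omics view (expression, sequence, PPI);
    # here one toy output is reused three times only to show the fused vector's size.
    p = softmax(hidden[0])
    print(len(cross_view_tensor(p, p, p)))  # 2 classes -> 8 entries
```

The outer product lets the downstream network see every combination of per-view class beliefs, which is how this style of fusion captures agreement or disagreement between omics views rather than simply averaging them.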
Affiliation(s)
- Yiwei Liu
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Hong-Dong Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
3
Kiseleva OI, Arzumanian VA, Kurbatov IY, Poverennaya EV. In silico and in cellulo approaches for functional annotation of human protein splice variants. Biomeditsinskaia Khimiia 2024; 70:315-328. [PMID: 39324196 DOI: 10.18097/pbmc20247005315] [Indexed: 09/27/2024]
Abstract
The elegance of pre-mRNA splicing mechanisms continues to interest scientists nearly half a century after the discovery that the coding regions of genes are interrupted by non-coding sequences. The vast majority of human genes have several mRNA variants, encoding structurally and functionally different protein isoforms in tissue-specific and developmental-stage-specific manners. Altered splicing patterns shift the balance of functionally distinct proteins in living systems, distort normal molecular pathways, and may trigger the onset and progression of various pathologies. Over the past two decades, numerous studies across the life sciences have deepened our understanding of splicing mechanisms and of the extent of their impact on the functioning of living systems. This review summarizes experimental and computational approaches used to elucidate the functions of the splice variants of a single gene, drawing on the experience accumulated in the Laboratory of Interactomics of Proteoforms at the Institute of Biomedical Chemistry (IBMC) and on best practices worldwide.
Affiliation(s)
- O I Kiseleva
- Institute of Biomedical Chemistry, Moscow, Russia
4
Tan H, Guo M, Chen J, Wang J, Yu G. HetFCM: functional co-module discovery by heterogeneous network co-clustering. Nucleic Acids Res 2024; 52:e16. [PMID: 38088228 PMCID: PMC10853805 DOI: 10.1093/nar/gkad1174] [Received: 09/26/2023] [Revised: 10/31/2023] [Accepted: 11/23/2023] [Indexed: 02/10/2024]
Abstract
Analysis of functional molecular modules (e.g., gene-miRNA co-modules and gene-miRNA-lncRNA triple-layer modules) can dissect the complex regulation underlying etiology or phenotypes. However, current module detection methods neither make appropriate use of multi-omics data nor effectively model cross-layer regulation among heterogeneous molecules, which causes loss of critical genetic information and degrades detection performance. In this study, we propose a heterogeneous network co-clustering framework (HetFCM) to detect functional co-modules. HetFCM introduces an attributed heterogeneous network to jointly model the interplay and multi-type attributes of different molecules, applies multiple variational graph autoencoders to the network to generate cross-layer association matrices, and then performs adaptive weighted co-clustering on the association matrices and attribute data to identify co-modules of heterogeneous molecules. An empirical study on human and maize datasets shows that HetFCM finds co-modules with denser topology and more significant functions, which are associated with human breast cancer and its subtypes and with maize phenotypes (i.e., lipid storage, drought tolerance and oil content). HetFCM is a useful tool for detecting co-modules, can be applied to multi-layer functional modules, and yields novel insights for analyzing molecular mechanisms. We also developed a user-friendly module detection and analysis tool, available at http://www.sdu-idea.cn/FMDTool.
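The step in which variational graph autoencoders (VGAEs) generate cross-layer association matrices can be pictured with the short sketch below. It assumes HetFCM uses the standard VGAE reparameterization trick and inner-product decoder of Kipf and Welling; the abstract does not confirm this, the GCN encoder that would produce the latent means and log standard deviations is omitted, and the adaptive weighted co-clustering step is not shown. Function names, embedding sizes and values are hypothetical.

```python
import math
import random


def reparameterize(mu, log_sigma, rng):
    """Sample z = mu + exp(log_sigma) * eps with eps ~ N(0, 1), row by row."""
    return [[m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu_row, ls_row)]
            for mu_row, ls_row in zip(mu, log_sigma)]


def sigmoid(x):
    """Logistic function written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def association_matrix(z_rows, z_cols):
    """Inner-product decoder: entry (i, j) = sigmoid(<z_rows[i], z_cols[j]>).

    With gene embeddings as rows and miRNA embeddings as columns, this yields a
    dense gene x miRNA association score matrix with values in (0, 1).
    """
    return [[sigmoid(sum(a * b for a, b in zip(zi, zj))) for zj in z_cols] for zi in z_rows]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the toy run is reproducible
    # Hypothetical 2-D latent means for 3 genes and 2 miRNAs, with small noise.
    mu_gene = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    mu_mirna = [[1.0, 0.2], [-0.5, 1.0]]
    z_gene = reparameterize(mu_gene, [[-2.0, -2.0]] * 3, rng)
    z_mirna = reparameterize(mu_mirna, [[-2.0, -2.0]] * 2, rng)
    for row in association_matrix(z_gene, z_mirna):
        print([round(v, 3) for v in row])
```

Because genes and miRNAs share one latent space, a single dot product scores any gene-miRNA pair, which is what makes a dense cross-layer matrix cheap to produce for the subsequent co-clustering.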
Affiliation(s)
- Haojiang Tan
- School of Software, Shandong University, Jinan 250101, Shandong, China
- Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan 250101, Shandong, China
- Maozu Guo
- College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
- Jian Chen
- College of Agronomy & Biotechnology, China Agricultural University, Beijing 100193, China
- Jun Wang
- Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan 250101, Shandong, China
- Guoxian Yu
- School of Software, Shandong University, Jinan 250101, Shandong, China
- Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan 250101, Shandong, China
5
Zhao T, Sun S, Gao Y, Rong Y, Wang H, Qi S, Li Y. Luteolin and triptolide: Potential therapeutic compounds for post-stroke depression via protein STAT. Heliyon 2023; 9:e18622. [PMID: 37600392 PMCID: PMC10432979 DOI: 10.1016/j.heliyon.2023.e18622] [Received: 04/11/2023] [Revised: 07/18/2023] [Accepted: 07/24/2023] [Indexed: 08/22/2023]
Abstract
Post-stroke depression (PSD) is a common neuropsychiatric complication of stroke that is closely associated with the immune system. Because its mechanism remains unclear, developing medications for PSD is still a considerable challenge. Multiple studies agree that gene ontology (GO) functions are effective for investigating disease mechanisms, and DeepPurpose (DP) is highly valuable for discovering new drugs. However, GO terms and DP have not yet been applied to explore the pathogenesis and drug treatment of PSD. This study aimed to interpret the mechanism of PSD and to discover important drug candidates targeting risk proteins, based on immune-related risk GO functions and informatics algorithms. From the risk genes of PSD, we identified 335 immune-related risk GO functions and 37 compounds. By constructing a GO function network, we found that the STAT protein may be a pivotal protein underlying the mechanism of PSD. We also built protein-protein interaction and gene-GO function networks to help evaluate key genes. Using DP, we identified a total of 37 candidate compounds targeting 7 key proteins with potential for PSD therapy. Furthermore, the mechanisms by which luteolin and triptolide act on STAT-related GO functions may involve three crucial pathways: hsa04010 (MAPK signaling pathway), hsa04151 (PI3K-Akt signaling pathway) and hsa04060 (cytokine-cytokine receptor interaction). This study thus provides new information on the mechanism of, and therapeutic strategies for, PSD.
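The abstract does not state how the 335 immune-related risk GO functions were selected from the PSD risk genes. A common way to call a GO term enriched among a gene set is a one-sided hypergeometric test, and the sketch below assumes that approach. The function names and the gene identifiers in the usage example are hypothetical placeholders, not data from the study.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    # Sum the probability mass of observing i term genes among the drawn genes, i >= k.
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total


def go_term_enrichment(risk_genes, term_genes, background):
    """One-sided p-value that a GO term's genes are over-represented among risk genes."""
    universe = set(background)
    risk = set(risk_genes) & universe
    term = set(term_genes) & universe
    overlap = len(risk & term)
    return hypergeom_sf(overlap, len(universe), len(term), len(risk))


if __name__ == "__main__":
    background = [f"GENE{i}" for i in range(1, 21)]  # hypothetical 20-gene universe
    go_term = [f"GENE{i}" for i in range(1, 6)]      # hypothetical 5-gene GO term
    risk = ["GENE1", "GENE2", "GENE3", "GENE4"]       # hypothetical risk genes
    print(go_term_enrichment(risk, go_term, background))  # 5 / 4845, about 0.00103
```

When many GO terms are scanned at once, the per-term p-values would normally also be corrected for multiple testing (for example with the Benjamini-Hochberg procedure) before a term is called a risk function.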
Affiliation(s)
- Tianyang Zhao
- Department of Anesthesiology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Siqi Sun
- Department of Anesthesiology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Yueyue Gao
- Department of Anesthesiology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Yuting Rong
- Department of Anesthesiology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Hanwenchen Wang
- The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Sihua Qi
- Department of Anesthesiology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Yan Li
- Department of Anesthesiology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
6
Ling X, Wang Q, Zhang J, Zhang G. Genome-Wide Analysis of the KLF Gene Family in Chicken: Characterization and Expression Profile. Animals (Basel) 2023; 13:ani13091429. [PMID: 37174466 PMCID: PMC10177326 DOI: 10.3390/ani13091429] [Received: 03/06/2023] [Revised: 04/08/2023] [Accepted: 04/20/2023] [Indexed: 05/15/2023]
Abstract
The Krüppel-like factor (KLF) gene family is a group of transcription factors containing highly conserved zinc-finger motifs, which play a crucial role in cell proliferation and differentiation. Chicken has been widely used as a model animal for analyzing gene function; however, little is known about the function of the KLF gene family in chickens. In this study, we performed a genome-wide analysis of chicken KLF genes and characterized their biological and expression features. We identified 13 KLF genes in chickens. Phylogenetic, motif, and conserved-domain analyses indicate that the KLF gene family has been conserved through evolution. Synteny analysis revealed collinear relationships among KLFs, suggesting that they have related biomolecular functions. Interaction network analysis revealed that KLFs act together with 20 genes in biological processes. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis showed that KLF2 is involved in the Apelin and Forkhead box O (FOXO) signaling pathways. Moreover, qPCR showed that all 13 KLF genes were expressed in the nine selected tissues, with diverse expression patterns. RNA-seq showed that KLF3 and KLF10 were differentially expressed between the normal-diet and high-fat-diet groups, and that KLF4, KLF5, KLF6, KLF7, KLF9, KLF12, and KLF13 were differentially expressed between undifferentiated and differentiated chicken preadipocytes. RNA-seq also showed that KLF genes displayed different expression patterns in muscle at embryonic days 11 and 16 and in 1-day-old chickens. These results indicate that KLF genes are involved in muscle and fat development in chickens. Our findings provide a reference for subsequent studies of KLF gene function.
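The abstract does not describe how the qPCR expression levels across the nine tissues were quantified. The comparative Ct (2^-ΔΔCt) method of Livak and Schmittgen is one widely used option; the sketch below assumes it, together with roughly equal (about 100%) amplification efficiencies for the target and reference genes. The function name, tissue labels and Ct values are invented for illustration only.

```python
def delta_delta_ct(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change 2^-ddCt of a target gene in a sample versus a calibrator sample.

    dCt = Ct(target) - Ct(reference gene); ddCt = dCt(sample) - dCt(calibrator).
    Assumes ~100% amplification efficiency for both target and reference genes.
    """
    d_sample = ct_target - ct_reference
    d_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(d_sample - d_calibrator)


if __name__ == "__main__":
    # Hypothetical Ct values (not from the study) for one KLF gene and a reference
    # gene in two tissues; the first tissue serves as the calibrator.
    cts = {"tissue_A": (24.0, 18.0), "tissue_B": (22.0, 18.0)}
    cal_target, cal_ref = cts["tissue_A"]
    for tissue, (ct_t, ct_r) in cts.items():
        print(tissue, delta_delta_ct(ct_t, ct_r, cal_target, cal_ref))
```

In this toy run the target crosses threshold two cycles earlier in tissue_B relative to the reference gene, so its estimated expression is 2^2 = 4 times that in the calibrator tissue.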
Affiliation(s)
- Xuanze Ling
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225000, China
- Joint International Research Laboratory of Agriculture & Agri-Product Safety, Yangzhou University, Yangzhou 225000, China
- Qifan Wang
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225000, China
- Joint International Research Laboratory of Agriculture & Agri-Product Safety, Yangzhou University, Yangzhou 225000, China
- Jin Zhang
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225000, China
- Joint International Research Laboratory of Agriculture & Agri-Product Safety, Yangzhou University, Yangzhou 225000, China
- Genxi Zhang
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225000, China
- Joint International Research Laboratory of Agriculture & Agri-Product Safety, Yangzhou University, Yangzhou 225000, China