1
|
Ma T, Jiang M, Pang S, Zhang Z, Hang H, Zhou W, Zhang Y. SeqMG-RPI: A Sequence-Based Framework Integrating Multi-Scale RNA Features and Protein Graphs for RNA-Protein Interaction Prediction. J Chem Inf Model 2025. [PMID: 40262169 DOI: 10.1021/acs.jcim.5c00176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/24/2025]
Abstract
RNA-protein interaction (RPI) plays a crucial role in cell biology, and accurate prediction of RPI is essential to understand molecular mechanisms and advance disease research. Some existing RPI prediction methods rely on a single feature, leaving significant room for improvement. In this paper, we propose a novel sequence-based RPI prediction method, called SeqMG-RPI. For RNA, SeqMG-RPI introduces an innovative multi-scale RNA feature that integrates three sequence-based representations: a multi-channel RNA feature, a k-mer frequency feature, and a k-mer sparse matrix feature. For protein, SeqMG-RPI utilizes a graph-based protein feature to capture protein information. Moreover, a novel neural network architecture is constructed for feature extraction and RPI prediction. Experiments from multiple perspectives across various datasets demonstrate that the proposed method outperforms existing methods, with better performance and generalization.
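As a rough illustration of the k-mer frequency feature mentioned in the abstract, the sketch below counts overlapping k-mers in an RNA sequence and normalizes the counts to frequencies. It is not the SeqMG-RPI implementation: the function name `kmer_frequencies`, the default k = 3, the ACGU alphabet, and the choice to skip windows containing ambiguous bases are all assumptions made for this example.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return a fixed-length vector of k-mer frequencies for an RNA sequence.

    Hypothetical helper for illustration only; the paper's exact feature
    definition may differ.
    """
    # Normalize case and treat DNA-style T as U (an assumption).
    seq = seq.upper().replace("T", "U")

    # Enumerate all 4**k possible k-mers in a fixed order so that every
    # sequence maps to a vector with the same layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = {kmer: 0 for kmer in kmers}

    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Windows containing characters outside the alphabet (e.g. N) are skipped.
        if window in counts:
            counts[window] += 1
            total += 1

    if total == 0:
        # Sequence shorter than k, or no valid windows.
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]
```

Calling `kmer_frequencies("ACGUACGU", k=2)` yields a 16-element vector whose entries sum to 1; a fixed k-mer order is what lets vectors from different sequences be compared position by position.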
Affiliation(s)
- Teng Ma
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266525, China
| | - Mingjian Jiang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266525, China
| | - Shunpeng Pang
- School of Computer Engineering, Weifang University, Weifang 261061, China
| | - Zhi Zhang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266525, China
| | - Huaibin Hang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266525, China
| | - Wei Zhou
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266525, China
| | - Yuanyuan Zhang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266525, China
|
2
|
Perurena-Prieto J, Sanz-Martínez MT, Viñas-Giménez L, Codina-Clavaguera C, Triginer L, Gordillo-González F, Andrés-León E, Batlle-Masó L, Martin J, Selva-O'Callaghan A, Pujol R, McHugh NJ, Tansley SL, Colobran R, Guillen-Del-Castillo A, Simeón-Aznar CP. Expanding the landscape of systemic sclerosis-related autoantibodies through RNA immunoprecipitation coupled with massive parallel sequencing. J Autoimmun 2024; 149:103328. [PMID: 39500147 DOI: 10.1016/j.jaut.2024.103328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2024] [Revised: 10/22/2024] [Accepted: 10/26/2024] [Indexed: 12/15/2024]
Abstract
OBJECTIVES Systemic sclerosis (SSc)-related autoantibodies are widely used diagnostic and prognostic biomarkers. This study aimed to develop a new assay for detecting anti-ribonucleoprotein autoantibodies in SSc based on RNA immunoprecipitation (RNA IP) coupled with massive parallel sequencing. METHODS Serum samples and clinical data were collected from 307 SSc patients. Among these, 57 samples underwent analysis using a new protocol that combines RNA IP with massive parallel sequencing (RIP-Seq). Filtering strategies and statistical outlier detection methods were applied to select RNA molecules that could represent novel ribonucleoprotein autoantigens associated with SSc. RESULTS Among the 30,966 different RNA molecules identified by RIP-Seq in 57 SSc patients, 197 were ultimately selected. These included all RNA molecules previously identified by RNA IP, which were found to exhibit high counts almost exclusively in samples positive for the autoantibodies associated with the corresponding RNA molecule, indicating high sensitivity and specificity of the RIP-Seq technique. C/D box snoRNAs were the most abundant RNA type identified. The immunoprecipitation patterns of the detected C/D box snoRNAs varied among patients and could be associated with different clinical phenotypes. In addition, other ribonucleoproteins were identified, which could be potential targets for previously undescribed SSc-related autoantibodies. These include H/ACA box snoRNPs, vault complexes, mitochondrial tRNA synthetases, and 7SK snRNP. CONCLUSION A novel RIP-Seq assay has been developed to detect autoantibodies targeting ribonucleoprotein complexes in SSc patients. This method successfully identified RNA molecules associated with ribonucleoproteins known to be targeted by SSc-related autoantibodies, validating both the assay and the analysis strategy. Additionally, this approach uncovered RNA molecules associated with ribonucleoproteins that were not previously identified as targets of SSc patients' sera, suggesting potential new autoantibody candidates in this disease.
Affiliation(s)
- Janire Perurena-Prieto
- Immunology Division, Vall d'Hebron University Hospital (HUVH), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Translational Immunology Group, Vall d'Hebron Research Institute (VHIR), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Department of Cell Biology, Physiology and Immunology, Autonomous University of Barcelona (UAB), Bellaterra, Spain
| | - María Teresa Sanz-Martínez
- Immunology Division, Vall d'Hebron University Hospital (HUVH), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Translational Immunology Group, Vall d'Hebron Research Institute (VHIR), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain
| | - Laura Viñas-Giménez
- Immunology Division, Vall d'Hebron University Hospital (HUVH), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Translational Immunology Group, Vall d'Hebron Research Institute (VHIR), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain
| | - Claudia Codina-Clavaguera
- Systemic Autoimmune Diseases Unit, Internal Medicine Department, Vall d'Hebron University Hospital (HUVH), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Systemic Autoimmune Diseases Group, Vall d'Hebron Research Institute (VHIR), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain
| | - Laura Triginer
- Systemic Autoimmune Diseases Group, Vall d'Hebron Research Institute (VHIR), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain
| | | | - Eduardo Andrés-León
- Institute of Parasitology and Biomedicine "López-Neyra", CSIC (IPBLN-CSIC), Granada, Spain
| | - Laura Batlle-Masó
- Infection and Immunity in Pediatric Patients Research Group, Vall d'Hebron Research Institute (VHIR), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Pediatric Infectious Diseases and Immunodeficiencies Unit, Children's Hospital, Hospital Universitari Vall d'Hebron (HUVH), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Pompeu Fabra University (UPF), Barcelona, Spain
| | - Javier Martin
- Institute of Parasitology and Biomedicine "López-Neyra", CSIC (IPBLN-CSIC), Granada, Spain
| | - Albert Selva-O'Callaghan
- Systemic Autoimmune Diseases Unit, Internal Medicine Department, Vall d'Hebron University Hospital (HUVH), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Systemic Autoimmune Diseases Group, Vall d'Hebron Research Institute (VHIR), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain
| | - Ricardo Pujol
- Department of Cell Biology, Physiology and Immunology, Autonomous University of Barcelona (UAB), Bellaterra, Spain; Vall d'Hebron Institute of Oncology (VHIO), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain
| | - Neil J McHugh
- Department of Pharmacy and Pharmacology, University of Bath, Bath, UK
| | - Sarah L Tansley
- Department of Pharmacy and Pharmacology, University of Bath, Bath, UK
| | - Roger Colobran
- Immunology Division, Vall d'Hebron University Hospital (HUVH), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Translational Immunology Group, Vall d'Hebron Research Institute (VHIR), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Department of Cell Biology, Physiology and Immunology, Autonomous University of Barcelona (UAB), Bellaterra, Spain; Department of Clinical and Molecular Genetics, Vall d'Hebron University Hospital (HUVH), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain.
| | - Alfredo Guillen-Del-Castillo
- Systemic Autoimmune Diseases Unit, Internal Medicine Department, Vall d'Hebron University Hospital (HUVH), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Systemic Autoimmune Diseases Group, Vall d'Hebron Research Institute (VHIR), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain.
| | - Carmen Pilar Simeón-Aznar
- Systemic Autoimmune Diseases Unit, Internal Medicine Department, Vall d'Hebron University Hospital (HUVH), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain; Systemic Autoimmune Diseases Group, Vall d'Hebron Research Institute (VHIR), Vall d'Hebron Barcelona Hospital Campus, Barcelona, Spain
|
3
|
Gallardo-Dodd CJ, Kutter C. The regulatory landscape of interacting RNA and protein pools in cellular homeostasis and cancer. Hum Genomics 2024; 18:109. [PMID: 39334294 PMCID: PMC11437681 DOI: 10.1186/s40246-024-00678-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2024] [Accepted: 09/22/2024] [Indexed: 09/30/2024] Open
Abstract
Biological systems encompass intricate networks governed by RNA-protein interactions that play pivotal roles in cellular functions. RNA and proteins, constituting 1.1% and 18% of the mammalian cell weight, respectively, orchestrate vital processes from genome organization to translation. To date, disentangling the functional fraction of the human genome has presented a major challenge, particularly for noncoding regions, yet recent discoveries have started to unveil a host of regulatory functions for noncoding RNAs (ncRNAs). While ncRNAs exist at different sizes, structures, degrees of evolutionary conservation and abundances within the cell, they partake in diverse roles either alone or in combination. However, certain ncRNA subtypes, including those that have been described or remain to be discovered, are poorly characterized given their heterogeneous nature. RNA activity is in most cases coordinated through interactions with RNA-binding proteins (RBPs). Extensive efforts are being made to accurately reconstruct RNA-RBP regulatory networks, which have provided unprecedented insight into cellular physiology and human disease. In this review, we provide a comprehensive view of RNAs and RBPs, focusing on how their interactions generate functional signals in living cells, particularly in the context of post-transcriptional regulatory processes and cancer.
Affiliation(s)
- Carlos J Gallardo-Dodd
- Department of Microbiology, Tumor, and Cell Biology, Science for Life Laboratory, Karolinska Institute, Solna, Sweden
| | - Claudia Kutter
- Department of Microbiology, Tumor, and Cell Biology, Science for Life Laboratory, Karolinska Institute, Solna, Sweden.
|
4
|
Sadée C, Hagler LD, Becker WR, Jarmoskaite I, Vaidyanathan PP, Denny SK, Greenleaf WJ, Herschlag D. A comprehensive thermodynamic model for RNA binding by the Saccharomyces cerevisiae Pumilio protein PUF4. Nat Commun 2022; 13:4522. [PMID: 35927243 PMCID: PMC9352680 DOI: 10.1038/s41467-022-31968-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/01/2021] [Accepted: 07/07/2022] [Indexed: 11/12/2022] Open
Abstract
Genomic methods have been valuable for identifying RNA-binding proteins (RBPs) and the genes, pathways, and processes they regulate. Nevertheless, standard motif descriptions cannot be used to predict all RNA targets or test quantitative models for cellular interactions and regulation. We present a complete thermodynamic model for RNA binding to the S. cerevisiae Pumilio protein PUF4 derived from direct binding data for 6180 RNAs measured using the RNA on a massively parallel array (RNA-MaP) platform. The PUF4 model is highly similar to that of the related RBPs, human PUM2 and PUM1, with one marked exception: a single favorable site of base flipping for PUF4, such that PUF4 preferentially binds to a non-contiguous series of residues. These results are foundational for developing and testing cellular models of RNA-RBP interactions and function, for engineering RBPs, and for understanding the biophysical nature of RBP binding and the evolutionary landscape of RNAs and RBPs.
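To make the idea of a thermodynamic binding model concrete, the sketch below uses only standard relations: the binding free energy ΔG = RT ln(Kd) (1 M standard state) and the fraction of RNA bound at a given free protein concentration, [P] / ([P] + Kd). The helper `kd_with_penalties` assumes that sequence changes contribute independent, additive free-energy penalties relative to a reference RNA. That additivity, along with all function names and the default temperature, is an assumption for illustration and is not taken from the PUF4 model itself.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def delta_g_from_kd(kd_molar, temp_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in molar units."""
    return R_KCAL * temp_k * math.log(kd_molar)


def fraction_bound(protein_molar, kd_molar):
    """Fraction of RNA bound for simple 1:1 binding with protein in excess."""
    return protein_molar / (protein_molar + kd_molar)


def kd_with_penalties(kd_ref_molar, penalties_kcal, temp_k=298.15):
    """Predicted Kd after adding free-energy penalties to a reference RNA.

    Assumes the penalties (kcal/mol, positive = weaker binding) are
    independent and additive; this is a simplification for illustration.
    """
    ddg = sum(penalties_kcal)
    return kd_ref_molar * math.exp(ddg / (R_KCAL * temp_k))
```

For example, a single penalty of RT ln(10), about 1.36 kcal/mol at 298 K, weakens the predicted Kd tenfold, and `fraction_bound` returns 0.5 whenever the protein concentration equals Kd.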
Affiliation(s)
- Christoph Sadée
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
| | - Lauren D Hagler
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
| | - Winston R Becker
- Biophysics Program, Stanford University School of Medicine, Stanford, CA, USA
| | - Inga Jarmoskaite
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
| | - Pavanapuresan P Vaidyanathan
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- Protillion Biosciences, Burlingame, CA, USA
| | - Sarah K Denny
- Biophysics Program, Stanford University School of Medicine, Stanford, CA, USA
- Scribe Therapeutics, Alameda, CA, USA
| | - William J Greenleaf
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Department of Applied Physics, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Daniel Herschlag
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA.
- Department of Chemical Engineering, Stanford University, Stanford, CA, USA.
- ChEM-H Institute, Stanford University, Stanford, CA, USA.
|
5
|
Arora V, Sanguinetti G. Challenges for machine learning in RNA-protein interaction prediction. Stat Appl Genet Mol Biol 2022; 21:sagmb-2021-0087. [PMID: 35073469 DOI: 10.1515/sagmb-2021-0087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Accepted: 01/02/2022] [Indexed: 11/15/2022]
Abstract
RNA-protein interactions have long been recognised as crucial regulators of gene expression. Recently, the development of scalable experimental techniques to measure these interactions has revolutionised the field, leading to the production of large-scale datasets which offer both opportunities and challenges for machine learning techniques. In this brief note, we will discuss some of the major stumbling blocks towards the use of machine learning in computational RNA biology, focusing specifically on the problem of predicting RNA-protein interactions from next-generation sequencing data.
Affiliation(s)
- Viplove Arora
- Data Science, Department of Physics, International School for Advanced Studies (SISSA), Trieste 34136, Italy
| | - Guido Sanguinetti
- Data Science, Department of Physics, International School for Advanced Studies (SISSA), Trieste 34136, Italy
|
6
|
Wei J, Chen S, Zong L, Gao X, Li Y. Protein-RNA interaction prediction with deep learning: structure matters. Brief Bioinform 2022; 23:bbab540. [PMID: 34929730 PMCID: PMC8790951 DOI: 10.1093/bib/bbab540] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 09/29/2021] [Revised: 11/14/2021] [Accepted: 11/22/2021] [Indexed: 12/11/2022] Open
Abstract
Protein-RNA interactions are of vital importance to a variety of cellular activities. Both experimental and computational techniques have been developed to study the interactions. Because of the limitations of previous databases, especially the lack of protein structure data, most of the existing computational methods rely heavily on sequence data, with only a small portion of the methods utilizing structural information. Recently, AlphaFold has revolutionized the entire protein and biology field. Foreseeably, protein-RNA interaction prediction will also be promoted significantly in the upcoming years. In this work, we give a thorough review of this field, surveying both the binding site and binding preference prediction problems and covering the commonly used datasets, features and models. We also point out the potential challenges and opportunities in this field. This survey summarizes the development of the RNA-binding protein-RNA interaction field in the past and foresees its future development in the post-AlphaFold era.
Affiliation(s)
- Junkang Wei
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
| | - Siyuan Chen
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
| | - Licheng Zong
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
| | - Xin Gao
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
| | - Yu Li
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), 999077, Hong Kong SAR, China
- The CUHK Shenzhen Research Institute, Hi-Tech Park, 518057, Shenzhen, China
|
7
|
Spiniello M, Scalf M, Casamassimi A, Abbondanza C, Smith LM. Towards an Ideal In Cell Hybridization-Based Strategy to Discover Protein Interactomes of Selected RNA Molecules. Int J Mol Sci 2022; 23:ijms23020942. [PMID: 35055128 PMCID: PMC8779001 DOI: 10.3390/ijms23020942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Revised: 01/11/2022] [Accepted: 01/13/2022] [Indexed: 02/04/2023] Open
Abstract
RNA-binding proteins are crucial to the function of coding and non-coding RNAs. The disruption of RNA–protein interactions is involved in many different pathological states. Several computational and experimental strategies have been developed to identify protein binders of selected RNA molecules. Amongst these, ‘in cell’ hybridization methods represent the gold standard in the field because they are designed to reveal the proteins bound to specific RNAs in a cellular context. Here, we compare the technical features of different ‘in cell’ hybridization approaches with a focus on their advantages, limitations, and current and potential future applications.
Affiliation(s)
- Michele Spiniello
- Department of Precision Medicine, University of Campania Luigi Vanvitelli, 80138 Naples, Italy
- Division of Immuno-Hematology and Transfusion Medicine, Cardarelli Hospital, 80131 Naples, Italy
- Correspondence: (M.S.); (A.C.)
| | - Mark Scalf
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Amelia Casamassimi
- Department of Precision Medicine, University of Campania Luigi Vanvitelli, 80138 Naples, Italy
- Correspondence: (M.S.); (A.C.)
| | - Ciro Abbondanza
- Department of Precision Medicine, University of Campania Luigi Vanvitelli, 80138 Naples, Italy
| | - Lloyd M. Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
|
8
|
Zhao D, Wang C, Yan S, Chen R. Advances in the identification of long non-coding RNA binding proteins. Anal Biochem 2021; 639:114520. [PMID: 34896376 DOI: 10.1016/j.ab.2021.114520] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/16/2021] [Revised: 12/04/2021] [Accepted: 12/04/2021] [Indexed: 02/06/2023]
Abstract
Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nt without evident protein coding function. They play important regulatory roles in many biological processes, e.g., gene regulation, chromatin remodeling, and cell fate determination during development. Dysregulation of lncRNAs has been observed in various diseases including cancer. Interacting with proteins is a crucial way for lncRNAs to play their biological roles. Therefore, the characterization of lncRNA binding proteins is important to understand their functions and to delineate the underlying molecular mechanism. Large-scale studies based on mass spectrometry have characterized over a thousand new RNA binding proteins without known RNA-binding domains, thus revealing the complexity and diversity of RNA-protein interactions. In addition, several methods have been developed to identify the binding proteins for particular RNAs of interest. Here we review the progress of the RNA-centric methods for the identification of RNA-protein interactions, focusing on the studies involving lncRNAs, and discuss their strengths and limitations.
Affiliation(s)
- Dongqing Zhao
- School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, 300072, China
| | - Chunqing Wang
- The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Jinan, 250014, China
| | - Shuai Yan
- Peking University First Hospital, Peking University Health Science Center, Beijing, 100191, China
| | - Ruibing Chen
- School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, 300072, China.
|
9
|
Zooming in on protein-RNA interactions: a multi-level workflow to identify interaction partners. Biochem Soc Trans 2021; 48:1529-1543. [PMID: 32820806 PMCID: PMC7458403 DOI: 10.1042/bst20191059] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/12/2020] [Revised: 07/17/2020] [Accepted: 07/20/2020] [Indexed: 02/01/2023]
Abstract
Interactions between proteins and RNA are at the base of numerous cellular regulatory and functional phenomena. The investigation of the biological relevance of non-coding RNAs has led to the identification of numerous novel RNA-binding proteins (RBPs). However, defining the RNA sequences and structures that are selectively recognised by an RBP remains challenging, since these interactions can be transient and highly dynamic, and may be mediated by unstructured regions in the protein, as in the case of many non-canonical RBPs. Numerous experimental and computational methodologies have been developed to predict, identify and verify the binding between a given RBP and potential RNA partners, but navigating across the vast ocean of data can be frustrating and misleading. In this mini-review, we propose a workflow for the identification of the RNA binding partners of putative, newly identified RBPs. The large pool of potential binders selected by in-cell experiments can be enriched by in silico tools such as catRAPID, which is able to predict the RNA sequences more likely to interact with specific RBP regions with high accuracy. The RNA candidates with the highest potential can then be analysed in vitro to determine the binding strength and to precisely identify the binding sites. The results thus obtained can furthermore validate the computational predictions, offering an all-round solution to the issue of finding the most likely RNA binding partners for a newly identified potential RBP.
|
10
|
Torkamanian-Afshar M, Lanjanian H, Nematzadeh S, Tabarzad M, Najafi A, Kiani F, Masoudi-Nejad A. RPINBASE: An online toolbox to extract features for predicting RNA-protein interactions. Genomics 2020; 112:2623-2632. [DOI: 10.1016/j.ygeno.2020.02.013] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/05/2019] [Revised: 01/04/2020] [Accepted: 02/13/2020] [Indexed: 12/12/2022]
|
11
|
Shi Q, Chen W, Huang S, Wang Y, Xue Z. Deep learning for mining protein data. Brief Bioinform 2019; 22:194-218. [PMID: 31867611 DOI: 10.1093/bib/bbz156] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 08/16/2019] [Revised: 10/21/2019] [Accepted: 11/07/2019] [Indexed: 01/16/2023] Open
Abstract
The recent emergence of deep learning to characterize complex patterns of protein big data reveals its potential to address the classic challenges in the field of protein data mining. Much research has revealed the promise of deep learning as a powerful tool to transform protein big data into valuable knowledge, leading to scientific discoveries and practical solutions. In this review, we summarize recent publications on deep learning predictive approaches in the field of mining protein data. The application architectures of these methods include multilayer perceptrons, stacked autoencoders, deep belief networks, two- or three-dimensional convolutional neural networks, recurrent neural networks, graph neural networks, and complex neural networks and are described from five perspectives: residue-level prediction, sequence-level prediction, three-dimensional structural analysis, interaction prediction, and mass spectrometry data mining. The advantages and deficiencies of these architectures are presented in relation to various tasks in protein data mining. Additionally, some practical issues and their future directions are discussed, such as robust deep learning for protein noisy data, architecture optimization for specific tasks, efficient deep learning for limited protein data, multimodal deep learning for heterogeneous protein data, and interpretable deep learning for protein understanding. This review provides comprehensive perspectives on general deep learning techniques for protein data analysis.
Affiliation(s)
- Qiang Shi
- School of Software Engineering, Huazhong University of Science and Technology. His main interests cover machine learning especially deep learning, protein data analysis, and big data mining
| | - Weiya Chen
- School of Software Engineering, Huazhong University of Science & Technology, Wuhan, China. His research interests cover bioinformatics, virtual reality, and data visualization
| | - Siqi Huang
- Software Engineering at Huazhong University of Science and Technology, focusing on machine learning and data mining
| | - Yan Wang
- School of life, University of Science & Technology; her main interests cover protein structure and function prediction and big data mining
| | - Zhidong Xue
- School of Software Engineering, Huazhong University of Science & Technology, Wuhan, China. His research interests cover bioinformatics, machine learning, and image processing
|
12
|
Jiang D, Armour CR, Hu C, Mei M, Tian C, Sharpton TJ, Jiang Y. Microbiome Multi-Omics Network Analysis: Statistical Considerations, Limitations, and Opportunities. Front Genet 2019; 10:995. [PMID: 31781153 PMCID: PMC6857202 DOI: 10.3389/fgene.2019.00995] [Citation(s) in RCA: 88] [Impact Index Per Article: 14.7] [Received: 02/13/2019] [Accepted: 09/18/2019] [Indexed: 12/21/2022] Open
Abstract
The advent of large-scale microbiome studies affords newfound analytical opportunities to understand how these communities of microbes operate and relate to their environment. However, the analytical methodology needed to model microbiome data and integrate them with other data constructs remains nascent. This emergent analytical toolset frequently ports over techniques developed in other multi-omics investigations, especially the growing array of statistical and computational techniques for integrating and representing data through networks. While network analysis has emerged as a powerful approach to modeling microbiome data, oftentimes by integrating these data with other types of omics data to discern their functional linkages, it is not always evident if the statistical details of the approach being applied are consistent with the assumptions of microbiome data or how they impact data interpretation. In this review, we overview some of the most important network methods for integrative analysis, with an emphasis on methods that have been applied or have great potential to be applied to the analysis of multi-omics integration of microbiome data. We compare advantages and disadvantages of various statistical tools, assess their applicability to microbiome data, and discuss their biological interpretability. We also highlight on-going statistical challenges and opportunities for integrative network analysis of microbiome data.
Affiliation(s)
- Duo Jiang
- Department of Statistics, Oregon State University, Corvallis, OR, United States
| | - Courtney R Armour
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
| | - Chenxiao Hu
- Department of Statistics, Oregon State University, Corvallis, OR, United States
| | - Meng Mei
- Department of Statistics, Oregon State University, Corvallis, OR, United States
| | - Chuan Tian
- Department of Statistics, Oregon State University, Corvallis, OR, United States
| | - Thomas J Sharpton
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
| | - Yuan Jiang
- Department of Statistics, Oregon State University, Corvallis, OR, United States
|