1
Narykov O, Johnson NT, Korkin D. Predicting protein interaction network perturbation by alternative splicing with semi-supervised learning. Cell Rep 2021; 37:110045. [PMID: 34818539 DOI: 10.1016/j.celrep.2021.110045] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/01/2020] [Revised: 07/21/2021] [Accepted: 11/02/2021] [Indexed: 10/19/2022]
Abstract
Alternative splicing introduces an additional layer of protein diversity and complexity in regulating cellular functions that can be specific to the tissue and cell type, physiological state of a cell, or disease phenotype. Recent high-throughput experimental studies have illuminated the functional role of splicing events through rewiring protein-protein interactions (PPIs); however, the extent to which macromolecular interactions are affected by alternative splicing has yet to be fully understood. In silico methods provide a fast and cheap alternative to experimentally interrogating the functional characteristics of thousands of alternatively spliced isoforms. Here, we develop an accurate feature-based machine learning approach that predicts whether a protein-protein interaction carried out by a reference isoform is perturbed by an alternatively spliced isoform. Our method, called the alternatively spliced interactions prediction (ALT-IN) tool, is compared with state-of-the-art PPI prediction tools and shows superior performance, achieving a precision and recall of 0.92.
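The precision and recall reported for ALT-IN follow the standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). The short sketch below computes both from binary labels. The label convention (1 = interaction perturbed by the isoform) and the toy vectors are hypothetical illustrations, not ALT-IN data or code.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for binary labels, treating 1 as the positive class.

    Hypothetical convention: 1 means "interaction perturbed by the alternative isoform".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Precision: fraction of predicted positives that are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of actual positives that were recovered.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy labels: 4 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0]
print(precision_recall(y_true, y_pred))  # (0.8, 0.8)
```

Returning 0.0 when a denominator is zero is one common convention; other tools may warn or return NaN instead.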
Affiliation(s)
- Oleksandr Narykov
- Department of Computer Science, and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Nathan T Johnson
- Department of Computer Science, and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA; Harvard Program in Therapeutic Sciences, Harvard Medical School, and Breast Tumor Immunology Laboratory, Dana-Farber Cancer Institute, Boston, MA, USA
- Dmitry Korkin
- Department of Computer Science, and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA.
2
Petrunina O, Shevaga D, Babenko V, Pavlov V, Rysin S, Nastenko I. Comparative Analysis of Classification Algorithms in the Analysis of Medical Images From Speckle Tracking Echocardiography Video Data. Innovative Biosystems and Bioengineering 2021. [DOI: 10.20535/ibb.2021.5.3.234990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
Abstract
Background. Machine learning allows various intelligent algorithms to be applied to produce diagnostic and/or prognostic models. Such models can be used to determine the functional state of the heart, which is diagnosed by speckle-tracking echocardiography. To determine a patient's heart condition in detail, machine learning uses a classification approach. Classification algorithms differ in performance depending on the situation to which they are applied. Therefore, a pressing task is to determine the most efficient algorithm for classifying a patient's heart condition on the same speckle-tracking echocardiography data set.
Objective. We aimed to evaluate the effectiveness of prognostic models based on logistic regression, the group method of data handling (GMDH), random forest, and adaptive boosting (AdaBoost) for building algorithms that support medical decision-making in the diagnosis of coronary heart disease.
Methods. Speckle-tracking echocardiography video data from 40 patients with coronary heart disease and 16 patients without cardiac pathology were used for the study. Echocardiography was recorded in B-mode in three positions: long axis, 4-chamber, and 2-chamber. Echocardiography frames reflecting cardiac systole and diastole (308 samples in total) were taken as the objects of classification. To obtain informative features for these objects, a genetic GMDH approach was applied to identify the best structure of harmonic textural features. We compared the efficiency of four classification algorithms: logistic regression, the GMDH classifier, random forest, and AdaBoost.
Results. Four classification models were constructed for each of the three B-mode echocardiography positions. For this purpose, the data were divided into three samples: training (60%), validation (20%), and test (20%). Objective evaluation of the models on the test sample showed that the best classification method was random forest (90.3% accuracy for the 4-chamber position, 74.2% for the 2-chamber position, and 77.4% for the long axis). ROC analysis confirmed this: in all cases, random forest was the most effective in classifying cardiac conditions.
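A 60/20/20 split of the 308 frames can be sketched as follows. The abstract does not say whether the split was random, stratified by class, or grouped by patient, so this minimal sketch assumes a simple seeded random split; the function name `train_val_test_split` is hypothetical. When several frames come from the same patient, a per-patient (grouped) split may be preferable so that no patient contributes frames to more than one subset.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_test_split(
    samples: Sequence[T],
    train_frac: float = 0.6,
    val_frac: float = 0.2,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle samples and split them into disjoint train/validation/test lists.

    The test subset receives whatever remains after train and validation,
    so every sample is used exactly once.
    """
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(train_frac * len(shuffled))
    n_val = round(val_frac * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# 308 frame indices, matching the total sample count reported in the Methods.
train, val, test = train_val_test_split(range(308))
print(len(train), len(val), len(test))  # 185 62 61
```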
Conclusions. The best classification algorithm for cardiac diagnostics by speckle-tracking echocardiography was determined to be random forest; its advantage can be explained by the bagging ensemble approach inherent in this classification method. Random forest will be the mainstay of further research aimed at developing a full-fledged decision support system for cardiac diagnostics.
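Bagging (bootstrap aggregating) trains each base learner on a bootstrap resample of the training set, drawn with replacement, and combines their predictions by majority vote. The sketch below illustrates only that mechanism, using hypothetical one-split decision stumps as base learners and toy data; a full random forest additionally grows deeper trees and samples a random subset of features at each split, which is not shown here. None of the function names come from the study.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# A decision stump: (feature index, threshold, label if value <= threshold, label otherwise).
Stump = Tuple[int, float, int, int]


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Stump:
    """Fit a single-split decision stump that minimises training errors."""
    best: Stump = (0, X[0][0], y[0], y[0])
    best_err = len(y) + 1
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= thr]
            right = [lab for row, lab in zip(X, y) if row[j] > thr]
            # Majority label on each side; `left` is never empty because thr is an observed value.
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if err < best_err:
                best, best_err = (j, thr, l_lab, r_lab), err
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    j, thr, l_lab, r_lab = stump
    return l_lab if row[j] <= thr else r_lab


def fit_bagging(X: Sequence[Sequence[float]], y: Sequence[int],
                n_estimators: int = 25, seed: int = 0) -> List[Stump]:
    """Bagging: train each stump on a bootstrap resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    ensemble: List[Stump] = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble


def predict_bagging(ensemble: List[Stump], row: Sequence[float]) -> int:
    """Aggregate the ensemble by majority vote over the stumps' predictions."""
    votes = Counter(predict_stump(s, row) for s in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical one-feature toy data, not echocardiography features.
X = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_bagging(X, y)
print(predict_bagging(model, [0.05]), predict_bagging(model, [0.95]))
```

Averaging many learners fitted to different resamples mainly reduces variance, which is the usual argument for why bagged ensembles are more stable than a single tree.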
3
Liang Y, Wang Y, Ma L, Zhong Z, Yang X, Tao X, Chen X, He Z, Yang Y, Zeng K, Kang R, Gong J, Ying S, Lei Y, Pang J, Lv X, Gu Y. Comparison of microRNAs in adipose and muscle tissue from seven indigenous Chinese breeds and Yorkshire pigs. Anim Genet 2019; 50:439-448. [PMID: 31328299 DOI: 10.1111/age.12826] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Accepted: 05/24/2019] [Indexed: 01/29/2023]
Abstract
Elucidation of the pig microRNAome is essential for interpreting functional elements of the genome and understanding the genetic architecture of complex traits. Here, we extracted small RNAs from skeletal muscle and adipose tissue and compared their expression levels between one Western breed (Yorkshire) and seven indigenous Chinese breeds. We detected the expression of 172 known porcine microRNAs (miRNAs) and 181 novel miRNAs. Differential expression analysis identified 92 and 12 differentially expressed miRNAs in adipose and muscle tissue, respectively. We found that the different Chinese breeds shared common directional changes in miRNA expression compared with Yorkshire pigs. Some miRNAs that were differentially expressed across multiple Chinese breeds, including ssc-miR-129-5p, ssc-miR-30 and ssc-miR-150, are involved in adipose tissue function. Functional enrichment analysis revealed that the target genes of the differentially expressed miRNAs are associated mainly with signaling pathways rather than with metabolic and biosynthetic processes. Analysis of the miRNA-target gene and miRNA-phenotypic trait networks identified many hub miRNAs that regulate a large number of target genes or phenotypic traits. In particular, intramuscular fat content was the trait regulated by the greatest number of miRNAs in muscle tissue. This study provides valuable new candidate miRNAs that will aid in the improvement of meat quality and production.
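The abstract does not state how hub miRNAs were defined. A common and simple choice, assumed here, is node degree: the number of distinct targets (genes or traits) a miRNA connects to in the network. The sketch ranks miRNAs by that degree from an edge list; the function name, miRNA identifiers and target names are hypothetical placeholders, not results from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_hubs(edges: Iterable[Tuple[str, str]], top_k: int = 3) -> List[Tuple[str, int]]:
    """Rank miRNAs by the number of distinct targets they connect to (their degree).

    `edges` are (miRNA, target) pairs, where a target may be a gene or a phenotypic trait.
    Duplicate pairs are counted once; ties are broken alphabetically for a stable order.
    """
    targets: Dict[str, Set[str]] = defaultdict(set)
    for mirna, target in edges:
        targets[mirna].add(target)
    ranked = sorted(((m, len(t)) for m, t in targets.items()), key=lambda mt: (-mt[1], mt[0]))
    return ranked[:top_k]


# Hypothetical edges; names are placeholders only.
edges = [
    ("miRNA_1", "GENE_A"), ("miRNA_1", "GENE_B"), ("miRNA_1", "GENE_C"),
    ("miRNA_2", "GENE_A"), ("miRNA_2", "GENE_D"),
    ("miRNA_3", "GENE_B"), ("miRNA_3", "GENE_B"),  # duplicate edge, counted once
]
print(rank_hubs(edges, top_k=2))  # [('miRNA_1', 3), ('miRNA_2', 2)]
```

The same function applies unchanged to a miRNA-phenotypic trait network by passing (miRNA, trait) pairs.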
Affiliation(s)
- Y Liang
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu, 610066, Sichuan Province, China
- Y Wang
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu, 610066, Sichuan Province, China
- L Ma
- Institute of Blood Transfusion, Chinese Academy of Medical Sciences, Chengdu, 610052, Sichuan Province, China
- Z Zhong
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu, 610066, Sichuan Province, China
- X Yang
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu, 610066, Sichuan Province, China
- X Tao
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu, 610066, Sichuan Province, China
- X Chen
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu, 610066, Sichuan Province, China
- Z He
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu, 610066, Sichuan Province, China
- Y Yang
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu, 610066, Sichuan Province, China
- K Zeng
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu, 610066, Sichuan Province, China
- R Kang
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu, 610066, Sichuan Province, China
- J Gong
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu, 610066, Sichuan Province, China
- S Ying
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu, 610066, Sichuan Province, China
- Y Lei
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu, 610066, Sichuan Province, China
- J Pang
- Chengdu Biotechservice Institute, Chengdu, 610041, Sichuan Province, China
- X Lv
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu, 610066, Sichuan Province, China
- Y Gu
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu, 610066, Sichuan Province, China
4
Khan A, Shah S, Wahid F, Khan FG, Jabeen S. Identification of microRNA precursors using reduced and hybrid features. Mol Biosyst 2018; 13:1640-1645. [PMID: 28686281 DOI: 10.1039/c7mb00115k] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022]
Abstract
MicroRNAs (also called miRNAs) are a group of short non-coding RNA molecules. They play a vital role in regulating gene expression at the transcriptional and post-transcriptional levels, and abnormal miRNA expression has been observed in cancer, heart disease and nervous system disorders. Therefore, for basic research and miRNA-based therapy, it is imperative to distinguish real pre-miRNAs from false ones (hairpin sequences similar to pre-miRNA stem-loops). Both conservation-based and machine learning methods have been applied to identify miRNAs, but machine learning algorithms have gained more popularity than conservation-based algorithms owing to their sensitivity and overall performance. Given the avalanche of RNA sequences discovered in the post-genomic era, a predictor for identifying human pre-miRNAs is needed. We have developed such a predictor, called MicroR-Pred, in which RNA sequences are represented by a hybrid feature vector. The novelty of the predictor lies in its use of the partial least squares (PLS) technique for dimension reduction, followed by Random Forest (RF) and support vector machine (SVM) algorithms for classification. MicroR-Pred performs promisingly compared with other state-of-the-art miRNA predictors, achieving accuracies of 88.40% with RF and 93.90% with SVM.
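The abstract names partial least squares (PLS) as the dimension-reduction step but gives no details (PLS variant, number of components, or feature definitions). As an illustration only, the sketch below implements textbook single-response PLS1 with the NIPALS algorithm in pure Python, assuming class labels coded as 0/1 are used as the response; the function name `pls1_scores` and the toy data are hypothetical. The resulting score matrix would replace the raw hybrid features as classifier input. In practice, a library implementation such as scikit-learn's `PLSRegression` (in `sklearn.cross_decomposition`) would normally be used instead.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def pls1_scores(X: Sequence[Sequence[float]], y: Sequence[float], n_components: int) -> Matrix:
    """Project rows of X onto n_components PLS1 latent components (NIPALS algorithm).

    Returns an n_samples x n_components score matrix usable as reduced features.
    """
    n, p = len(X), len(X[0])
    # Mean-centre the predictors and the response.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]

    scores = [[0.0] * n_components for _ in range(n)]
    for k in range(n_components):
        # Weight vector w = Xc^T y, normalised to unit length.
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0:
            break  # no covariance with y left; remaining score columns stay 0
        w = [v / norm for v in w]
        # Scores t = Xc w.
        t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        # Loadings = Xc^T t / (t^T t); response coefficient q = y^T t / (t^T t).
        load = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(yc[i] * t[i] for i in range(n)) / tt
        # Deflate X and y so the next component captures the remaining structure.
        Xc = [[Xc[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        for i in range(n):
            scores[i][k] = t[i]
    return scores


# Hypothetical toy data: 4 sequences x 3 features; 1 = real pre-miRNA, 0 = pseudo hairpin.
X = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 0.0], [4.0, 3.0, 2.0]]
y = [0.0, 0.0, 1.0, 1.0]
Z = pls1_scores(X, y, n_components=2)
print(len(Z), len(Z[0]))  # 4 2
```

Because X is deflated after each component, successive score columns are mutually orthogonal, so the reduced features carry non-redundant information.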
Affiliation(s)
- Asad Khan
- Department of Computer Science, COMSATS Institute of IT, Abbottabad 22060, Pakistan
- Sajid Shah
- Department of Computer Science, COMSATS Institute of IT, Abbottabad 22060, Pakistan
- Fazli Wahid
- Department of Environmental Sciences, COMSATS Institute of IT, Abbottabad 22060, Pakistan
- Fiaz Gul Khan
- Department of Computer Science, COMSATS Institute of IT, Abbottabad 22060, Pakistan
- Saima Jabeen
- Department of Computer Science, COMSATS Institute of IT, Abbottabad 22060, Pakistan
5
MapReduce Algorithms for Inferring Gene Regulatory Networks from Time-Series Microarray Data Using an Information-Theoretic Approach. Biomed Res Int 2017; 2017:6261802. [PMID: 28243601 PMCID: PMC5294223 DOI: 10.1155/2017/6261802] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 06/07/2016] [Revised: 11/14/2016] [Accepted: 12/13/2016] [Indexed: 12/15/2022]
Abstract
Gene regulation is a series of processes that control whether, and to what extent, genes are expressed. The connections among genes and their regulatory molecules (usually transcription factors), together with a descriptive model of those connections, are known as gene regulatory networks (GRNs). Elucidating GRNs is crucial to understanding the inner workings of the cell and the complexity of gene interactions. To date, numerous algorithms have been developed to infer GRNs. However, as the number of identified genes increases and the complexity of their interactions is uncovered, networks and their regulatory mechanisms become cumbersome to test. Furthermore, sifting through experimental results requires an enormous amount of computation, which slows data processing. New approaches are therefore needed to rapidly analyze the copious experimental data generated from cellular GRNs. Cloud computing has been reported in the literature as a promising way to meet this need. Here, we propose new MapReduce algorithms for inferring GRNs on a Hadoop cluster in a cloud environment. These algorithms employ an information-theoretic approach to infer GRNs from time-series microarray data. Experimental results show that our MapReduce program is much faster than an existing tool while achieving slightly better prediction accuracy.
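The abstract does not describe the internals of these MapReduce algorithms. The sketch below only illustrates the general pattern under stated assumptions: expression time series are already discretised into integer levels, each gene pair is scored by mutual information (MI, a common information-theoretic measure), and pairs whose MI reaches a hypothetical threshold become network edges. The map, shuffle and reduce phases are simulated in a single Python process rather than on Hadoop, and all function names are hypothetical; this is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

# Assumed representation: each gene's time series discretised into integer levels.
Profile = Tuple[int, ...]
Pair = Tuple[str, str]


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information (in bits) between two equal-length discrete profiles."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def map_phase(profiles: Dict[str, Profile]) -> Iterator[Tuple[Pair, Tuple[Profile, Profile]]]:
    """Mapper: emit one (key, value) record per unordered gene pair."""
    for g1, g2 in combinations(sorted(profiles), 2):
        yield (g1, g2), (profiles[g1], profiles[g2])


def shuffle(records: Iterable[Tuple[Pair, Tuple[Profile, Profile]]]) -> Dict[Pair, List[Tuple[Profile, Profile]]]:
    """Shuffle: group mapper output by key, as the framework does between phases."""
    groups: Dict[Pair, List[Tuple[Profile, Profile]]] = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return groups


def reduce_phase(key: Pair, values: List[Tuple[Profile, Profile]]) -> Tuple[Pair, float]:
    """Reducer: score a gene pair by the MI of its two profiles."""
    x, y = values[0]
    return key, mutual_information(x, y)


def infer_edges(profiles: Dict[str, Profile], threshold: float) -> Dict[Pair, float]:
    """Keep gene pairs whose MI reaches the (hypothetical) threshold as network edges."""
    grouped = shuffle(map_phase(profiles))
    scored = (reduce_phase(key, vals) for key, vals in grouped.items())
    return {pair: mi for pair, mi in scored if mi >= threshold}


profiles = {"g1": (0, 0, 1, 1), "g2": (0, 0, 1, 1), "g3": (0, 1, 0, 1)}
print(infer_edges(profiles, threshold=0.5))  # {('g1', 'g2'): 1.0}
```

Because each gene pair is scored independently, the reduce work parallelises naturally across a cluster, which is the property a MapReduce formulation exploits.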
6
Peace RJ, Biggar KK, Storey KB, Green JR. A framework for improving microRNA prediction in non-human genomes. Nucleic Acids Res 2015; 43:e138. [PMID: 26163062 PMCID: PMC4787757 DOI: 10.1093/nar/gkv698] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 11/14/2014] [Accepted: 06/28/2015] [Indexed: 11/12/2022]
Abstract
The prediction of novel pre-microRNA (miRNA) from genomic sequence has received considerable attention recently. However, most studies have focused on the human genome. Previous studies have demonstrated that sensitivity (correctly detecting true miRNAs) is sustained when human-trained methods are applied to other species; however, they have failed to report the dramatic drop in specificity (the ability to correctly reject non-miRNA sequences) in non-human genomes. Because the ratio of true miRNA sequences to pseudo-miRNA sequences is on the order of 1:1000, such low specificity prevents the application of most existing tools to non-human genomes, as the number of false positives overwhelms the true predictions. Here we introduce a framework (SMIRP) for creating species-specific miRNA prediction systems that leverages sequence conservation and phylogenetic distance information. Substantial improvements in specificity and precision are obtained for four non-human test species when our framework is applied to three different prediction systems representing two types of classifiers (support vector machine and Random Forest), based on three different feature sets, with both human-specific and taxon-wide training data. The SMIRP framework is potentially applicable to all miRNA prediction systems, and we expect it to substantially improve precision and specificity while sustaining sensitivity, independent of the machine learning technique chosen.
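The effect of the 1:1000 class ratio can be made concrete with a short calculation. With P true miRNAs and N pseudo-miRNAs, the expected true positives are TP = sensitivity × P and the expected false positives are FP = (1 - specificity) × N, so precision = TP / (TP + FP). The sensitivity and specificity values below are hypothetical, chosen only to show how strongly precision depends on specificity at this ratio; they are not figures from the study.

```python
def precision_at_base_rate(sensitivity: float, specificity: float,
                           positives: float, negatives: float) -> float:
    """Expected precision given sensitivity, specificity and class counts."""
    tp = sensitivity * positives            # true miRNAs correctly detected
    fp = (1.0 - specificity) * negatives    # pseudo-miRNAs wrongly accepted
    return tp / (tp + fp)


# Hypothetical classifier: 90% sensitivity, 1 true miRNA per 1000 pseudo-miRNAs.
print(round(precision_at_base_rate(0.90, 0.99, 1, 1000), 3))   # 0.083
print(round(precision_at_base_rate(0.90, 0.999, 1, 1000), 3))  # 0.474
```

Even at 99% specificity, roughly 11 of every 12 predictions are false positives; raising specificity to 99.9% brings precision close to 50%, which is why specificity dominates precision at this class ratio.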
Affiliation(s)
- Robert J Peace
- Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada
- Kyle K Biggar
- Institute of Biochemistry and Department of Biology, Carleton University, Ottawa, Canada; Department of Biochemistry, University of Western Ontario, London, Canada
- Kenneth B Storey
- Institute of Biochemistry and Department of Biology, Carleton University, Ottawa, Canada
- James R Green
- Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada