1
Hruba P, Klema J, Le AV, Girmanova E, Mrazova P, Massart A, Maixnerova D, Voska L, Piredda GB, Biancone L, Puga AR, Seyahi N, Sever MS, Weekers L, Muhfeld A, Budde K, Watschinger B, Miglinas M, Zahradka I, Abramowicz M, Abramowicz D, Viklicky O. Novel transcriptomic signatures associated with premature kidney allograft failure. EBioMedicine 2023; 96:104782. [PMID: 37660534 PMCID: PMC10480056 DOI: 10.1016/j.ebiom.2023.104782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 08/18/2023] [Accepted: 08/18/2023] [Indexed: 09/05/2023]
Abstract
BACKGROUND The power to predict kidney allograft outcomes based on non-invasive assays is limited. Assessment of operational tolerance (OT) patients allows us to identify transcriptomic signatures of true non-responders for construction of predictive models. METHODS In this observational retrospective study, RNA sequencing of peripheral blood was used in a derivation cohort to identify a protective set of transcripts by comparing 15 OT patients (40% females) from the TOMOGRAM Study (NCT05124444), 14 chronic active antibody-mediated rejection (CABMR) patients and 23 patients with stable graft function for ≥15 years (STA). The selected differentially expressed transcripts between OT and CABMR were used in a validation cohort (n = 396) to predict 3-year kidney allograft loss at 3 time-points using RT-qPCR. FINDINGS Archetypal analysis and classifier performance of RNA sequencing data showed that OT is clearly distinguishable from CABMR, but similar to STA. Based on significant transcripts from the validation cohort in univariable analysis, 2 multivariable Cox models were created. A 3-transcript (ADGRG3, ATG2A, and GNLY) model from POD 7 predicted graft loss with a C-statistic (C) of 0.727 (95% CI, 0.638-0.820). Another 3-transcript (IGHM, CD5, GNLY) model from M3 predicted graft loss with C 0.786 (95% CI, 0.785-0.865). Combining the 3-transcript models with eGFR at POD 7 and M3 improved C-statistics to 0.860 (95% CI, 0.778-0.944) and 0.868 (95% CI, 0.790-0.944), respectively. INTERPRETATION Identification of transcripts distinguishing OT from CABMR allowed us to construct models predicting premature graft loss. Identified transcripts reflect mechanisms of injury/repair and alloimmune response when assessed at day 7 or with a loss of protective phenotype when assessed at month 3. FUNDING Supported by the Ministry of Health of the Czech Republic under grant NV19-06-00031.
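The C-statistics quoted in this abstract measure how well a model's risk scores rank patients by time to graft loss. The following is a minimal sketch of that metric, not the authors' code: the function name `concordance_index` and the simplified handling of ties (pairs with equal follow-up times are skipped) are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell-style C: share of comparable pairs ranked correctly by risk.

    times  -- follow-up time per subject
    events -- 1 if graft loss was observed, 0 if censored
    risks  -- model risk score (higher means earlier expected failure)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the subject with the shorter follow-up.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is usable only if the earlier subject had an observed event;
        # tied times are skipped in this simplified sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four subjects, the third is censored.
toy_c = concordance_index([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.727 to 0.868 should be read.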
Affiliation(s)
- Petra Hruba
- Transplant Laboratory, Institute for Clinical and Experimental Medicine, Prague, Czech Republic
- Jiri Klema
- Department of Computer Science, Czech Technical University, Prague, Czech Republic
- Anh Vu Le
- Department of Computer Science, Czech Technical University, Prague, Czech Republic
- Eva Girmanova
- Transplant Laboratory, Institute for Clinical and Experimental Medicine, Prague, Czech Republic
- Petra Mrazova
- Transplant Laboratory, Institute for Clinical and Experimental Medicine, Prague, Czech Republic
- Annick Massart
- Antwerp University Hospital and Antwerp University, Antwerp, Belgium
- Dita Maixnerova
- Department of Nephrology, 1st Faculty of Medicine and General Faculty Hospital, Prague, Czech Republic
- Ludek Voska
- Department of Clinical and Transplant Pathology, Institute for Clinical and Experimental Medicine, Prague, Czech Republic
- Gian Benedetto Piredda
- Department of Kidney Disease Medicine of Renal Transplantation, G.Brotzu Hospital Cagliari, Italy
- Luigi Biancone
- Department of Medical Sciences, University of Torino, Torino, Italy
- Ana Ramirez Puga
- Hospital Universitario Insular de Gran Canaria, Servicio de nefrología, Spain
- Nurhan Seyahi
- Istanbul University, Cerrahpasa Medical Faculty, Nephrology, Istanbul, Turkey
- Mehmet Sukru Sever
- Istanbul University, Istanbul School of Medicine, Internal Medicine, Nephrology, Istanbul, Turkey
- Anja Muhfeld
- Department of Nephrology, Uniklinik RWTH Aachen, Aachen, Germany
- Klemens Budde
- Charité - Universitätsmedizin Berlin, Medizinische Klinik mit Schwerpunkt Nephrologie und Internistische Intensivmedizin, Berlin, Germany
- Bruno Watschinger
- Department of Internal Medicine III, Nephrology, Medical University Vienna / AKH Wien, Vienna, Austria
- Marius Miglinas
- Faculty of Medicine, Nephrology Center, Vilnius University Hospital Santaros Klinikos, Vilnius University, Vilnius, Lithuania
- Ivan Zahradka
- Department of Nephrology, Institute for Clinical and Experimental Medicine, Prague, Czech Republic
- Marc Abramowicz
- Genetic Medicine and Development, Faculty of Medicine, University of Geneva, Rue Michel Servet 1, 1206 Geneva, Switzerland
- Daniel Abramowicz
- Antwerp University Hospital and Antwerp University, Antwerp, Belgium
- Ondrej Viklicky
- Transplant Laboratory, Institute for Clinical and Experimental Medicine, Prague, Czech Republic; Department of Nephrology, Institute for Clinical and Experimental Medicine, Prague, Czech Republic.
2
Nesvaderani M, Dhillon BK, Chew T, Tang B, Baghela A, Hancock RE, Eslick GD, Cox M. Gene Expression Profiling: Identification of Novel Pathways and Potential Biomarkers in Severe Acute Pancreatitis. J Am Coll Surg 2022; 234:803-815. [PMID: 35426393 DOI: 10.1097/xcs.0000000000000115] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 12/26/2022]
Abstract
BACKGROUND Determining the risk of developing severe acute pancreatitis (AP) on presentation to hospital is difficult but vital to enable early management decisions that reduce morbidity and mortality. The objective of this study was to determine global gene expression profiles of patients with different acute pancreatitis severity to identify genes and molecular mechanisms involved in the pathogenesis of severe AP. STUDY DESIGN AP patients (n = 87) were recruited within 24 hours of admission to the Emergency Department and were confirmed to exhibit at least 2 of the following features: (1) abdominal pain characteristic of AP, (2) serum amylase and/or lipase more than 3-fold the upper laboratory limit considered normal, and/or (3) radiographically demonstrated AP on CT scan. Severity was defined according to the Revised Atlanta classification. Thirty-two healthy volunteers were also recruited and peripheral venous blood was collected for performing RNA-Seq. RESULTS In severe AP, 422 genes (185 upregulated, 237 downregulated) were significantly differentially expressed when compared with moderately severe and mild cases. Pathway analysis revealed changes in specific innate and adaptive immune, sepsis-related, and surface modification pathways in severe AP. Data-driven approaches revealed distinct gene expression groups (endotypes), which were not entirely overlapping with the clinical Atlanta classification. Importantly, severe and moderately severe AP patients clustered away from healthy controls, whereas mild AP patients did not exhibit any clear separation, suggesting distinct underlying mechanisms that may influence severity of AP. CONCLUSION There were significant differences in gene expression affecting the severity of AP, revealing a central role of specific immunological pathways. Despite the existence of patient endotypes, a 4-gene transcriptomic signature (S100A8, S100A9, MMP25, and MT-ND4L) was determined that can predict severe AP with an accuracy of 64%.
Affiliation(s)
- Maryam Nesvaderani
- From the Department of Surgery, The Centre for Evidence Based Surgery (Nesvaderani, Eslick, Cox), University of Sydney Nepean Clinical School, Nepean Hospital, Sydney, Australia
- Bhavjinder K Dhillon
- Department of Microbiology and Immunology, University of British Columbia, Vancouver, Canada (Dhillon, Baghela, Hancock)
- Tracy Chew
- Intensive Care Medicine (Chew, Tang), University of Sydney Nepean Clinical School, Nepean Hospital, Sydney, Australia
- Sydney Informatics Hub, University of Sydney, Sydney, Australia (Chew)
- Benjamin Tang
- Intensive Care Medicine (Chew, Tang), University of Sydney Nepean Clinical School, Nepean Hospital, Sydney, Australia
- Arjun Baghela
- Department of Microbiology and Immunology, University of British Columbia, Vancouver, Canada (Dhillon, Baghela, Hancock)
- Robert Ew Hancock
- Department of Microbiology and Immunology, University of British Columbia, Vancouver, Canada (Dhillon, Baghela, Hancock)
- Guy D Eslick
- From the Department of Surgery, The Centre for Evidence Based Surgery (Nesvaderani, Eslick, Cox), University of Sydney Nepean Clinical School, Nepean Hospital, Sydney, Australia
- Michael Cox
- From the Department of Surgery, The Centre for Evidence Based Surgery (Nesvaderani, Eslick, Cox), University of Sydney Nepean Clinical School, Nepean Hospital, Sydney, Australia
3
Merkerova MD, Klema J, Kundrat D, Szikszai K, Krejcik Z, Hrustincova A, Trsova I, Le AV, Cermak J, Jonasova A, Belickova M. Noncoding RNAs and Their Response Predictive Value in Azacitidine-treated Patients With Myelodysplastic Syndrome and Acute Myeloid Leukemia With Myelodysplasia-related Changes. Cancer Genomics Proteomics 2022; 19:205-228. [PMID: 35181589 DOI: 10.21873/cgp.20315] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/09/2021] [Revised: 01/07/2022] [Accepted: 01/14/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND/AIM Prediction of response to azacitidine (AZA) treatment is an important challenge in hematooncology. In addition to protein coding genes (PCGs), AZA efficiency is influenced by various noncoding RNAs (ncRNAs), including long ncRNAs (lncRNAs), circular RNAs (circRNAs), and transposable elements (TEs). MATERIALS AND METHODS RNA sequencing was performed in patients with myelodysplastic syndromes or acute myeloid leukemia before AZA treatment to assess contribution of ncRNAs to AZA mechanisms and propose novel disease prediction biomarkers. RESULTS Our analyses showed that lncRNAs had the strongest predictive potential. The combined set of the best predictors included 14 lncRNAs, and only four PCGs, one circRNA, and no TEs. Epigenetic regulation and recombinational repair were suggested as crucial for AZA response, and network modeling defined three deregulated lncRNAs (CTC-482H14.5, RP11-419K12.2, and RP11-736I24.4) associated with these processes. CONCLUSION The expression of various ncRNAs can influence the effect of AZA and new ncRNA-based predictive biomarkers can be defined.
Affiliation(s)
- Jiri Klema
- Department of Computer Sciences, Czech Technical University, Prague, Czech Republic
- David Kundrat
- Department of Genomics, Institute of Hematology and Blood Transfusion, Prague, Czech Republic
- Katarina Szikszai
- Department of Genomics, Institute of Hematology and Blood Transfusion, Prague, Czech Republic
- Zdenek Krejcik
- Department of Genomics, Institute of Hematology and Blood Transfusion, Prague, Czech Republic
- Andrea Hrustincova
- Department of Genomics, Institute of Hematology and Blood Transfusion, Prague, Czech Republic
- Iva Trsova
- Department of Genomics, Institute of Hematology and Blood Transfusion, Prague, Czech Republic
- Anh Vu Le
- Department of Computer Sciences, Czech Technical University, Prague, Czech Republic
- Jaroslav Cermak
- Laboratory of Anemias, Institute of Hematology and Blood Transfusion, Prague, Czech Republic
- Anna Jonasova
- First Department of Medicine, General University Hospital, Prague, Czech Republic
- Monika Belickova
- Department of Genomics, Institute of Hematology and Blood Transfusion, Prague, Czech Republic
4
Scott MA, Woolums AR, Swiderski CE, Perkins AD, Nanduri B. Genes and regulatory mechanisms associated with experimentally-induced bovine respiratory disease identified using supervised machine learning methodology. Sci Rep 2021; 11:22916. [PMID: 34824337 PMCID: PMC8616896 DOI: 10.1038/s41598-021-02343-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/07/2021] [Accepted: 11/08/2021] [Indexed: 11/28/2022]
Abstract
Bovine respiratory disease (BRD) is a multifactorial disease involving complex host immune interactions shaped by pathogenic agents and environmental factors. Advancements in RNA sequencing and associated analytical methods are improving our understanding of host response related to BRD pathophysiology. Supervised machine learning (ML) approaches present one such method for analyzing new and previously published transcriptome data to identify novel disease-associated genes and mechanisms. Our objective was to apply ML models to lung and immunological tissue datasets acquired from previous clinical BRD experiments to identify genes that classify disease with high accuracy. Raw mRNA sequencing reads from 151 bovine datasets (n = 123 BRD, n = 28 control) were downloaded from NCBI-GEO. Quality filtered reads were assembled in a HISAT2/Stringtie2 pipeline. Raw gene counts for ML analysis were normalized, transformed, and analyzed with MLSeq, utilizing six ML models. Cross-validation parameters (fivefold, repeated 10 times) were applied to 70% of the compiled datasets for ML model training and parameter tuning; optimized ML models were tested with the remaining 30%. Downstream analysis of significant genes identified by the top ML models, based on classification accuracy for each etiological association, was performed within WebGestalt and Reactome (FDR ≤ 0.05). Nearest shrunken centroid and Poisson linear discriminant analysis with power transformation models identified 154 and 195 significant genes for IBR and BRSV, respectively; from these genes, the two ML models discriminated IBR and BRSV with 100% accuracy compared to sham controls. Significant genes classified by the top ML models in IBR (154) and BRSV (195), but not BVDV (74), were related to type I interferon production and IL-8 secretion, specifically in lymphoid tissue and not homogenized lung tissue. 
Genes identified in Mannheimia haemolytica infections (97) were involved in activating classical and alternative pathways of complement. Novel findings, including expression of genes related to reduced mitochondrial oxygenation and ATP synthesis in consolidated lung tissue, were discovered. Genes identified in each analysis represent distinct genomic events relevant to understanding and predicting clinical BRD. Our analysis demonstrates the utility of ML with published datasets for discovering functional information to support the prediction and understanding of clinical BRD.
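The evaluation scheme described in this abstract (a 70%/30% train/test split, with fivefold cross-validation repeated 10 times inside the training portion) can be sketched as index bookkeeping. This is an illustrative sketch only, not the MLSeq implementation: the function names are hypothetical, the splits are simple random splits, and whether the original analysis stratified by class is not stated in the abstract.

```python
import random


def train_test_split_indices(n, train_fraction=0.7, seed=0):
    """Shuffle sample indices 0..n-1 and cut them into train and test parts."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]


def repeated_kfold(indices, k=5, repeats=10, seed=0):
    """Yield (train, validation) index lists for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [shuffled[i::k] for i in range(k)]
        for held_out in range(k):
            validation = folds[held_out]
            training = [s for f, fold in enumerate(folds)
                        if f != held_out for s in fold]
            yield training, validation


# 151 datasets, as in the abstract: tune on the 70% part, keep 30% untouched.
train_idx, test_idx = train_test_split_indices(151)
cv_splits = list(repeated_kfold(train_idx, k=5, repeats=10))
```

Keeping the 30% test indices out of every cross-validation split is what lets the final accuracy serve as an estimate on data the tuned models have not seen.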
Affiliation(s)
- Matthew A Scott
- Veterinary Education, Research, and Outreach Center, Texas A&M University and West Texas A&M University, Canyon, TX, USA.
- Amelia R Woolums
- Department of Pathobiology and Population Medicine, Mississippi State University, Mississippi State, MS, USA
- Cyprianna E Swiderski
- Department of Pathobiology and Population Medicine, Mississippi State University, Mississippi State, MS, USA
- Andy D Perkins
- Department of Computer Science and Engineering, Mississippi State University, Mississippi State, MS, USA
- Bindu Nanduri
- Department of Comparative Biomedical Sciences, Mississippi State University, Mississippi State, MS, USA
5
Cellular, molecular, and therapeutic characterization of pilocarpine-induced temporal lobe epilepsy. Sci Rep 2021; 11:19102. [PMID: 34580351 PMCID: PMC8476594 DOI: 10.1038/s41598-021-98534-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/15/2021] [Accepted: 09/09/2021] [Indexed: 12/30/2022]
Abstract
Animal models have expanded our understanding of temporal lobe epilepsy (TLE). However, translating these into cell-specific druggable hypotheses has not been explored. Herein, we conducted an integrative in silico analysis of an available transcriptomics dataset obtained from animals with pilocarpine-induced TLE. A set of 119 genes with subtle-to-moderate impact predicted most forms of epilepsy with ~97% accuracy and characteristically mapped to upregulated homeostatic and downregulated synaptic pathways. The deconvolution of cellular proportions revealed opposing changes in diverse cell types. The proportion of nonneuronal cells increased, whereas that of interneurons, except for those expressing vasoactive intestinal peptide (Vip), decreased, and pyramidal neurons of the cornu ammonis (CA) subfields showed the highest variation in proportion. A probabilistic Bayesian network demonstrated an aberrant and oscillating physiological interaction between nonneuronal cells involved in the blood–brain barrier and Vip interneurons in driving seizures, and their role was evaluated in silico using transcriptomic changes induced by valproic acid, which showed opposing effects in the two cell types. Additionally, we revealed novel epileptic and antiepileptic mechanisms and predicted drugs using causal inference, outperforming the present drug repurposing approaches. These well-powered findings not only expand the understanding of TLE and seizure oscillation, but also provide predictive biomarkers of epilepsy, cellular and causal micro-circuitry changes associated with it, and a drug-discovery method focusing on these events.
6
Shboul ZA, Diawara N, Vossough A, Chen JY, Iftekharuddin KM. Joint Modeling of RNAseq and Radiomics Data for Glioma Molecular Characterization and Prediction. Front Med (Lausanne) 2021; 8:705071. [PMID: 34490297 PMCID: PMC8416908 DOI: 10.3389/fmed.2021.705071] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/13/2021] [Accepted: 07/20/2021] [Indexed: 12/13/2022]
Abstract
RNA sequencing (RNAseq) is a recent technology that profiles gene expression by measuring the relative frequency of the RNAseq reads. RNAseq read count data are increasingly used in oncologic care, while radiology features (radiomics) have also been gaining utility in radiology practice for tasks such as disease diagnosis, monitoring, and treatment planning. However, contemporary literature lacks appropriate RNA-radiomics (henceforth, radiogenomics) joint modeling in which the RNAseq distribution is adaptive and also preserves the nature of RNAseq read count data for glioma grading and prediction. The Negative Binomial (NB) distribution may be useful for modeling RNAseq read count data in a way that addresses these potential shortcomings. In this study, we propose a novel radiogenomics-NB model for glioma grading and prediction. Our radiogenomics-NB model is developed based on differentially expressed RNAseq and selected radiomics/volumetric features which characterize tumor volume and sub-regions. The NB distribution is fitted to RNAseq count data, and a log-linear regression model is assumed to link the estimated NB mean and radiomics. Three radiogenomics-NB molecular mutation models (IDH mutation, 1p/19q codeletion, and ATRX mutation) are investigated. Additionally, we explore gender-specific effects on the radiogenomics-NB models. Finally, we compare the performance of the proposed three mutation prediction radiogenomics-NB models with different well-known methods in the literature: Negative Binomial Linear Discriminant Analysis (NBLDA), differentially expressed RNAseq with Random Forest (RF-genomics), radiomics and differentially expressed RNAseq with Random Forest (RF-radiogenomics), and Voom-based count transformation combined with the nearest shrunken centroids classifier (VoomNSC). Our analysis shows that the proposed radiogenomics-NB model significantly outperforms the competing models in the literature (ANOVA test, p < 0.05) for prediction of IDH and ATRX mutations and offers similar performance for prediction of 1p/19q codeletion.
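The link between NB-distributed read counts and radiomic features described in this abstract can be written compactly. The notation below is an assumption chosen for illustration, since the abstract gives no symbols: $Y_{ig}$ is the read count of gene $g$ in patient $i$, $x_{ij}$ is the $j$-th radiomic feature of patient $i$, $\phi_g$ is a gene-wise dispersion, and the mean-dispersion parameterization of the NB distribution is assumed.

```latex
Y_{ig} \sim \mathrm{NB}(\mu_{ig}, \phi_g), \qquad
\operatorname{Var}(Y_{ig}) = \mu_{ig} + \phi_g\,\mu_{ig}^{2}, \qquad
\log \mu_{ig} = \beta_{0g} + \sum_{j=1}^{p} \beta_{jg}\, x_{ij}
```

Under this form, a one-unit increase in feature $x_{ij}$ multiplies the expected count by $e^{\beta_{jg}}$, which is what makes the regression log-linear.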
Affiliation(s)
- Zeina A. Shboul
- Vision Lab, Department of Electrical & Computer Engineering, Old Dominion University, Norfolk, VA, United States
- Norou Diawara
- Department of Mathematics & Statistics, Old Dominion University, Norfolk, VA, United States
- Arastoo Vossough
- Department of Radiology, Children's Hospital of Philadelphia, University of Pennsylvania, Philadelphia, PA, United States
- James Y. Chen
- University of California, San Diego Health System, San Diego, CA, United States
- Khan M. Iftekharuddin
- Vision Lab, Department of Electrical & Computer Engineering, Old Dominion University, Norfolk, VA, United States
7
Koçhan N, Tutuncu GY, Smyth GK, Gandolfo LC, Giner G. qtQDA: quantile transformed quadratic discriminant analysis for high-dimensional RNA-seq data. PeerJ 2020; 7:e8260. [PMID: 31976167 PMCID: PMC6967023 DOI: 10.7717/peerj.8260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/23/2019] [Accepted: 11/20/2019] [Indexed: 11/26/2022]
Abstract
Classification on the basis of gene expression data derived from RNA-seq promises to become an important part of modern medicine. We propose a new classification method based on a model where the data is marginally negative binomial but dependent, thereby incorporating the dependence known to be present between measurements from different genes. The method, called qtQDA, works by first performing a quantile transformation (qt) then applying Gaussian quadratic discriminant analysis (QDA) using regularized covariance matrix estimates. We show that qtQDA has excellent performance when applied to real data sets and has advantages over some existing approaches. An R package implementing the method is also available on https://github.com/goknurginer/qtQDA.
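The first step named in this abstract, a quantile transformation of negative binomial counts onto a Gaussian scale, can be illustrated for a single gene. This is a hedged sketch and not the qtQDA package's code: the function names, the mean-dispersion parameterization (variance = mu + phi * mu^2), and the use of a mid-distribution CDF to keep the transformed value finite for discrete counts are all assumptions made here. The QDA step with regularized covariance estimates would then be applied to the transformed values and is not shown.

```python
import math
from statistics import NormalDist


def nb_pmf(k, mu, phi):
    """Negative binomial probability of count k, with mean mu and dispersion phi."""
    r = 1.0 / phi          # NB "size" parameter
    p = r / (r + mu)       # success probability in the size/probability form
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def quantile_transform(count, mu, phi):
    """Map an NB count to a standard normal score via its (mid) CDF value."""
    below = sum(nb_pmf(k, mu, phi) for k in range(count))
    # Mid-distribution CDF: P(X < count) + 0.5 * P(X = count), an assumption
    # used here so the value stays strictly between 0 and 1.
    mid_cdf = below + 0.5 * nb_pmf(count, mu, phi)
    mid_cdf = min(max(mid_cdf, 1e-12), 1.0 - 1e-12)
    return NormalDist().inv_cdf(mid_cdf)


# Hypothetical gene with mean 10 and dispersion 0.2.
z_low = quantile_transform(5, mu=10.0, phi=0.2)
z_high = quantile_transform(20, mu=10.0, phi=0.2)
```

Because the transformation is monotone, larger counts always map to larger Gaussian scores, so the ordering of samples within a gene is preserved while the marginal distribution becomes approximately normal.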
Affiliation(s)
- Necla Koçhan
- Department of Mathematics, Izmir University of Economics, Izmir, Turkey
- G Yazgi Tutuncu
- Department of Mathematics, Izmir University of Economics, Izmir, Turkey
- Gordon K Smyth
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
- Luke C Gandolfo
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
- Göknur Giner
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
8
Goksuluk D, Zararsiz G, Korkmaz S, Eldem V, Zararsiz GE, Ozcetin E, Ozturk A, Karaagaoglu AE. MLSeq: Machine learning interface for RNA-sequencing data. Computer Methods and Programs in Biomedicine 2019; 175:223-231. [PMID: 31104710 DOI: 10.1016/j.cmpb.2019.04.007] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 12/23/2018] [Revised: 03/21/2019] [Accepted: 04/08/2019] [Indexed: 06/09/2023]
Abstract
BACKGROUND AND OBJECTIVE In the last decade, RNA-sequencing technology has become the method of choice and is preferred to microarray technology for gene expression-based classification and differential expression analysis since it produces less noisy data. Although there are many algorithms proposed for microarray data, the number of available algorithms and programs is limited for classification of RNA-sequencing data. For this reason, we developed MLSeq to bring together not only frequently used classification algorithms but also novel approaches and make them available for classification of RNA-sequencing data. This package is developed in the R language environment and distributed through the Bioconductor network. METHODS Classification of RNA-sequencing data is not straightforward since raw data should be preprocessed before downstream analysis. With the MLSeq package, researchers can easily preprocess (normalization, filtering, transformation, etc.) and classify raw RNA-sequencing data using two strategies: (i) to apply algorithms which are directly proposed for the RNA-sequencing data structure or (ii) to transform RNA-sequencing data in order to bring it distributionally closer to the microarray data structure, and apply algorithms which are developed for microarray data. Moreover, we proposed novel algorithms such as voom (an acronym for variance modelling at observational level) based nearest shrunken centroids (voomNSC), diagonal linear discriminant analysis (voomDLDA), etc. through MLSeq. MATERIALS Three real RNA-sequencing datasets (i.e., cervical cancer, lung cancer and aging datasets) were used to evaluate model performances. Poisson linear discriminant analysis (PLDA) and negative binomial linear discriminant analysis (NBLDA) were selected as algorithms based on discrete distributions, and voomNSC, nearest shrunken centroids (NSC) and support vector machines (SVM) were selected as algorithms based on continuous distributions for model comparisons. Each algorithm is compared using classification accuracies and sparsities on an independent test set. RESULTS The algorithms which are based on discrete distributions performed better in cervical cancer and aging data, with accuracies above 0.92. In lung cancer data, most of the algorithms performed similarly, with accuracies of 0.88, except that SVM achieved an accuracy of 0.94. Our voomNSC algorithm was the sparsest algorithm, and was able to select 2.2% and 6.6% of all features for the cervical cancer and lung cancer datasets, respectively. However, in aging data, sparse classifiers were not able to select an optimal subset of all features. CONCLUSION MLSeq is a comprehensive and easy-to-use interface for classification of gene expression data. It allows researchers to perform both preprocessing and classification tasks through a single platform. With this property, MLSeq can be considered a pipeline for the classification of RNA-sequencing data.
Affiliation(s)
- Dincer Goksuluk
- Department of Biostatistics, School of Medicine, Hacettepe University, 06100, Ankara, Turkey; Turcosa Analytics Solutions Ltd. Co., Erciyes Teknopark 5, 38030, Kayseri, Turkey
- Gokmen Zararsiz
- Department of Biostatistics, School of Medicine, Erciyes University, 38030, Kayseri, Turkey; Turcosa Analytics Solutions Ltd. Co., Erciyes Teknopark 5, 38030, Kayseri, Turkey.
- Selcuk Korkmaz
- Department of Biostatistics, School of Medicine, Trakya University, 22030, Edirne, Turkey; Turcosa Analytics Solutions Ltd. Co., Erciyes Teknopark 5, 38030, Kayseri, Turkey
- Vahap Eldem
- Department of Biology, Faculty of Science, Istanbul University, 34452, Istanbul, Turkey
- Gozde Erturk Zararsiz
- Department of Biostatistics, School of Medicine, Erciyes University, 38030, Kayseri, Turkey
- Erdener Ozcetin
- Department of Industrial Engineering, Faculty of Engineering, Hitit University, 19030, Corum, Turkey
- Ahmet Ozturk
- Department of Biostatistics, School of Medicine, Erciyes University, 38030, Kayseri, Turkey; Turcosa Analytics Solutions Ltd. Co., Erciyes Teknopark 5, 38030, Kayseri, Turkey
- Ahmet Ergun Karaagaoglu
- Department of Biostatistics, School of Medicine, Hacettepe University, 06100, Ankara, Turkey