1. Jeyananthan P. Performance comparison between multi-level gene expression data in cancer subgroup classification. Pathol Res Pract 2024; 260:155419. [PMID: 38955118 DOI: 10.1016/j.prp.2024.155419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 06/06/2024] [Accepted: 06/19/2024] [Indexed: 07/04/2024]
Abstract
Cancer is a serious disease that can affect various parts of the body, such as the breast, colon, lung or stomach. Each of these cancers has its own treatment-dependent historical subgroups. Hence, correct identification of the cancer subgroup is almost as important as timely diagnosis of the cancer. This is still a challenging task, and a highly accurate system is essential. Current research is moving towards analyzing the gene expression data of cancer patients for various purposes, including biomarker identification and the study of differentially expressed genes, using gene expression data measured at a single level (selected from different gene levels including genome, transcriptome or translation). However, previous studies showed that the information carried by one level of gene expression is not similar to that carried by another level. This shows the importance of integrating multi-level omics data in these studies. Hence, this study uses tumor gene expression data measured at various gene levels, along with the integration of those data, in the subgroup classification of nine different cancers. This is a comprehensive analysis in which four different types of gene expression data (transcriptome, miRNA, methylation and proteome) are used in this subgrouping, and the performances of the models are compared to reveal the best model.
2. Maraslioglu-Sperber A, Pizzi E, Fisch JO, Kattler K, Ritter T, Friauf E. Molecular and functional profiling of cell diversity and identity in the lateral superior olive, an auditory brainstem center with ascending and descending projections. Front Cell Neurosci 2024; 18:1354520. [PMID: 38846638 PMCID: PMC11153811 DOI: 10.3389/fncel.2024.1354520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 03/15/2024] [Indexed: 06/09/2024] Open
Abstract
The lateral superior olive (LSO), a prominent integration center in the auditory brainstem, contains a remarkably heterogeneous population of neurons. Ascending neurons, predominantly principal neurons (pLSOs), process interaural level differences for sound localization. Descending neurons (lateral olivocochlear neurons, LOCs) provide feedback into the cochlea and are thought to protect against acoustic overload. The molecular determinants of the neuronal diversity in the LSO are largely unknown. Here, we used patch-seq analysis in mice at postnatal days P10-12 to classify developing LSO neurons according to their functional and molecular profiles. Across the entire sample (n = 86 neurons), genes involved in ATP synthesis were particularly highly expressed, confirming the energy expenditure of auditory neurons. Two clusters were identified, pLSOs and LOCs. They were distinguished by 353 differentially expressed genes (DEGs), most of which were novel for the LSO. Electrophysiological analysis confirmed the transcriptomic clustering. We focused on genes affecting neuronal input-output properties and validated some of them by immunohistochemistry, electrophysiology, and pharmacology. These genes encode proteins such as osteopontin, Kv11.3, and Kvβ3 (pLSO-specific), calcitonin-gene-related peptide (LOC-specific), or Kv7.2 and Kv7.3 (no DEGs). We identified 12 "Super DEGs" and 12 genes showing "Cluster similarity." Collectively, we provide fundamental and comprehensive insights into the molecular composition of individual ascending and descending neurons in the juvenile auditory brainstem and how this may relate to their specific functions, including developmental aspects.
Affiliation(s)
- Ayse Maraslioglu-Sperber
  - Animal Physiology Group, Department of Biology, University of Kaiserslautern-Landau, Kaiserslautern, Germany
- Erika Pizzi
  - Animal Physiology Group, Department of Biology, University of Kaiserslautern-Landau, Kaiserslautern, Germany
- Jonas O. Fisch
  - Animal Physiology Group, Department of Biology, University of Kaiserslautern-Landau, Kaiserslautern, Germany
- Kathrin Kattler
  - Genetics/Epigenetics Group, Department of Biological Sciences, Saarland University, Saarbrücken, Germany
- Tamara Ritter
  - Animal Physiology Group, Department of Biology, University of Kaiserslautern-Landau, Kaiserslautern, Germany
- Eckhard Friauf
  - Animal Physiology Group, Department of Biology, University of Kaiserslautern-Landau, Kaiserslautern, Germany
3. Matsui Y, Abe Y, Uno K, Miyano S. RoDiCE: robust differential protein co-expression analysis for cancer complexome. Bioinformatics 2022; 38:1269-1276. [PMID: 34529752 DOI: 10.1093/bioinformatics/btab612] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/23/2020] [Revised: 08/09/2021] [Accepted: 08/23/2021] [Indexed: 01/05/2023] Open
Abstract
MOTIVATION The full spectrum of abnormalities in cancer-associated protein complexes remains largely unknown. Comparing the co-expression structure of each protein complex between tumor and healthy cells may provide insights regarding cancer-specific protein dysfunction. However, the technical limitations of mass spectrometry-based proteomics, including contamination with biological protein variants, cause noise that leads to non-negligible over- (or under-) estimation of co-expression. RESULTS We propose a robust algorithm for identifying protein complex aberrations in cancer based on differential protein co-expression testing. Our method based on a copula is sufficient for improving identification accuracy with noisy data compared to conventional linear correlation-based approaches. As an application, we use large-scale proteomic data from renal cancer to show that important protein complexes, regulatory signaling pathways and drug targets can be identified. The proposed approach surpasses traditional linear correlations to provide insights into higher-order differential co-expression structures. AVAILABILITY AND IMPLEMENTATION https://github.com/ymatts/RoDiCE. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
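As a rough illustration of why a rank-based dependence measure (the idea underlying empirical copulas) can be less sensitive to noise than linear correlation, the following minimal sketch compares Pearson and Spearman correlation on invented data containing one outlier. It is not the RoDiCE algorithm; the function names `pearson`, `ranks` and `spearman` and all values are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented abundances for two proteins; the last sample is a gross outlier.
protein_a = [1, 2, 3, 4, 5, 6]
protein_b = [1.1, 2.0, 2.9, 4.2, 5.1, 60.0]

print(round(pearson(protein_a, protein_b), 2))
print(round(spearman(protein_a, protein_b), 2))
```

In this toy case the relationship is perfectly monotone, so the rank-based value is unaffected by the outlier while the linear coefficient is pulled down.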
Affiliation(s)
- Yusuke Matsui
  - Biomedical and Health Informatics Unit, Department of Integrated Health Science, Nagoya University Graduate School of Medicine, 461-8673 Nagoya, Aichi, Japan
  - Institute for Glyco-core Research (iGCORE), Nagoya University, 461-8673 Nagoya, Aichi, Japan
- Yuichi Abe
  - Division of Molecular Diagnostics, Aichi Cancer Center Research Institute, 464-0021 Nagoya, Aichi, Japan
- Kohei Uno
  - Biomedical and Health Informatics Unit, Department of Integrated Health Science, Nagoya University Graduate School of Medicine, 461-8673 Nagoya, Aichi, Japan
- Satoru Miyano
  - Department of Integrated Data Science, M&D Data Science Center, Tokyo Medical and Dental University, 113-8510 Tokyo, Japan
4. Gondeau A, Aouabed Z, Hijri M, Peres-Neto P, Makarenkov V. Object Weighting: A New Clustering Approach to Deal with Outliers and Cluster Overlap in Computational Biology. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:633-643. [PMID: 31180868 PMCID: PMC8158064 DOI: 10.1109/tcbb.2019.2921577] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/09/2023]
Abstract
Considerable efforts have been made over the last decades to improve the robustness of clustering algorithms against noise features and outliers, known to be important sources of error in clustering. Outliers dominate the sum-of-the-squares calculations and generate cluster overlap, thus leading to unreliable clustering results. They can be particularly detrimental in computational biology, e.g., when determining the number of clusters in gene expression data related to cancer or when inferring phylogenetic trees and networks. While the issue of feature weighting has been studied in detail, no clustering methods using object weighting have been proposed yet. Here we describe a new general data partitioning method that includes an object-weighting step to assign higher weights to outliers and objects that cause cluster overlap. Different object weighting schemes, based on the Silhouette cluster validity index, the median and two intercluster distances, are defined. We compare our novel technique to a number of popular and efficient clustering algorithms, such as K-means, X-means, DAPC and Prediction Strength. In the presence of outliers and cluster overlap, our method largely outperforms X-means, DAPC and Prediction Strength as well as the K-means algorithm based on feature weighting.
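The Silhouette index mentioned above can be sketched as follows. This minimal example only computes per-object silhouette widths on invented 2-D points; it does not reproduce the object-weighting scheme of the paper, and the function name `silhouette_widths` and the data are hypothetical. Objects with low or negative widths are the ones sitting between clusters.

```python
from math import dist

def silhouette_widths(points, labels):
    """Silhouette width of every point; assumes at least two clusters."""
    clusters = sorted(set(labels))
    widths = []
    for i, p in enumerate(points):
        def mean_dist(cluster):
            # Mean distance from p to the members of `cluster`, excluding p itself.
            ds = [dist(p, q) for j, q in enumerate(points)
                  if labels[j] == cluster and j != i]
            return sum(ds) / len(ds) if ds else None

        a = mean_dist(labels[i])          # cohesion within the own cluster
        if a is None:                     # singleton cluster: width 0 by convention
            widths.append(0.0)
            continue
        b = min(mean_dist(c) for c in clusters if c != labels[i])  # nearest other cluster
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths

# Invented data: the last point lies between the two groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (5, 5)]
labels = [0, 0, 0, 1, 1, 1]

for point, width in zip(points, silhouette_widths(points, labels)):
    print(point, round(width, 2))
```

Here the well-separated points receive widths close to 1, whereas the in-between point receives a negative width, which is the kind of signal a weighting scheme could build on.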
5. Kanza S, Bird CL, Niranjan M, McNeill W, Frey JG. The AI for Scientific Discovery Network. PATTERNS (NEW YORK, N.Y.) 2021; 2:100162. [PMID: 33511363 PMCID: PMC7815949 DOI: 10.1016/j.patter.2020.100162] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022]
Abstract
The Artificial Intelligence and Augmented Intelligence for Automated Investigation for Scientific Discovery Network+ (AI3SD) was established in response to the UK Engineering and Physical Sciences Research Council (EPSRC) late-2017 call for a Network+ to promote cutting-edge research in artificial intelligence to accelerate groundbreaking scientific discoveries. This article provides the philosophical, scientific, and technical underpinnings of the Network+, the history of the different domains represented in the Network+, and the specific focus of the Network+. The activities, collaborations, and research covered in the first year of the Network+ have highlighted the significant challenges in the chemistry and augmented and artificial intelligence space. These challenges are shaping the future directions of the Network+. The article concludes with a summary of the lessons learned in running this Network+ and introduces our plans for the future in a landscape redrawn by COVID-19, including rebranding into the AI 4 Scientific Discovery Network (www.ai4science.network).
Affiliation(s)
- Samantha Kanza
  - School of Chemistry, University of Southampton, Southampton SO17 1BJ, UK
- Colin Leonard Bird
  - School of Chemistry, University of Southampton, Southampton SO17 1BJ, UK
- Mahesan Niranjan
  - School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK
- William McNeill
  - School of Humanities, University of Southampton, Southampton SO17 1BJ, UK
- Jeremy Graham Frey
  - School of Chemistry, University of Southampton, Southampton SO17 1BJ, UK
6. Shetta O, Niranjan M. Robust subspace methods for outlier detection in genomic data circumvents the curse of dimensionality. ROYAL SOCIETY OPEN SCIENCE 2020; 7:190714. [PMID: 32257299 PMCID: PMC7062061 DOI: 10.1098/rsos.190714] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/16/2019] [Accepted: 12/12/2019] [Indexed: 06/11/2023]
Abstract
The application of machine learning to inference problems in biology is dominated by supervised learning problems of regression and classification, and unsupervised learning problems of clustering and variants of low-dimensional projections for visualization. A class of problems that have not gained much attention is detecting outliers in datasets, arising from reasons such as gross experimental, reporting or labelling errors. These could also be small parts of a dataset that are functionally distinct from the majority of a population. Outlier data are often identified by considering the probability density of normal data and comparing data likelihoods against some threshold. This classical approach suffers from the curse of dimensionality, which is a serious problem with omics data which are often found in very high dimensions. We develop an outlier detection method based on structured low-rank approximation methods. The objective function includes a regularizer based on neighbourhood information captured in the graph Laplacian. Results on publicly available genomic data show that our method robustly detects outliers whereas a density-based method fails even at moderate dimensions. Moreover, we show that our method has better clustering and visualization performance on the recovered low-dimensional projection when compared with popular dimensionality reduction techniques.
Affiliation(s)
- Omar Shetta
  - Author for correspondence: Omar Shetta e-mail:
7. Parkes GM, Niranjan M. Uncovering extensive post-translation regulation during human cell cycle progression by integrative multi-'omics analysis. BMC Bioinformatics 2019; 20:536. [PMID: 31664894 PMCID: PMC6820968 DOI: 10.1186/s12859-019-3150-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/26/2019] [Accepted: 10/04/2019] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND Analysis of high-throughput multi-'omics interactions across the hierarchy of expression has wide interest in making inferences with regard to biological function and biomarker discovery. Expression levels across different scales are determined by robust synthesis, regulation and degradation processes, and hence transcript (mRNA) measurements made by microarray/RNA-Seq only show modest correlation with corresponding protein levels. RESULTS In this work we are interested in quantitative modelling of correlation across such gene products. Building on recent work, we develop computational models spanning transcript, translation and protein levels at different stages of the H. sapiens cell cycle. We enhance this analysis by incorporating 25+ sequence-derived features which are likely determinants of cellular protein concentration and quantitatively select for relevant features, producing a vast dataset with thousands of genes. We reveal insights into the complex interplay between expression levels across time, using machine learning methods to highlight outliers with respect to such models as proteins associated with post-translationally regulated modes of action. CONCLUSIONS We uncover quantitative separation between modified and degraded proteins that have roles in cell cycle regulation, chromatin remodelling and protein catabolism according to Gene Ontology; and highlight the opportunities for providing biological insights in future model systems.
Affiliation(s)
- Gregory M Parkes
  - University of Southampton, University Road, Southampton, SO17 1BJ, UK
- Mahesan Niranjan
  - University of Southampton, University Road, Southampton, SO17 1BJ, UK
8. Moritz CP, Mühlhaus T, Tenzer S, Schulenborg T, Friauf E. Poor transcript-protein correlation in the brain: negatively correlating gene products reveal neuronal polarity as a potential cause. J Neurochem 2019; 149:582-604. [PMID: 30664243 DOI: 10.1111/jnc.14664] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 08/13/2018] [Revised: 12/15/2018] [Accepted: 01/02/2019] [Indexed: 01/02/2023]
Abstract
Transcription, translation, and turnover of transcripts and proteins are essential for cellular function. The contribution of those factors to protein levels is under debate, as transcript levels and cognate protein levels do not necessarily correlate due to regulation of translation and protein turnover. Here we propose neuronal polarity as a third factor that is particularly evident in the CNS, leading to considerable distances between somata and axon terminals. Consequently, transcript levels may negatively correlate with cognate protein levels in CNS regions, i.e., transcript and protein levels behave reciprocally. To test this hypothesis, we performed an integrative inter-omics study and analyzed three interconnected rat auditory brainstem regions (cochlear nuclear complex, CN; superior olivary complex, SOC; inferior colliculus, IC) and the rest of the brain as a reference. We obtained transcript and protein sets in these regions of interest (ROIs) by DNA microarrays and label-free mass spectrometry, and performed principal component and correlation analyses. We found 508 transcript|protein pairs and detected poor to moderate transcript|protein correlation in all ROIs, as evidenced by coefficients of determination from 0.34 to 0.54. We identified 57-80 negatively correlating gene products in the ROIs and intensively analyzed four of them for which the correlation was poorest. Three cognate proteins (Slc6a11, Syngr1, Tppp) were synaptic and hence candidates for a negative correlation because of protein transport into axon terminals. Thus, we systematically analyzed the negatively correlating gene products. Gene ontology analyses revealed overrepresented transport/synapse-related proteins, supporting our hypothesis. We present 30 synapse/transport-related proteins with poor transcript|protein correlation. 
In conclusion, our analyses support that protein transport in polar cells is a third factor that influences the protein level and, thereby, the transcript|protein correlation. OPEN SCIENCE BADGES: This article has received a badge for *Open Materials* and *Open Data* because it provided all relevant information to reproduce the study in the manuscript and because it made the data publicly available. The data can be accessed at https://osf.io/ha28n/. The complete Open Science Disclosure form for this article can be found at the end of the article. More information about the Open Practices badges can be found at https://cos.io/our-services/open-science-badges/.
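For readers unfamiliar with the statistic quoted above, the coefficient of determination of a simple linear relationship is the square of the Pearson correlation coefficient, so the sign of the correlation must be read from r itself. The sketch below uses invented transcript and protein values and hypothetical function names; it is not the analysis pipeline of the study.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def r_squared(x, y):
    """Coefficient of determination for a simple linear fit: r squared."""
    return pearson_r(x, y) ** 2

# Invented levels across four regions for two hypothetical gene products.
transcript = [1, 2, 3, 4]
protein_moderate = [2, 1, 4, 3]      # positive but imperfect correlation
protein_reciprocal = [8, 6, 4, 2]    # protein falls as transcript rises

print(round(pearson_r(transcript, protein_moderate), 2),
      round(r_squared(transcript, protein_moderate), 2))
print(round(pearson_r(transcript, protein_reciprocal), 2),
      round(r_squared(transcript, protein_reciprocal), 2))
```

The second pair shows why a negatively correlating gene product has to be identified from the sign of r: its squared value alone would look like a strong relationship.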
Affiliation(s)
- Christian P Moritz
  - Animal Physiology Group, Department of Biology, University of Kaiserslautern, Kaiserslautern, Germany
  - Synaptopathies and Autoantibodies, Institut NeuroMyoGène INSERM U1217/ CNRS, UMR 5310, Faculty of Medicine, University Jean Monnet, Saint-Étienne, France
- Timo Mühlhaus
  - Computational Systems Biology, Department of Biology, University of Kaiserslautern, Kaiserslautern, Germany
- Stefan Tenzer
  - Institute of Immunology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Thomas Schulenborg
  - Animal Physiology Group, Department of Biology, University of Kaiserslautern, Kaiserslautern, Germany
  - Division of Allergology, Paul-Ehrlich-Institut, Langen, Germany
- Eckhard Friauf
  - Animal Physiology Group, Department of Biology, University of Kaiserslautern, Kaiserslautern, Germany
9. Martinez-Nunez RT, Rupani H, Platé M, Niranjan M, Chambers RC, Howarth PH, Sanchez-Elsner T. Genome-Wide Posttranscriptional Dysregulation by MicroRNAs in Human Asthma as Revealed by Frac-seq. THE JOURNAL OF IMMUNOLOGY 2018; 201:251-263. [PMID: 29769273 DOI: 10.4049/jimmunol.1701798] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 12/29/2017] [Accepted: 04/17/2018] [Indexed: 12/07/2022]
Abstract
MicroRNAs are small noncoding RNAs that inhibit gene expression posttranscriptionally, implicated in virtually all biological processes. Although the effect of individual microRNAs is generally studied, the genome-wide role of multiple microRNAs is less investigated. We assessed paired genome-wide expression of microRNAs with total (cytoplasmic) and translational (polyribosome-bound) mRNA levels employing subcellular fractionation and RNA sequencing (Frac-seq) in human primary bronchoepithelium from healthy controls and severe asthmatics. Severe asthma is a chronic inflammatory disease of the airways characterized by poor response to therapy. We found genes (i.e., isoforms of a gene) and mRNA isoforms differentially expressed in asthma, with novel inflammatory and structural pathophysiological mechanisms related to bronchoepithelium disclosed solely by polyribosome-bound mRNAs (e.g., IL1A and LTB genes or ITGA6 and ITGA2 alternatively spliced isoforms). Gene expression (i.e., isoforms of a gene) and mRNA expression analysis revealed different molecular candidates and biological pathways, with differentially expressed polyribosome-bound and total mRNAs also showing little overlap. We reveal a hub of six dysregulated microRNAs accounting for ∼90% of all microRNA targeting, displaying preference for polyribosome-bound mRNAs. Transfection of this hub in bronchial epithelial cells from healthy donors mimicked asthma characteristics. Our work demonstrates extensive posttranscriptional gene dysregulation in human asthma, in which microRNAs play a central role, illustrating the feasibility and importance of assessing posttranscriptional gene expression when investigating human disease.
Affiliation(s)
- Rocio T Martinez-Nunez
  - School of Immunology and Microbial Sciences, Medical Research Council and Asthma UK Centre in Allergic Mechanisms of Asthma, King's College London, London SE19RT, United Kingdom
  - Clinical and Experimental Sciences, Faculty of Medicine, University of Southampton, Southampton SO16 6YD, United Kingdom
- Hitasha Rupani
  - Clinical and Experimental Sciences, Faculty of Medicine, University of Southampton, Southampton SO16 6YD, United Kingdom
  - Southampton National Institute for Health Research Respiratory Biomedical Research Unit, Southampton Centre for Biomedical Research, University Hospital Southampton National Health Service Foundation Trust, Southampton SO16 6YD, United Kingdom
- Manuela Platé
  - Centre for Inflammation and Tissue Repair, Department of Respiratory Medicine, Rayne Institute, University College London, London WC1E 6JF, United Kingdom
- Mahesan Niranjan
  - School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, United Kingdom
- Rachel C Chambers
  - Centre for Inflammation and Tissue Repair, Department of Respiratory Medicine, Rayne Institute, University College London, London WC1E 6JF, United Kingdom
- Peter H Howarth
  - Clinical and Experimental Sciences, Faculty of Medicine, University of Southampton, Southampton SO16 6YD, United Kingdom
  - Southampton National Institute for Health Research Respiratory Biomedical Research Unit, Southampton Centre for Biomedical Research, University Hospital Southampton National Health Service Foundation Trust, Southampton SO16 6YD, United Kingdom
- Tilman Sanchez-Elsner
  - Clinical and Experimental Sciences, Faculty of Medicine, University of Southampton, Southampton SO16 6YD, United Kingdom
10. Dünder E, Gümüştekin S, Murat N, Cengiz MA. Subset selection in quantile regression analysis via alternative Bayesian information criteria and heuristic optimization. COMMUN STAT-THEOR M 2017. [DOI: 10.1080/03610926.2016.1257718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Emre Dünder
  - Ondokuz Mayıs University, Faculty of Science, Department of Statistics, Samsun, Turkey
- Serpil Gümüştekin
  - Ondokuz Mayıs University, Faculty of Science, Department of Statistics, Samsun, Turkey
- Naci Murat
  - Ondokuz Mayıs University, Faculty of Engineering, Department of Industrial Engineering, Samsun, Turkey
- Mehmet Ali Cengiz
  - Ondokuz Mayıs University, Faculty of Science, Department of Statistics, Samsun, Turkey