1
Yoon J, Liu Z, Alaba M, Bruggeman LA, Janmey PA, Arana CA, Ayenuyo O, Medeiros I, Eddy S, Kretzler M, Henderson JM, Nair V, Naik AS, Chang AN, Miller RT. Glomerular Elasticity and Gene Expression Patterns Define Two Phases of Alport Nephropathy. bioRxiv 2024:2024.02.26.582201. [PMID: 38948788 PMCID: PMC11212921 DOI: 10.1101/2024.02.26.582201]
Abstract
Objectives To understand the early stages of Alport nephropathy, we characterize the structural, functional, and biophysical properties of glomerular capillaries and podocytes in Col4α3-/- mice, analyze kidney cortex transcriptional profiles at three time points, and investigate the effects of ER stress mitigation by TUDCA on these parameters. We use human FSGS-associated genes to identify molecular pathways rescued by TUDCA. Findings We define a disease progression timeline in Col4α3-/- mice. Podocyte injury is evident by 3 months, and glomeruli reach maximum deformability at 4 months, associated with 40% podocyte loss. This is followed by progressive capillary stiffening, increasing proteinuria, reduced renal function, inflammatory infiltrates, and fibrosis from months 4 to 7. RNA sequencing at 2, 4, and 7 months reveals increased cytokine and chemokine signaling, matrix and cell injury, and activation of TNF pathway genes by 7 months, similar to the NEPTUNE FSGS cohorts. These features are suppressed by TUDCA. Conclusions We define two phases of Col4α3-/- nephropathy. The first is characterized by podocytopathy, increased glomerular capillary deformability, and accelerated podocyte loss. The second is characterized by increased capillary wall stiffening and activation of renal inflammatory and profibrotic pathways. Disease suppression by TUDCA treatment identifies potential therapeutic targets for Alport and related nephropathies.
2
Sastry AV, Yuan Y, Poudel S, Rychel K, Yoo R, Lamoureux CR, Li G, Burrows JT, Chauhan S, Haiman ZB, Al Bulushi T, Seif Y, Palsson BO, Zielinski DC. iModulonMiner and PyModulon: Software for unsupervised mining of gene expression compendia. PLoS Comput Biol 2024; 20:e1012546. [PMID: 39441835 PMCID: PMC11534266 DOI: 10.1371/journal.pcbi.1012546] [Received: 04/06/2024] [Revised: 11/04/2024] [Accepted: 10/09/2024]
Abstract
Public gene expression databases are a rapidly expanding resource of organism responses to diverse perturbations, presenting both an opportunity and a challenge for bioinformatics workflows to extract actionable knowledge of transcription regulatory network function. Here, we introduce a five-step computational pipeline, called iModulonMiner, to compile, process, curate, analyze, and characterize the totality of RNA-seq data for a given organism or cell type. This workflow is centered around the data-driven computation of co-regulated gene sets using Independent Component Analysis, called iModulons, which have been shown to have broad applications. As a demonstration, we applied this workflow to generate the iModulon structure of Bacillus subtilis using all high-quality, publicly available RNA-seq data. Using this structure, we predicted regulatory interactions for multiple transcription factors, identified groups of co-expressed genes that are putatively regulated by undiscovered transcription factors, and predicted properties of a recently discovered single-subunit phage RNA polymerase. We also present a Python package, PyModulon, with functions to characterize, visualize, and explore computed iModulons. The pipeline, available at https://github.com/SBRG/iModulonMiner, can be readily applied to diverse organisms to gain a rapid understanding of their transcriptional regulatory network structure and condition-specific activity.
Affiliation(s)
- Anand V. Sastry
  - Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
- Yuan Yuan
  - Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
- Saugat Poudel
  - Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
- Kevin Rychel
  - Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
- Reo Yoo
  - Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
- Cameron R. Lamoureux
  - Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
- Gaoyuan Li
  - Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
- Joshua T. Burrows
  - Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
- Siddharth Chauhan
  - Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
- Zachary B. Haiman
  - Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
- Tahani Al Bulushi
  - Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
- Yara Seif
  - Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
- Bernhard O. Palsson
  - Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
  - Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, California, United States of America
  - Department of Pediatrics, University of California, San Diego, La Jolla, California, United States of America
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Kongens Lyngby, Denmark
- Daniel C. Zielinski
  - Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America
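The iModulonMiner abstract rests on Independent Component Analysis (ICA) of an expression compendium. The sketch below is a rough illustration only. It implements symmetric FastICA with the logcosh (tanh) contrast in NumPy and treats genes as observations, so each recovered source is a vector of per-gene weights. A fixed absolute-weight cutoff then turns one component into a gene set. The function names, the fixed cutoff, and the parameter values are assumptions. This is not the iModulonMiner or PyModulon code, which applies its own thresholding, and a library implementation such as scikit-learn's `FastICA` would be the more likely choice in practice.

```python
import numpy as np


def _decorrelate(W):
    """Symmetric decorrelation: W <- (W W^T)^(-1/2) W."""
    s, U = np.linalg.eigh(W @ W.T)
    return U @ np.diag(1.0 / np.sqrt(s)) @ U.T @ W


def fast_ica(X, n_components, max_iter=500, tol=1e-8, seed=0):
    """Symmetric FastICA (logcosh contrast) on a genes x conditions matrix.

    Rows (genes) are treated as observations, so the returned sources
    (genes x n_components) play the role of per-gene component weights.
    """
    Xc = X - X.mean(axis=0)                      # center each condition
    cov = Xc.T @ Xc / Xc.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)
    top = np.argsort(eigval)[::-1][:n_components]
    K = eigvec[:, top] / np.sqrt(eigval[top])    # whitening matrix
    Z = Xc @ K                                   # whitened data, genes x k

    rng = np.random.default_rng(seed)
    W = _decorrelate(rng.standard_normal((n_components, n_components)))
    n = Z.shape[0]
    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)                     # g(w_i^T z) for every gene
        G_prime = 1.0 - G ** 2                   # derivative of tanh
        W_new = (G.T @ Z) / n - np.diag(G_prime.mean(axis=0)) @ W
        W_new = _decorrelate(W_new)
        # Converged when each new unmixing vector is (anti)parallel to the old one.
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T


def gene_set(weights, genes, cutoff):
    """Hypothetical helper: genes whose absolute weight meets a fixed cutoff."""
    return [g for g, w in zip(genes, weights) if abs(w) >= cutoff]
```

ICA recovers components only up to sign, order, and scale. For that reason the check below compares absolute correlations rather than raw weights.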
3
Fouché A, Chadoutaud L, Delattre O, Zinovyev A. Transmorph: a unifying computational framework for modular single-cell RNA-seq data integration. NAR Genom Bioinform 2023; 5:lqad069. [PMID: 37448589 PMCID: PMC10336778 DOI: 10.1093/nargab/lqad069] [Received: 01/18/2023] [Revised: 06/02/2023] [Accepted: 07/10/2023]
Abstract
Data integration of single-cell RNA-seq (scRNA-seq) data is the task of embedding datasets gathered from different sources or experiments into a common representation, so that cells with similar types or states are embedded close to one another regardless of their dataset of origin. Data integration is a crucial step in most scRNA-seq analysis pipelines involving multiple batches. It improves data visualization, batch effect reduction, clustering, label transfer, and cell type inference. Many data integration tools have been proposed during the last decade, but the surge in their number has made it difficult to pick one for a given use case. Furthermore, these tools are provided as rigid pieces of software, making it hard to adapt them to specific scenarios. To address both of these issues at once, we introduce the transmorph framework. It allows the user to engineer powerful data integration pipelines and is supported by a rich software ecosystem. We demonstrate the usefulness of transmorph by solving a variety of practical challenges on scRNA-seq datasets, including joint dataset embedding, gene space integration, and transfer of cycle phase annotations. transmorph is provided as an open source Python package.
Affiliation(s)
- Aziz Fouché
  - To whom correspondence should be addressed. Tel: +33 156246989
- Loïc Chadoutaud
  - Institut Curie, PSL Research University, 75005 Paris, France
  - INSERM, 75005 Paris, France
  - MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, 75005 Paris, France
- Olivier Delattre
  - INSERM U830, Equipe Labellisée LNCC, SIREDO Oncology Centre, Institut Curie, 75005 Paris, France
- Andrei Zinovyev
  - Correspondence may also be addressed to Andrei Zinovyev. Tel: +33 156246989
4
Pati SK, Gupta MK, Banerjee A, Mallik S, Zhao Z. PPIGCF: A Protein-Protein Interaction-Based Gene Correlation Filter for Optimal Gene Selection. Genes (Basel) 2023; 14:genes14051063. [PMID: 37239423 DOI: 10.3390/genes14051063] [Received: 02/08/2023] [Revised: 04/26/2023] [Accepted: 05/04/2023]
Abstract
Biological data at the omics level are highly complex, requiring powerful computational approaches to identify significant intrinsic characteristics and, from these, informative markers involved in the studied phenotype. In this paper, we propose a novel dimension reduction technique, protein-protein interaction-based gene correlation filtration (PPIGCF), which builds on gene ontology (GO) and protein-protein interaction (PPI) structures to analyze microarray gene expression data. PPIGCF first extracts the gene symbols with their expression from the experimental dataset and then classifies them based on GO biological process (BP) and cellular component (CC) annotations. Every classification group inherits all the information on its CCs, corresponding to the BPs, to establish a PPI network. Then, the gene correlation filter (based on gene rank and the proposed correlation coefficient) is computed on every network and removes a few weakly correlated genes connected with their corresponding networks. PPIGCF computes the information content (IC) of the remaining genes in the PPI network and keeps only the genes with the highest IC values. The results of PPIGCF are then used to prioritize significant genes. We compared PPIGCF with current methods to demonstrate its efficiency. The experiments indicate that PPIGCF needs fewer genes to reach reasonable accuracy (~99%) for cancer classification. The approach reduces the computational and time complexity of biomarker discovery from datasets.
Affiliation(s)
- Soumen Kumar Pati
  - Department of Bioinformatics, Maulana Abul Kalam Azad University of Technology, Haringhata 741249, West Bengal, India
- Manan Kumar Gupta
  - Department of Bioinformatics, Maulana Abul Kalam Azad University of Technology, Haringhata 741249, West Bengal, India
- Ayan Banerjee
  - Department of Computer Science and Engineering, Jalpaiguri Govt. Engineering College, Jalpaiguri 735102, West Bengal, India
- Saurav Mallik
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA 02115, USA
  - Department of Pharmacology & Toxicology, University of Arizona, Tucson, AZ 85721, USA
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
5
Du Y, Kong Y, He X. IABC: A Toolbox for Intelligent Analysis of Brain Connectivity. Neuroinformatics 2023; 21:303-321. [PMID: 36609668 DOI: 10.1007/s12021-022-09617-z] [Accepted: 11/30/2022]
Abstract
Brain functional networks and connectivity have played an important role in exploring brain function, both for understanding the brain and for revealing the mechanisms of brain disorders. Independent component analysis (ICA) is one of the most widely applied data-driven methods for extracting brain functional networks and connectivity. However, the reliability of the resulting networks and connectivity is hard to guarantee, because of the randomness of component order and the difficulty of selecting an optimal number of components in ICA. To facilitate the analysis of brain functional networks and connectivity using ICA, we developed a MATLAB toolbox called Intelligent Analysis of Brain Connectivity (IABC). IABC incorporates our previously proposed group information guided independent component analysis (GIG-ICA), NeuroMark, and splitting-merging assisted reliable ICA (SMART ICA) methods, which can estimate reliable individual-subject neuroimaging measures for further analysis. After the user inputs functional magnetic resonance imaging (fMRI) data of multiple subjects that are regularly organized (e.g., in the Brain Imaging Data Structure (BIDS)) and clicks a few buttons to set parameters, IABC automatically outputs brain functional networks, their related time courses, and the functional network connectivity of each subject. All these neuroimaging measures are promising for providing clues in understanding brain function and differentiating brain disorders.
Affiliation(s)
- Yuhui Du
  - School of Computer and Information Technology, Shanxi University, Taiyuan, China
- Yanshu Kong
  - School of Computer and Information Technology, Shanxi University, Taiyuan, China
- Xingyu He
  - School of Computer and Information Technology, Shanxi University, Taiyuan, China
6
Waschina S, Seeger K. Using in-tube extraction and slice selective NMR experiments allow imaging via statistical analysis of metabolic profiles. Anal Chim Acta 2022; 1231:340419. [DOI: 10.1016/j.aca.2022.340419] [Received: 05/27/2022] [Revised: 08/19/2022] [Accepted: 09/17/2022]
7
Captier N, Merlevede J, Molkenov A, Seisenova A, Zhubanchaliyev A, Nazarov PV, Barillot E, Kairov U, Zinovyev A. BIODICA: a computational environment for Independent Component Analysis of omics data. Bioinformatics 2022; 38:2963-2964. [PMID: 35561190 DOI: 10.1093/bioinformatics/btac204] [Received: 02/03/2022] [Revised: 03/29/2022] [Accepted: 04/04/2022]
Abstract
SUMMARY We developed BIODICA, an integrated computational environment for applying independent component analysis (ICA) to bulk and single-cell molecular profiles, interpreting the results in terms of biological functions, and correlating them with metadata. The computational core is the novel Python package stabilized-ica, which provides an interface to several ICA algorithms, a stabilization procedure, and meta-analysis and component interpretation tools. BIODICA is equipped with a user-friendly graphical user interface, allowing inexperienced users to perform ICA-based omics data analysis. The results are presented interactively, facilitating communication with biology experts. AVAILABILITY AND IMPLEMENTATION BIODICA is implemented in Java, Python and JavaScript. The source code is freely available on GitHub under the MIT and the GNU LGPL licenses. BIODICA is supported on all major operating systems. URL: https://sysbio-curie.github.io/biodica-environment/.
Affiliation(s)
- Nicolas Captier
  - Institut National de la Santé et de la Recherche Médicale (INSERM), U900, F-75005 Paris, France
  - Institut Curie, PSL Research University, F-75005 Paris, France
  - MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, F-75006 Paris, France
  - Laboratoire d'Imagerie Translationnelle en Oncologie, Institut Curie, INSERM U1288, PSL Research University, 91400 Orsay, France
- Jane Merlevede
  - Institut National de la Santé et de la Recherche Médicale (INSERM), U900, F-75005 Paris, France
  - Institut Curie, PSL Research University, F-75005 Paris, France
  - MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, F-75006 Paris, France
- Askhat Molkenov
  - National Laboratory Astana, Center for Life Sciences, Nazarbayev University, Nur-Sultan 010000, Kazakhstan
- Ainur Seisenova
  - National Laboratory Astana, Center for Life Sciences, Nazarbayev University, Nur-Sultan 010000, Kazakhstan
- Altynbek Zhubanchaliyev
  - National Laboratory Astana, Center for Life Sciences, Nazarbayev University, Nur-Sultan 010000, Kazakhstan
- Petr V Nazarov
  - Multiomics Data Science Research Group, Department of Cancer Research & Bioinformatics Platform, Luxembourg Institute of Health, L-1445 Strassen, Luxembourg
- Emmanuel Barillot
  - Institut National de la Santé et de la Recherche Médicale (INSERM), U900, F-75005 Paris, France
  - Institut Curie, PSL Research University, F-75005 Paris, France
  - MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, F-75006 Paris, France
- Ulykbek Kairov
  - National Laboratory Astana, Center for Life Sciences, Nazarbayev University, Nur-Sultan 010000, Kazakhstan
- Andrei Zinovyev
  - Institut National de la Santé et de la Recherche Médicale (INSERM), U900, F-75005 Paris, France
  - Institut Curie, PSL Research University, F-75005 Paris, France
  - MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, F-75006 Paris, France
8
Pan J, Kwon JJ, Talamas JA, Borah AA, Vazquez F, Boehm JS, Tsherniak A, Zitnik M, McFarland JM, Hahn WC. Sparse dictionary learning recovers pleiotropy from human cell fitness screens. Cell Syst 2022; 13:286-303.e10. [PMID: 35085500 PMCID: PMC9035054 DOI: 10.1016/j.cels.2021.12.005] [Received: 08/25/2021] [Revised: 10/30/2021] [Accepted: 12/21/2021]
Abstract
In high-throughput functional genomic screens, each gene product is commonly assumed to exhibit a singular biological function within a defined protein complex or pathway. In practice, a single gene perturbation may induce multiple cascading functional outcomes, a genetic principle known as pleiotropy. Here, we model pleiotropy in fitness screen collections by representing each gene perturbation as the sum of multiple perturbations of biological functions, each harboring independent fitness effects inferred empirically from the data. Our approach (Webster) recovered pleiotropic functions for DNA damage proteins from genotoxic fitness screens, untangled distinct signaling pathways upstream of shared effector proteins from cancer cell fitness screens, and predicted the stoichiometry of an unknown protein complex subunit from fitness data alone. Modeling compound sensitivity profiles in terms of genetic functions recovered compound mechanisms of action. Our approach establishes a sparse approximation mechanism for unraveling complex genetic architectures underlying high-dimensional gene perturbation readouts.
Affiliation(s)
- Joshua Pan
  - Dana-Farber Cancer Institute, Department of Medical Oncology, Boston, MA 02215, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Harvard Medical School, Boston, MA 02215, USA
- Jason J Kwon
  - Dana-Farber Cancer Institute, Department of Medical Oncology, Boston, MA 02215, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Harvard Medical School, Boston, MA 02215, USA
- Jessica A Talamas
  - Dana-Farber Cancer Institute, Department of Medical Oncology, Boston, MA 02215, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Harvard Medical School, Boston, MA 02215, USA
- Ashir A Borah
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Jesse S Boehm
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Aviad Tsherniak
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Marinka Zitnik
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Harvard Medical School, Department of Biomedical Informatics, Boston, MA 02215, USA
  - Harvard University, Data Science Initiative, Cambridge, MA 02138, USA
- William C Hahn
  - Dana-Farber Cancer Institute, Department of Medical Oncology, Boston, MA 02215, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Harvard Medical School, Boston, MA 02215, USA
  - Brigham and Women's Hospital and Harvard Medical School, Department of Medicine, Boston, MA 02215, USA
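The Webster abstract describes a sparse approximation, in which each gene's fitness profile is modelled as the sum of a few "function" profiles. The sketch below covers only the generic sparse-coding step for a fixed dictionary. It uses orthogonal matching pursuit (OMP), a standard greedy algorithm swapped in here for illustration. It is not Webster's procedure, which also learns the dictionary from the data. The orientation of `D` (rows as cell lines, unit-norm columns as functions) and the name `omp` are assumptions.

```python
import numpy as np


def omp(D, y, n_nonzero):
    """Orthogonal matching pursuit: approximate y using at most n_nonzero
    columns (atoms) of D, which are assumed to have unit norm."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(D.shape[1])
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        if np.allclose(residual, 0.0):
            break                                     # already explained exactly
        j = int(np.argmax(np.abs(D.T @ residual)))    # atom most correlated with residual
        if j in support:
            break
        support.append(j)
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef
```

Re-solving the least-squares problem over the whole support at each step is what distinguishes OMP from plain matching pursuit. Because of that refit, coefficients chosen early can be corrected as new atoms are added.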
9
Chou Y, Chang C, Remedios SW, Butman JA, Chan L, Pham DL. Automated Classification of Resting-State fMRI ICA Components Using a Deep Siamese Network. Front Neurosci 2022; 16:768634. [PMID: 35368292 PMCID: PMC8971556 DOI: 10.3389/fnins.2022.768634] [Received: 08/31/2021] [Accepted: 02/09/2022]
Abstract
Manual classification of functional resting state networks (RSNs) derived from Independent Component Analysis (ICA) decomposition can be labor intensive and requires expertise, particularly in large multi-subject analyses. Hence, a fully automatic algorithm that can reliably classify these RSNs is desirable. In this paper, we present a deep learning approach based on a Siamese Network to learn a discriminative feature representation for single-subject ICA component classification. This supervised framework requires relatively few training examples and does not require the number of ICA components to be specified. In addition, our approach permits one-shot learning, which allows generalization to new classes not seen in the training set with only one example of each new class. The proposed method outperforms traditional convolutional neural network (CNN) and template matching methods in identifying eleven subject-specific RSNs, achieving 100% accuracy on a holdout data set and over 99% accuracy on an outside data set. We also demonstrate that the method is robust to scan-rescan variation. Finally, we show that the functional connectivity of the default mode and salience networks identified by the proposed technique is altered in a group analysis of mild traumatic brain injury (TBI), severe TBI, and healthy subjects.
Affiliation(s)
- Yiyu Chou
  - Center for Neuroscience and Regenerative Medicine, Bethesda, MD, United States
  - Corresponding author
- Catie Chang
  - Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, United States
- Samuel W. Remedios
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, United States
- John A. Butman
  - Center for Neuroscience and Regenerative Medicine, Bethesda, MD, United States
  - Radiology and Imaging Sciences, National Institutes of Health, Bethesda, MD, United States
- Leighton Chan
  - Center for Neuroscience and Regenerative Medicine, Bethesda, MD, United States
  - Rehabilitation Medicine Department at Clinical Center, National Institutes of Health, Bethesda, MD, United States
- Dzung L. Pham
  - Center for Neuroscience and Regenerative Medicine, Bethesda, MD, United States
10
Banerjee P, Chattopadhyay T, Chattopadhyay AK. Investigation of the effect of bars on the properties of spiral galaxies: a multivariate statistical study. COMMUN STAT-SIMUL C 2022. [DOI: 10.1080/03610918.2022.2039198]
11
Amblard E, Bac J, Chervov A, Soumelis V, Zinovyev A. Hubness reduction improves clustering and trajectory inference in single-cell transcriptomic data. Bioinformatics 2022; 38:1045-1051. [PMID: 34871374 DOI: 10.1093/bioinformatics/btab795] [Received: 06/10/2021] [Revised: 11/05/2021] [Accepted: 11/17/2021]
Abstract
MOTIVATION Single-cell RNA-seq (scRNAseq) datasets are characterized by large ambient dimensionality, and their analyses can be affected by various manifestations of the curse of dimensionality. One of these manifestations is the hubness phenomenon, i.e. the existence of data points with surprisingly large incoming connectivity degree in the data-point neighbourhood graph. The conventional approach to dampening the unwanted effects of high dimension is to apply drastic dimensionality reduction. It remains unexplored whether this step can be avoided by directly correcting hubness, thus retaining more information than is contained in the low-dimensional projections. RESULTS We investigated hubness in scRNAseq data. We show that hub cells do not represent any visible technical or biological bias. The effect of various hubness reduction methods is investigated with respect to the clustering, trajectory inference and visualization tasks in scRNAseq datasets. We show that hubness reduction generates neighbourhood graphs with properties more suitable for applying machine learning methods, and that it outperforms other state-of-the-art methods for improving neighbourhood graphs. As a consequence, clustering, trajectory inference and visualization perform better, especially for datasets characterized by large intrinsic dimensionality. Hubness is an important phenomenon characterizing data-point neighbourhood graphs computed for various types of sequencing datasets. Reducing hubness can be beneficial for the analysis of scRNAseq data with large intrinsic dimensionality, in which case it can be an alternative to drastic dimensionality reduction. AVAILABILITY AND IMPLEMENTATION The code used to analyze the datasets and produce the figures of this article is available from https://github.com/sysbio-curie/schubness. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Elise Amblard
  - Université de Paris, INSERM, HIPI, F-75010 Paris, France
- Jonathan Bac
  - Institut Curie, PSL Research University, F-75005 Paris, France
  - INSERM, U900, F-75005 Paris, France
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
- Alexander Chervov
  - Institut Curie, PSL Research University, F-75005 Paris, France
  - INSERM, U900, F-75005 Paris, France
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
- Andrei Zinovyev
  - Institut Curie, PSL Research University, F-75005 Paris, France
  - INSERM, U900, F-75005 Paris, France
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
  - Laboratory of Advanced Methods for High-Dimensional Data Analysis, Lobachevsky University, 603000 Nizhny Novgorod, Russia
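The hubness phenomenon described in the abstract above can be measured directly. One common measure is the skewness of the k-occurrence distribution, where N_k(x) counts how often point x appears among the k nearest neighbours of the other points. The sketch below computes this with brute-force Euclidean distances in NumPy. The function names, the default `k=10`, and the use of Euclidean distance are assumptions. The hubness *reduction* methods the paper evaluates are not implemented here.

```python
import numpy as np


def k_occurrence(X, k):
    """N_k for each row of X: how many times it appears among the k nearest
    (Euclidean) neighbours of the other rows."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared pairwise distances
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    return np.bincount(neighbours.ravel(), minlength=X.shape[0])


def hubness(X, k=10):
    """Skewness of the k-occurrence distribution. Large positive values mean
    a few 'hub' points are neighbours of many others."""
    n_k = k_occurrence(X, k).astype(float)
    centered = n_k - n_k.mean()
    return (centered ** 3).mean() / n_k.std() ** 3
```

The brute-force distance matrix needs O(n²) memory. Realistically sized scRNA-seq datasets would instead call for an approximate nearest-neighbour index.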
12
Marchais A, Marques Da Costa ME, Job B, Abbas R, Drubay D, Piperno-Neumann S, Fromigué O, Gomez-Brouchet A, Françoise R, Droit R, Lervat C, Entz-Werle N, Pacquement H, Devoldere C, Cupissol D, Bodet D, Gandemer V, Berger MG, Bérard PM, Jimenez M, Vassal G, Geoerger B, Brugieres L, Gaspar N. Immune infiltrate and tumor microenvironment transcriptional programs stratify pediatric osteosarcoma into prognostic groups at diagnosis. Cancer Res 2022; 82:974-985. [DOI: 10.1158/0008-5472.can-20-4189] [Received: 12/15/2020] [Revised: 07/26/2021] [Accepted: 01/18/2022]
13
Modabbernia A, Michelini G, Reichenberg A, Kotov R, Barch D, Frangou S. Neural Signatures of Data-Driven Psychopathology Dimensions at the Transition to Adolescence. Eur Psychiatry 2022; 65:e12. [PMID: 35067249 PMCID: PMC8853849 DOI: 10.1192/j.eurpsy.2021.2262]
Abstract
Background One of the challenges in human neuroscience is to uncover associations between brain organization and psychopathology in order to better understand the biological underpinnings of mental disorders. Here, we aimed to characterize the neural correlates of psychopathology dimensions obtained using two conceptually different data-driven approaches. Methods Dimensions of psychopathology that were either maximally dissociable or correlated were extracted, respectively, by independent component analysis (ICA) and exploratory factor analysis (EFA) applied to the Childhood Behavior Checklist items from 9- to 10-year-olds (n = 9983; 47.8% female, 50.8% white) participating in the Adolescent Brain Cognitive Development study. The patterns of brain morphometry, white matter integrity and resting-state connectivity associated with each dimension were identified using kernel-based regularized least squares and compared between dimensions using Spearman’s correlation coefficient. Results ICA identified three psychopathology dimensions, representing opposition–disinhibition, cognitive dyscontrol, and negative affect, with distinct brain correlates. Opposition–disinhibition was negatively associated with cortical surface area, cognitive dyscontrol was negatively associated with anatomical and functional dysconnectivity, and negative affect did not show discernible associations with any neuroimaging measure. EFA identified three dimensions representing broad externalizing, neurodevelopmental, and broad internalizing problems with partially overlapping brain correlates. All EFA-derived dimensions were negatively associated with cortical surface area, whereas measures of functional and structural connectivity were associated only with the neurodevelopmental dimension. 
Conclusions This study highlights the importance of cortical surface area and global connectivity for psychopathology in preadolescents and provides evidence for dissociable psychopathology dimensions with distinct brain correlates.
14
McConn JL, Lamoureux CR, Poudel S, Palsson BO, Sastry AV. Optimal dimensionality selection for independent component analysis of transcriptomic data. BMC Bioinformatics 2021; 22:584. [PMID: 34879815 PMCID: PMC8653613 DOI: 10.1186/s12859-021-04497-7] [Received: 05/26/2021] [Accepted: 11/16/2021]
Abstract
Background Independent component analysis is an unsupervised machine learning algorithm that separates a set of mixed signals into a set of statistically independent source signals. Applied to high-quality gene expression datasets, independent component analysis effectively reveals both the source signals of the transcriptome as co-regulated gene sets, and the activity levels of the underlying regulators across diverse experimental conditions. Two major variables that affect the final gene sets are the diversity of the expression profiles contained in the underlying data, and the user-defined number of independent components, or dimensionality, to compute. Availability of high-quality transcriptomic datasets has grown exponentially as high-throughput technologies have advanced; however, optimal dimensionality selection remains an open question. Methods We computed independent components across a range of dimensionalities for four gene expression datasets with varying dimensions (both in terms of number of genes and number of samples). We computed the correlation between independent components across different dimensionalities to understand how the overall structure evolves as the number of user-defined components increases. We then measured how well the resulting gene clusters reflected known regulatory mechanisms, and developed a set of metrics to assess the accuracy of the decomposition at a given dimension. Results We found that over-decomposition results in many independent components dominated by a single gene, whereas under-decomposition results in independent components that poorly capture the known regulatory structure. From these results, we developed a new method, called OptICA, for finding the optimal dimensionality that controls for both over- and under-decomposition. Specifically, OptICA selects the highest dimension that produces a low number of components that are dominated by a single gene. 
We show that OptICA outperforms two previously proposed methods for selecting the number of independent components across four transcriptomic databases of varying sizes. Conclusions OptICA avoids both over-decomposition and under-decomposition of transcriptomic datasets resulting in the best representation of the organism’s underlying transcriptional regulatory network. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04497-7.
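The selection rule summarized above (keep the highest dimensionality that still yields few components dominated by a single gene) can be illustrated with a small Python sketch. It is not the OptICA implementation. The dominance test (largest absolute gene weight at least `ratio` times the runner-up), the tolerance `max_single`, and the input layout (a dict mapping each dimensionality to its list of component weight vectors) are all assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence


def is_single_gene_dominated(weights: Sequence[float], ratio: float = 3.0) -> bool:
    """Hypothetical test: the top |weight| is at least `ratio` times the second largest."""
    mags = sorted((abs(w) for w in weights), reverse=True)
    if len(mags) < 2:
        return True
    return mags[0] >= ratio * mags[1]


def pick_dimensionality(
    decompositions: Dict[int, List[Sequence[float]]],
    max_single: int = 1,
    ratio: float = 3.0,
) -> int:
    """Return the highest dimensionality whose count of single-gene-dominated
    components stays at or below `max_single` (an assumed tolerance)."""
    acceptable = [
        dim
        for dim, components in decompositions.items()
        if sum(is_single_gene_dominated(c, ratio) for c in components) <= max_single
    ]
    if not acceptable:
        raise ValueError("no dimensionality meets the single-gene criterion")
    return max(acceptable)
```

For example, with toy decompositions at 2, 3 and 4 components that contain 0, 1 and 2 dominated components respectively, `pick_dimensionality` with the default `max_single=1` returns 3: the 4-component run is treated as over-decomposed.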
15
Comparison of metabolic states using genome-scale metabolic models. PLoS Comput Biol 2021; 17:e1009522. [PMID: 34748535 PMCID: PMC8601616 DOI: 10.1371/journal.pcbi.1009522] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/01/2021] [Revised: 11/18/2021] [Accepted: 10/04/2021] [Indexed: 11/25/2022]
Abstract
Genome-scale metabolic models (GEMs) are comprehensive knowledge bases of cellular metabolism and serve as mathematical tools for studying biological phenotypes and metabolic states or conditions in various organisms and cell types. Given the sheer size and complexity of human metabolism, selecting parameters for existing analysis methods, such as metabolic objective functions and model constraints, is not straightforward in human GEMs. In particular, comparing several conditions in large GEMs to identify condition- or disease-specific metabolic features is challenging. In this study, we showcase a scalable, model-driven approach for an in-depth investigation and comparison of metabolic states in large GEMs, which enables identification of the underlying functional differences. Using a combination of flux space sampling and network analysis, our approach enables extraction and visualisation of metabolically distinct network modules. Importantly, it does not rely on known or assumed objective functions. We apply this novel approach to extract the biochemical differences in adipocytes arising from unlimited vs blocked uptake of branched-chain amino acids (BCAAs, considered biomarkers in obesity) using a human adipocyte GEM (iAdipocytes1809). The biological significance of our approach is corroborated by literature reports confirming that the identified metabolic processes (TCA cycle and fatty acid metabolism) are functionally related to BCAA metabolism. Additionally, our analysis predicts a specific altered uptake and secretion profile indicating a compensation for the unavailability of BCAAs. Taken together, our approach facilitates determining functional differences between any metabolic conditions of interest by offering a versatile platform for analysing and comparing flux spaces of large metabolic networks.
Cellular metabolism is a highly complex and interconnected system. As many lifestyle diseases in humans have a strong metabolic component, it is important to understand metabolic differences between healthy and diseased states. In systems biology, metabolic behaviours are investigated using genome-scale metabolic models. Given the sheer size and complexity of genome-scale metabolic models of human systems, applying existing analysis methods is challenging and parameter selection is not straightforward. Therefore, novel methodological frameworks are necessary for analysing metabolic conditions despite the challenges posed by human models. In particular, an ongoing challenge has been comparing several phenotypes to identify condition- or disease-specific metabolic signatures. We address this significant challenge by developing a scalable and model-driven approach, ComMet (Comparison of Metabolic states). ComMet enables an in-depth investigation and comparison of metabolic phenotypes in large models while also identifying the underlying functional differences. Novel hypotheses can be generated using ComMet not only for understanding known metabolic phenotypes better but also for guiding the design of new experiments to validate the processes predicted by ComMet.
16
Du Y, He X, Calhoun VD. SMART (splitting-merging assisted reliable) Independent Component Analysis for Brain Functional Networks. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2021; 2021:3263-3266. [PMID: 34891937 DOI: 10.1109/embc46164.2021.9630284] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/14/2023]
Abstract
Independent component analysis (ICA) has been widely applied to estimate brain functional networks from functional magnetic resonance imaging (fMRI) data. ICA is a data-driven approach; however, the number of components must be prespecified, and it is difficult to estimate or determine an optimal number of components in fMRI analysis. In this paper, we propose a SMART (splitting-merging assisted reliable) ICA to overcome this problem. Our method first estimates group-level components using different settings and then yields reliable components using a splitting and merging clustering approach. Subject-specific components are then obtained with our previously proposed group information guided ICA (GIG-ICA), which uses the reliable group-level components to estimate individual-subject independent components. Simulations with unique components for subjects showed that our method extracted components with high similarity to the ground-truth spatial maps (SMs). For real fMRI data, the functional networks extracted by our method showed both similarity and specificity across subjects. In sum, our method can effectively and accurately identify subject-specific brain functional networks without the need for parameter setting. Clinical Relevance: SMART ICA automatically extracts reliable subject-specific brain functional networks that can be used for biomarker identification.
17
Ashenova A, Daniyarov A, Molkenov A, Sharip A, Zinovyev A, Kairov U. Meta-Analysis of Esophageal Cancer Transcriptomes Using Independent Component Analysis. Front Genet 2021; 12:683632. [PMID: 34795689 PMCID: PMC8594933 DOI: 10.3389/fgene.2021.683632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2021] [Accepted: 10/05/2021] [Indexed: 11/17/2022]
Abstract
Independent Component Analysis (ICA) is a matrix factorization method for data dimension reduction. ICA has been widely applied to transcriptomic data for blind separation of the biological, environmental, and technical factors affecting gene expression. This study aimed to analyze publicly available esophageal cancer data using ICA to identify and comprehensively analyze reproducible signaling pathways and molecular signatures involved in this cancer type. Four independent esophageal cancer transcriptomic datasets from the GEO database were used. The bioinformatics tool "BiODICA - Independent Component Analysis of Big Omics Data" was applied to compute independent components (ICs). Gene Set Enrichment Analysis (GSEA) and ToppGene identified the most significantly enriched pathways. Gene networks and graphs were constructed and visualized using Cytoscape and the HPRD database. A correlation graph between decompositions into 30 ICs was built from absolute correlation values exceeding 0.3. Clusters of components (pseudocliques) were observed in the structure of the correlation graph. The top 1,000 most contributing genes of each IC in the pseudocliques were mapped to the protein-protein interaction (PPI) network to construct associated signaling pathways. Some cliques were composed of densely interconnected nodes and included components common to most cancer types (such as cell cycle and extracellular matrix signals), while others were specific to esophageal cancer (EC). The results of this investigation may reveal potential biomarkers of esophageal carcinogenesis and functional subsystems dysregulated in tumor cells, and may help predict the early development of a tumor.
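The correlation-graph step described above can be sketched in Python, assuming each decomposition is a list of gene-weight vectors over one shared, identically ordered gene list (in practice genes would first have to be matched across datasets). Absolute correlation is used because the sign of an ICA component is arbitrary. The 0.3 cut-off comes from the abstract; the function names and data layout are hypothetical and do not reflect BiODICA's interface.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Node = Tuple[str, int]  # (dataset name, component index)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def correlation_graph(
    decompositions: Dict[str, List[Sequence[float]]],
    threshold: float = 0.3,
) -> Set[Tuple[Node, Node]]:
    """Link components from *different* datasets whose |r| exceeds `threshold`."""
    edges: Set[Tuple[Node, Node]] = set()
    for (name_a, comps_a), (name_b, comps_b) in combinations(decompositions.items(), 2):
        for i, ca in enumerate(comps_a):
            for j, cb in enumerate(comps_b):
                if abs(pearson(ca, cb)) > threshold:
                    edges.add(((name_a, i), (name_b, j)))
    return edges
```

Densely connected groups in the resulting edge set would correspond to the pseudocliques mentioned in the abstract; finding them (for example with a graph library) is left out of this sketch.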
Affiliation(s)
- Ainur Ashenova
- Laboratory of Bioinformatics and Systems Biology, National Laboratory Astana, Center for Life Sciences, Nazarbayev University, Nur-Sultan, Kazakhstan
- Department of Biology, School of Sciences and Humanities, Nazarbayev University, Nur-Sultan, Kazakhstan
- Asset Daniyarov
- Laboratory of Bioinformatics and Systems Biology, National Laboratory Astana, Center for Life Sciences, Nazarbayev University, Nur-Sultan, Kazakhstan
- Askhat Molkenov
- Laboratory of Bioinformatics and Systems Biology, National Laboratory Astana, Center for Life Sciences, Nazarbayev University, Nur-Sultan, Kazakhstan
- Aigul Sharip
- Laboratory of Bioinformatics and Systems Biology, National Laboratory Astana, Center for Life Sciences, Nazarbayev University, Nur-Sultan, Kazakhstan
- Andrei Zinovyev
- Institut Curie, PSL Research University, INSERM U900, Paris, France
- Laboratory of Advanced Methods for High-dimensional Data Analysis, Lobachevsky University, Nizhny Novgorod, Russia
- Ulykbek Kairov
- Laboratory of Bioinformatics and Systems Biology, National Laboratory Astana, Center for Life Sciences, Nazarbayev University, Nur-Sultan, Kazakhstan
18
Wang W, Tan H, Sun M, Han Y, Chen W, Qiu S, Zheng K, Wei G, Ni T. Independent component analysis based gene co-expression network inference (ICAnet) to decipher functional modules for better single-cell clustering and batch integration. Nucleic Acids Res 2021; 49:e54. [PMID: 33619563 PMCID: PMC8136772 DOI: 10.1093/nar/gkab089] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/01/2020] [Revised: 01/26/2021] [Accepted: 02/02/2021] [Indexed: 12/18/2022]
Abstract
With the tremendous increase in publicly available single-cell RNA-sequencing (scRNA-seq) datasets, bioinformatics methods based on gene co-expression networks are becoming efficient tools for analyzing scRNA-seq data, improving cell type prediction accuracy and in turn facilitating biological discovery. However, current methods are mainly based on overall co-expression correlation and overlook co-expression that exists in only a subset of cells; as a result, they fail to discover certain rare cell types and are sensitive to batch effects. Here, we developed independent component analysis-based gene co-expression network inference (ICAnet), which decomposes scRNA-seq data into a series of independent gene expression components and infers co-expression modules, improving cell clustering and rare cell-type discovery. ICAnet showed efficient performance for cell clustering and batch integration using scRNA-seq datasets spanning multiple cells/tissues/donors/library types. It works stably on datasets produced by different library construction strategies and with different sequencing depths and cell numbers. We demonstrated the capability of ICAnet to discover rare cell types in multiple independent scRNA-seq datasets from different sources. Importantly, the modules identified as activated in acute myeloid leukemia scRNA-seq datasets have the potential to serve as new diagnostic markers. Thus, ICAnet is a competitive tool for cell clustering and biological interpretation in single-cell RNA-seq data analysis.
Affiliation(s)
- Weixu Wang
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences and Huashan Hospital, Fudan University, Shanghai, 200438, P.R. China
- Huanhuan Tan
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing 211166, P.R. China
- Mingwan Sun
- College of Life Science, South China Agricultural University, Guangzhou 510642, P.R. China
- Yiqing Han
- College of Agricultural, South China Agricultural University, Guangzhou 510642, P.R. China
- Wei Chen
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences and Huashan Hospital, Fudan University, Shanghai, 200438, P.R. China
- Shengnu Qiu
- Division of Biosciences, Faculty of Life Sciences, University College London, London, WC1E 6BT, UK
- Ke Zheng
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing 211166, P.R. China
- Gang Wei
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences and Huashan Hospital, Fudan University, Shanghai, 200438, P.R. China
- MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, 200438, P.R. China
- Ting Ni
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences and Huashan Hospital, Fudan University, Shanghai, 200438, P.R. China
19
Petrovska J, Loos E, Coynel D, Egli T, Papassotiropoulos A, de Quervain DJF, Milnik A. Recognition memory performance can be estimated based on brain activation networks. Behav Brain Res 2021; 408:113285. [PMID: 33819531 DOI: 10.1016/j.bbr.2021.113285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2020] [Revised: 03/11/2021] [Accepted: 03/30/2021] [Indexed: 10/21/2022]
Abstract
BACKGROUND Recognition memory is an essential ability for functioning in everyday life. Establishing robust brain networks linked to recognition memory performance can help to understand the neural basis of recognition memory itself and the interindividual differences in recognition memory performance. METHODS We analysed behavioural and whole-brain fMRI data from 1,410 healthy young adults during the testing phase of a picture-recognition task. Using independent component analysis (ICA), we decomposed the fMRI contrast for previously seen vs. new (old-new) pictures into networks of brain activity. This was done in two independent samples (training sample: N = 645, replication sample: N = 665). Next, we investigated the relationship between the identified brain networks and interindividual differences in recognition memory performance by conducting a prediction analysis. We estimated the prediction accuracy in a third independent sample (test sample: N = 100). RESULTS We identified 12 robust and replicable brain networks using two independent samples. Based on the activity of those networks, we could successfully estimate interindividual differences in recognition memory performance with high accuracy in a third independent sample (r = 0.5, p = 1.29 × 10^-7). CONCLUSION Given the robustness of the ICA decomposition as well as the high prediction estimate, the identified brain networks may be considered potential biomarkers of recognition memory performance in healthy young adults and can be further investigated in the context of health and disease.
Affiliation(s)
- Jana Petrovska
- Division of Molecular Neuroscience, Department of Psychology, University of Basel, CH-4055 Basel, Switzerland; Transfaculty Research Platform Molecular and Cognitive Neurosciences, University of Basel, CH-4055 Basel, Switzerland
- Eva Loos
- Division of Cognitive Neuroscience, Department of Psychology, University of Basel, CH-4055 Basel, Switzerland; Transfaculty Research Platform Molecular and Cognitive Neurosciences, University of Basel, CH-4055 Basel, Switzerland
- David Coynel
- Division of Cognitive Neuroscience, Department of Psychology, University of Basel, CH-4055 Basel, Switzerland; Transfaculty Research Platform Molecular and Cognitive Neurosciences, University of Basel, CH-4055 Basel, Switzerland
- Tobias Egli
- Division of Molecular Neuroscience, Department of Psychology, University of Basel, CH-4055 Basel, Switzerland; Transfaculty Research Platform Molecular and Cognitive Neurosciences, University of Basel, CH-4055 Basel, Switzerland
- Andreas Papassotiropoulos
- Division of Molecular Neuroscience, Department of Psychology, University of Basel, CH-4055 Basel, Switzerland; Transfaculty Research Platform Molecular and Cognitive Neurosciences, University of Basel, CH-4055 Basel, Switzerland; Psychiatric University Clinics, University of Basel, CH-4055 Basel, Switzerland; Department Biozentrum, Life Sciences Training Facility, University of Basel, CH-4056 Basel, Switzerland
- Dominique J-F de Quervain
- Division of Cognitive Neuroscience, Department of Psychology, University of Basel, CH-4055 Basel, Switzerland; Transfaculty Research Platform Molecular and Cognitive Neurosciences, University of Basel, CH-4055 Basel, Switzerland; Psychiatric University Clinics, University of Basel, CH-4055 Basel, Switzerland
- Annette Milnik
- Division of Molecular Neuroscience, Department of Psychology, University of Basel, CH-4055 Basel, Switzerland; Transfaculty Research Platform Molecular and Cognitive Neurosciences, University of Basel, CH-4055 Basel, Switzerland; Psychiatric University Clinics, University of Basel, CH-4055 Basel, Switzerland
20
Sastry AV, Hu A, Heckmann D, Poudel S, Kavvas E, Palsson BO. Independent component analysis recovers consistent regulatory signals from disparate datasets. PLoS Comput Biol 2021; 17:e1008647. [PMID: 33529205 PMCID: PMC7888660 DOI: 10.1371/journal.pcbi.1008647] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 06/10/2020] [Revised: 02/17/2021] [Accepted: 12/18/2020] [Indexed: 01/03/2023]
Abstract
The availability of bacterial transcriptomes has dramatically increased in recent years. This data deluge could result in detailed inference of underlying regulatory networks, but the diversity of experimental platforms and protocols introduces critical biases that could hinder scalable analysis of existing data. Here, we show that the underlying structure of the E. coli transcriptome, as determined by Independent Component Analysis (ICA), is conserved across multiple independent datasets, including both RNA-seq and microarray datasets. We subsequently combined five transcriptomics datasets into a large compendium containing over 800 expression profiles and discovered that its underlying ICA-based structure was still comparable to that of the individual datasets. With this understanding, we expanded our analysis to over 3,000 E. coli expression profiles and predicted three high-impact regulons that respond to oxidative stress, anaerobiosis, and antibiotic treatment. ICA thus enables deep analysis of disparate data to uncover new insights that were not visible in the individual datasets.
Cells adapt to diverse environments by regulating gene expression. Genome-wide measurements of gene expression levels have increased exponentially in recent years, but successful integration and analysis of these datasets are limited. Recently, we showed that independent component analysis (ICA), a signal deconvolution algorithm, can separate a large bacterial gene expression dataset into groups of co-regulated genes. This previous study focused on data generated by a standardized pipeline and did not address whether ICA extracts the same quantitative co-expression signals across expression profiling platforms. In this study, we show that ICA finds similar co-regulation patterns underlying multiple gene expression datasets and can be used as a tool to integrate and interpret diverse datasets. Using a dataset containing over 3,000 expression profiles, we predicted three new regulons and characterized their activities. Since large, standardized expression datasets only exist for a few bacterial strains, these results broaden the possible applications of this tool to better understand transcriptional regulation across a wide range of microbes.
Affiliation(s)
- Anand V. Sastry
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Alyssa Hu
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- David Heckmann
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Saugat Poudel
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Erol Kavvas
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Bernhard O. Palsson
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
21
Sultan I, Fromion V, Schbath S, Nicolas P. Statistical modelling of bacterial promoter sequences for regulatory motif discovery with the help of transcriptome data: application to Listeria monocytogenes. J R Soc Interface 2020; 17:20200600. [PMID: 33023397 PMCID: PMC7653377 DOI: 10.1098/rsif.2020.0600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2020] [Accepted: 09/10/2020] [Indexed: 11/12/2022]
Abstract
Automatic de novo identification of the main regulons of a bacterium from genome and transcriptome data remains a challenge. To address this task, we propose a statistical model that can use information on exact positions of the transcription start sites and condition-dependent expression profiles. The central idea of this model is to improve the probabilistic representation of the promoter DNA sequences by incorporating covariates summarizing expression profiles (e.g. coordinates in projection spaces or hierarchical clustering trees). A dedicated trans-dimensional Markov chain Monte Carlo algorithm adjusts the width and palindromic properties of the corresponding position-weight matrices, the number of parameters to describe exact position relative to the transcription start site, and chooses the expression covariates relevant for each motif. All parameters are estimated simultaneously, for many motifs and many expression covariates. The method is applied to a dataset of transcription start sites and expression profiles available for Listeria monocytogenes. The results validate the approach and provide a new global view of the transcription regulatory network of this important pathogen. Remarkably, a previously unreported motif is found in promoter regions of ribosomal protein genes, suggesting a role in the regulation of growth.
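The model described above builds on position-weight matrices (PWMs). As background only, the Python sketch below scores candidate sites with a standard log-odds PWM and scans a sequence for the best-scoring window. It does not reproduce the paper's trans-dimensional MCMC, its expression covariates, or its position-relative-to-TSS terms; the toy motif `TOY_PWM` and the uniform background `UNIFORM` are hypothetical values chosen for illustration.

```python
import math
from typing import Dict, List, Tuple

# A PWM as one {base: probability} dict per motif position.
PWM = List[Dict[str, float]]

# Toy 3-position motif with consensus "TAT" (hypothetical values).
TOY_PWM: PWM = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
UNIFORM: Dict[str, float] = {base: 0.25 for base in "ACGT"}


def log_odds_score(site: str, pwm: PWM, background: Dict[str, float]) -> float:
    """Sum over positions of log2(P(base | motif column) / P(base | background))."""
    if len(site) != len(pwm):
        raise ValueError("site length must match motif width")
    return sum(math.log2(column[base] / background[base]) for base, column in zip(site, pwm))


def best_hit(sequence: str, pwm: PWM, background: Dict[str, float]) -> Tuple[int, float]:
    """Scan every window of motif width; return (start, score) of the top-scoring one."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    scores = [
        (start, log_odds_score(sequence[start:start + width], pwm, background))
        for start in range(len(sequence) - width + 1)
    ]
    return max(scores, key=lambda item: item[1])
```

Positive scores mean a window looks more like the motif than like background. In the paper's setting, motif width, palindromic structure and the choice of covariates are inferred rather than fixed as they are here.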
Affiliation(s)
- Ibrahim Sultan
- Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- Pierre Nicolas
- Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
22
Cantini L, Kairov U, de Reyniès A, Barillot E, Radvanyi F, Zinovyev A. Assessing reproducibility of matrix factorization methods in independent transcriptomes. Bioinformatics 2020; 35:4307-4313. [PMID: 30938767 PMCID: PMC6821374 DOI: 10.1093/bioinformatics/btz225] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/14/2018] [Revised: 03/20/2019] [Accepted: 04/01/2019] [Indexed: 12/26/2022]
Abstract
Motivation Matrix factorization (MF) methods are widely used to reduce the dimensionality of transcriptomic datasets to the action of a few hidden factors (metagenes). MF algorithms have never been compared based on the between-datasets reproducibility of their outputs in similar independent datasets. This gap in knowledge could have a crucial impact when generalizing the predictions made in one study to others. Results We systematically test widely used MF methods on several transcriptomic datasets collected from the same cancer type (14 colorectal, 8 breast and 4 ovarian cancer transcriptomic datasets). Inspired by concepts of evolutionary bioinformatics, we design a novel framework based on Reciprocally Best Hit (RBH) graphs to benchmark the MF methods for their ability to produce generalizable components. We show that a particular protocol of application of independent component analysis (ICA), accompanied by a stabilization procedure, leads to a significant increase in the between-datasets reproducibility. Moreover, we show that the signals detected through this method are systematically more interpretable than those of other standard methods. We developed a user-friendly tool for performing the Stabilized ICA-based RBH meta-analysis. We apply this methodology to the study of colorectal cancer (CRC), for which 14 independent transcriptomic datasets can be collected. The resulting RBH graph maps the landscape of interconnected factors associated with biological processes or with technological artifacts. These factors can be used as clinical biomarkers or as robust, tumor-type-specific transcriptomic signatures of tumoral cells or the tumoral microenvironment. Their intensities in different samples shed light on the mechanistic basis of CRC molecular subtyping. Availability and implementation The RBH construction tool is available from http://goo.gl/DzpwYp Supplementary information Supplementary data are available at Bioinformatics online.
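A Reciprocally Best Hit pairing between the components of two datasets can be sketched in Python as follows: a component in dataset A and one in dataset B are linked only if each is the other's most similar partner. The similarity used here (absolute Pearson correlation over a shared, identically ordered gene list) is an assumption for illustration. The published tool's similarity measure, its ICA stabilization step and any thresholds are not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def abs_pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return abs(cov / (sx * sy))


def reciprocal_best_hits(
    comps_a: Sequence[Sequence[float]], comps_b: Sequence[Sequence[float]]
) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where B[j] is A[i]'s best match and A[i] is B[j]'s best match."""
    sim = [[abs_pearson(a, b) for b in comps_b] for a in comps_a]
    # Best partner in B for each component of A, and vice versa.
    best_for_a = [max(range(len(comps_b)), key=lambda j: row[j]) for row in sim]
    best_for_b = [
        max(range(len(comps_a)), key=lambda i: sim[i][j]) for j in range(len(comps_b))
    ]
    return [(i, j) for i, j in enumerate(best_for_a) if best_for_b[j] == i]
```

Running this for every pair of datasets and taking the union of the returned pairs as edges gives an RBH graph in the sense of the abstract; components without a reciprocal partner simply stay unconnected.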
Affiliation(s)
- Laura Cantini
- Institut Curie, PSL Research University, F-75005 Paris, France
- INSERM U900, F-75005 Paris, France
- CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, F-75006 Paris, France
- Computational Systems Biology Team, Institut de Biologie de l'École Normale Supérieure, CNRS UMR8197, INSERM U1024, École Normale Supérieure, PSL Research University, Paris, France
- Ulykbek Kairov
- Laboratory of Bioinformatics and Systems Biology, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan
- Aurélien de Reyniès
- Programme Cartes d'Identité des Tumeurs (CIT), Ligue Nationale Contre le Cancer, Paris, France
- Emmanuel Barillot
- Institut Curie, PSL Research University, F-75005 Paris, France
- INSERM U900, F-75005 Paris, France
- CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, F-75006 Paris, France
- François Radvanyi
- Institut Curie, PSL Research University, CNRS, UMR144, Equipe Labellisée Ligue Contre le Cancer, Paris, France
- Sorbonne Universités, UPMC Université Paris 06, CNRS, UMR144, Paris
- Andrei Zinovyev
- Institut Curie, PSL Research University, F-75005 Paris, France
- INSERM U900, F-75005 Paris, France
- CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, F-75006 Paris, France
- Lobachevsky University, Nizhny Novgorod, Russia
23
Campos E, Hazlett C, Tan P, Truong H, Loo S, DiStefano C, Jeste S, Şentürk D. Principle ERP reduction and analysis: Estimating and using principle ERP waveforms underlying ERPs across tasks, subjects and electrodes. Neuroimage 2020; 212:116630. [PMID: 32087372 PMCID: PMC7594508 DOI: 10.1016/j.neuroimage.2020.116630] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/16/2019] [Revised: 01/03/2020] [Accepted: 02/10/2020] [Indexed: 11/28/2022]
Abstract
Event-related potential (ERP) waveforms are the summation of many overlapping signals. Changes in the peak or mean amplitude of a waveform over a given time period therefore cannot reliably be attributed to a particular ERP component of ex ante interest, as the standard approach to ERP analysis does. Though this problem is widely recognized, it is not well addressed in practice. Our approach begins by assuming that any observed ERP waveform, at any electrode, for any trial type, and for any participant, is approximately a weighted combination of signals from an underlying set of what we refer to as principle ERPs, or pERPs. We propose an accessible approach to analyzing complete ERP waveforms in terms of their underlying pERPs. First, we propose the principle ERP reduction (pERP-RED) algorithm for investigators to estimate a suitable set of pERPs from their data, which may span multiple tasks. Next, we provide tools and illustrations of pERP-space analysis, whereby observed ERPs are decomposed into the amplitudes of the contributing pERPs, which can be contrasted across conditions or groups to reveal which pERPs differ (substantively and/or significantly) between conditions/groups. Differences on all pERPs can be reported together rather than selectively, providing complete information on all components in the waveform and thereby avoiding selective reporting or user discretion regarding the choice of which components or windows to use. The scalp distribution of each pERP can also be plotted for any group/condition. We demonstrate this suite of tools through simulations and on real data collected from multiple experiments on participants diagnosed with Autism Spectrum Disorder and Attention Deficit Hyperactivity Disorder. Software for conducting these analyses is provided in the pERPred package for R.
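The pERP-space idea (an observed ERP approximated as a weighted combination of pERPs) can be illustrated with ordinary least squares. This minimal Python sketch assumes the pERP waveforms have already been estimated and are sampled on the same time points as the observed waveform; it solves the normal equations for the amplitudes. It is neither the pERP-RED estimation algorithm nor the pERPred R package's API, and the function names are hypothetical.

```python
from typing import List, Sequence


def solve_linear(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest entry in this column for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("pERP waveforms are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution on the resulting upper-triangular system.
    weights = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * weights[c] for c in range(r + 1, n))
        weights[r] = (aug[r][n] - tail) / aug[r][r]
    return weights


def perp_amplitudes(
    observed: Sequence[float], perps: Sequence[Sequence[float]]
) -> List[float]:
    """Amplitudes w minimising sum_t (observed[t] - sum_k w[k] * perps[k][t]) ** 2."""
    gram = [[sum(p * q for p, q in zip(pk, pl)) for pl in perps] for pk in perps]
    proj = [sum(p * y for p, y in zip(pk, observed)) for pk in perps]
    return solve_linear(gram, proj)
```

For instance, with toy pERPs `[1, 1, 0]` and `[0, 1, 1]` and an observed waveform `[3, 4, 1]`, the recovered amplitudes are 3 and 1. Amplitudes obtained this way per participant and condition are the kind of quantities the abstract describes contrasting across groups.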
Affiliation(s)
- Emilie Campos
- Department of Biostatistics, University of California, Los Angeles, USA
- Chad Hazlett
- Departments of Statistics and Political Science, University of California, Los Angeles, USA
- Patricia Tan
- Department of Psychiatry, University of California, Los Angeles, USA
- Holly Truong
- Department of Psychiatry, University of California, Los Angeles, USA
- Sandra Loo
- Department of Psychiatry, University of California, Los Angeles, USA
- Shafali Jeste
- Department of Psychiatry, University of California, Los Angeles, USA
- Damla Şentürk
- Department of Biostatistics, University of California, Los Angeles, USA
24
Way GP, Zietz M, Rubinetti V, Himmelstein DS, Greene CS. Compressing gene expression data using multiple latent space dimensionalities learns complementary biological representations. Genome Biol 2020; 21:109. [PMID: 32393369 PMCID: PMC7212571 DOI: 10.1186/s13059-020-02021-3] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 05/07/2019] [Accepted: 04/16/2020] [Indexed: 12/27/2022]
Abstract
BACKGROUND Unsupervised compression algorithms applied to gene expression data extract latent or hidden signals representing technical and biological sources of variation. However, these algorithms require a user to select a biologically appropriate latent space dimensionality. In practice, most researchers fit a single algorithm and latent dimensionality. We sought to determine the extent to which selecting only one fit limits the biological features captured in the latent representations and, consequently, limits what can be discovered with subsequent analyses. RESULTS We compress gene expression data from three large datasets consisting of adult normal tissue, adult cancer tissue, and pediatric cancer tissue. We train many different models across a large range of latent space dimensionalities and observe various performance differences. We identify more curated pathway gene sets significantly associated with individual dimensions in denoising autoencoder and variational autoencoder models trained using an intermediate number of latent dimensions. Combining compressed features across algorithms and dimensionalities captures the most pathway-associated representations. When trained with different latent dimensionalities, models learn strongly associated and generalizable biological representations including sex, neuroblastoma MYCN amplification, and cell types. Stronger signals, such as tumor type, are best captured in models trained at lower dimensionalities, while more subtle signals, such as pathway activity, are best identified in models trained with more latent dimensions. CONCLUSIONS There is no single best latent dimensionality or compression algorithm for analyzing gene expression data. Instead, using features derived from different compression models across multiple latent space dimensionalities enhances biological representations.
Affiliation(s)
- Gregory P Way
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, 10-131 SCTR 34th and Civic Center Blvd, Philadelphia, PA, 19104, USA
  - Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Michael Zietz
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, 10-131 SCTR 34th and Civic Center Blvd, Philadelphia, PA, 19104, USA
- Vincent Rubinetti
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, 10-131 SCTR 34th and Civic Center Blvd, Philadelphia, PA, 19104, USA
- Daniel S Himmelstein
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, 10-131 SCTR 34th and Civic Center Blvd, Philadelphia, PA, 19104, USA
- Casey S Greene
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, 10-131 SCTR 34th and Civic Center Blvd, Philadelphia, PA, 19104, USA
  - Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Philadelphia, PA, 19102, USA
25
Transcriptional Programs Define Intratumoral Heterogeneity of Ewing Sarcoma at Single-Cell Resolution. Cell Rep 2020; 30:1767-1779.e6. [DOI: 10.1016/j.celrep.2020.01.049] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 04/19/2019] [Revised: 10/07/2019] [Accepted: 01/15/2020] [Indexed: 12/16/2022]
26
Čuklina J, Pedrioli PGA, Aebersold R. Review of Batch Effects Prevention, Diagnostics, and Correction Approaches. Methods Mol Biol 2020; 2051:373-387. [PMID: 31552638 DOI: 10.1007/978-1-4939-9744-2_16] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 06/10/2023]
Abstract
Systematic technical variation in high-throughput studies that serially measure large sample cohorts is known as a batch effect. Batch effects reduce the sensitivity of biological signal extraction and can cause significant artifacts. This systematic bias is more common in studies in which logistical considerations restrict the number of samples that can be prepared or profiled in a single experiment, making it necessary to arrange subsets of study samples in batches. To mitigate the negative impact of batch effects, statistical approaches for batch correction are used at the stages of experimental design and data processing. Whereas batch effects and possible remedies have been extensively discussed in genomics, they are a relatively new challenge in proteomics, because methods with sufficient throughput to systematically measure large sample cohorts have only recently become available. Here we provide general recommendations to mitigate batch effects: we discuss the design of large-scale proteomic studies, review the most commonly used tools for batch effect correction, and give an overview of their application in proteomics.
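The simplest form of the data-processing correction described above is a location adjustment: subtract each batch's mean from one feature and add back the overall mean. The sketch below is a hypothetical minimal version, not a method from the review; established tools such as ComBat additionally model scale differences and pool information across features. It also assumes a balanced design, because if biological groups are unevenly spread across batches, per-batch centering removes real signal along with the artifact.

```python
from collections import defaultdict


def center_batches(values, batches):
    """Remove per-batch mean shifts from one feature, keeping its overall mean.

    values:  measurements of a single protein/gene across samples
    batches: batch label for each sample (same order as values)
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for v, b in zip(values, batches):
        sums[b] += v
        counts[b] += 1
    grand_mean = sum(values) / len(values)
    return [v - sums[b] / counts[b] + grand_mean for v, b in zip(values, batches)]


# Hypothetical data: batch "B" carries a +2 offset in measured intensity.
intensities = [10.0, 11.0, 12.0, 12.0, 13.0, 14.0]
batch_labels = ["A", "A", "A", "B", "B", "B"]
corrected = center_batches(intensities, batch_labels)
print(corrected)  # [11.0, 12.0, 13.0, 11.0, 12.0, 13.0]
```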
Affiliation(s)
- Jelena Čuklina
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
  - Ph.D. Program in Systems Biology, University of Zurich and ETH Zurich, Zürich, Switzerland
- Patrick G A Pedrioli
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
  - ETH Zürich, PHRT-MS, Zürich, Switzerland
- Ruedi Aebersold
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
  - Faculty of Science, University of Zürich, Zürich, Switzerland
27
Monakhova YB, Rutledge DN. Independent components analysis (ICA) at the "cocktail-party" in analytical chemistry. Talanta 2019; 208:120451. [PMID: 31816793 DOI: 10.1016/j.talanta.2019.120451] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 05/29/2019] [Revised: 09/26/2019] [Accepted: 10/04/2019] [Indexed: 02/07/2023]
Abstract
Independent components analysis (ICA) is a probabilistic method whose goal is to extract, from mixed observed signals, underlying component signals that are maximally independent and non-Gaussian. Since the data acquired in many applications in analytical chemistry are mixtures of component signals, such a method is of great interest. In this article, recent ICA applications for quantitative and qualitative analysis in analytical chemistry are reviewed. The following experimental techniques are covered: fluorescence, UV-VIS, NMR, and vibrational spectroscopies, as well as chromatographic profiles. Furthermore, we review ICA as a preprocessing tool, as well as existing hybrid ICA-based multivariate approaches. Finally, further research directions are proposed. Our review shows that ICA is starting to play an important role in analytical chemistry, and this role will definitely grow in the future.
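The definition above (recover maximally independent, non-Gaussian components from mixtures) can be shown on two signals with a simplified ICA, not one of the algorithms covered by the review. The sketch assumes two hypothetical pure components (a sine and a square wave) and a made-up 2x2 mixing matrix. After whitening, the remaining unknown is a rotation, so it grid-searches the angle that maximizes the summed squared excess kurtosis of the outputs, a standard non-Gaussianity contrast.

```python
import math


def center(x):
    m = sum(x) / len(x)
    return [v - m for v in x]


def whiten(x1, x2):
    """Map two centred signals to uncorrelated, unit-variance signals (Cholesky)."""
    n = len(x1)
    s11 = sum(u * u for u in x1) / n
    s12 = sum(u * v for u, v in zip(x1, x2)) / n
    s22 = sum(v * v for v in x2) / n
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 * l21)
    z1 = [u / l11 for u in x1]
    z2 = [(v - l21 * u) / l22 for u, v in zip(z1, x2)]
    return z1, z2


def excess_kurtosis(y):
    n = len(y)
    m2 = sum(v * v for v in y) / n
    m4 = sum(v ** 4 for v in y) / n
    return m4 / (m2 * m2) - 3.0


def ica_2d(x1, x2, steps=180):
    """Grid-search the rotation in [0, pi/2) that maximises non-Gaussianity."""
    z1, z2 = whiten(center(x1), center(x2))
    best = None
    for k in range(steps):
        t = (math.pi / 2) * k / steps
        c, s = math.cos(t), math.sin(t)
        y1 = [c * u + s * v for u, v in zip(z1, z2)]
        y2 = [-s * u + c * v for u, v in zip(z1, z2)]
        score = excess_kurtosis(y1) ** 2 + excess_kurtosis(y2) ** 2
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]


def corr(u, v):
    u, v = center(u), center(v)
    return sum(a * b for a, b in zip(u, v)) / math.sqrt(
        sum(a * a for a in u) * sum(b * b for b in v))


# Hypothetical pure components and two observed mixtures of them.
n = 2000
s1 = [math.sin(0.05 * i) for i in range(n)]
s2 = [1.0 if math.sin(0.031 * i) >= 0 else -1.0 for i in range(n)]
x1 = [1.0 * a + 0.6 * b for a, b in zip(s1, s2)]
x2 = [0.4 * a + 1.0 * b for a, b in zip(s1, s2)]

y1, y2 = ica_2d(x1, x2)
print([round(abs(corr(y, s)), 3) for y in (y1, y2) for s in (s1, s2)])
```

As with any ICA, the recovered components come back in arbitrary order and sign, which is why the comparison uses absolute correlations.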
Affiliation(s)
- Yulia B Monakhova
  - Spectral Service AG, Emil-Hoffmann-Straße 33, 50996, Cologne, Germany
  - Institute of Chemistry, Saratov State University, Astrakhanskaya Street 83, 410012, Saratov, Russia
  - Institute of Chemistry, Saint Petersburg State University, 13B Universitetskaya Emb., St Petersburg, 199034, Russia
- Douglas N Rutledge
  - UMR Ingénierie Procédés Aliments, AgroParisTech, INRA, Université Paris-Saclay, Massy, France
  - National Wine and Grape Industry Centre, Charles Sturt University, Wagga Wagga, Australia
28
Sompairac N, Nazarov PV, Czerwinska U, Cantini L, Biton A, Molkenov A, Zhumadilov Z, Barillot E, Radvanyi F, Gorban A, Kairov U, Zinovyev A. Independent Component Analysis for Unraveling the Complexity of Cancer Omics Datasets. Int J Mol Sci 2019; 20:E4414. [PMID: 31500324 PMCID: PMC6771121 DOI: 10.3390/ijms20184414] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 08/03/2019] [Revised: 09/02/2019] [Accepted: 09/04/2019] [Indexed: 12/13/2022]
Abstract
Independent component analysis (ICA) is a matrix factorization approach in which the signals captured by each individual matrix factor are optimized to be as mutually independent as possible. Initially suggested for solving blind source separation problems in various fields, ICA was shown to be successful in analyzing functional magnetic resonance imaging (fMRI) and other types of biomedical data. In the last twenty years, ICA has become part of the standard machine learning toolbox, together with other matrix factorization methods such as principal component analysis (PCA) and non-negative matrix factorization (NMF). Here, we review a number of recent works in which ICA was shown to be a useful tool for unraveling the complexity of cancer biology from the analysis of different types of omics data, mainly collected from tumor samples. These works highlight the use of ICA in dimensionality reduction, deconvolution, data pre-processing, meta-analysis, and other tasks applied to different data types (transcriptome, methylome, proteome, single-cell data). We particularly focus on the technical aspects of applying ICA in omics studies, such as using different protocols, determining the optimal number of components, assessing and improving the reproducibility of ICA results, and comparing ICA with other popular matrix factorization techniques. We discuss emerging ICA applications to the integrative analysis of multi-level omics datasets and introduce a conceptual view of ICA as a tool for defining functional subsystems of a complex biological system and their interactions under various conditions. Our review is accompanied by a Jupyter notebook that illustrates the discussed concepts and provides a practical tool for applying ICA to the analysis of cancer omics datasets.
Affiliation(s)
- Nicolas Sompairac
  - Institut Curie, PSL Research University, 75005 Paris, France
  - INSERM U900, 75248 Paris, France
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
  - Centre de Recherches Interdisciplinaires, Université Paris Descartes, 75004 Paris, France
- Petr V Nazarov
  - Multiomics Data Science Research Group, Quantitative Biology Unit, Luxembourg Institute of Health (LIH), L-1445 Strassen, Luxembourg
- Urszula Czerwinska
  - Institut Curie, PSL Research University, 75005 Paris, France
  - INSERM U900, 75248 Paris, France
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
- Laura Cantini
  - Computational Systems Biology Team, Institut de Biologie de l'Ecole Normale Supérieure, CNRS UMR8197, INSERM U1024, Ecole Normale Supérieure, PSL Research University, 75005 Paris, France
- Anne Biton
  - Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur et CNRS), 75015 Paris, France
- Askhat Molkenov
  - Laboratory of Bioinformatics and Systems Biology, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, 010000 Nur-Sultan, Kazakhstan
- Zhaxybay Zhumadilov
  - Laboratory of Bioinformatics and Systems Biology, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, 010000 Nur-Sultan, Kazakhstan
  - University Medical Center, Nazarbayev University, 010000 Nur-Sultan, Kazakhstan
- Emmanuel Barillot
  - Institut Curie, PSL Research University, 75005 Paris, France
  - INSERM U900, 75248 Paris, France
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
- Francois Radvanyi
  - Institut Curie, PSL Research University, 75005 Paris, France
  - CNRS, UMR 144, 75248 Paris, France
- Alexander Gorban
  - Center for Mathematical Modeling, University of Leicester, Leicester LE1 7RH, UK
  - Lobachevsky University, 603022 Nizhny Novgorod, Russia
- Ulykbek Kairov
  - Laboratory of Bioinformatics and Systems Biology, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, 010000 Nur-Sultan, Kazakhstan
- Andrei Zinovyev
  - Institut Curie, PSL Research University, 75005 Paris, France
  - INSERM U900, 75248 Paris, France
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
29
Liu W, Payne SH, Ma S, Fenyö D. Extracting Pathway-level Signatures from Proteogenomic Data in Breast Cancer Using Independent Component Analysis. Mol Cell Proteomics 2019; 18:S169-S182. [PMID: 31213479 PMCID: PMC6692784 DOI: 10.1074/mcp.tir119.001442] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/12/2019] [Revised: 06/01/2019] [Indexed: 02/03/2023]
Abstract
Recent advances in multi-omics characterization necessitate knowledge integration across different data types that goes beyond individual biomarker discovery. In this study, we apply independent component analysis (ICA) to human breast cancer proteogenomics data to retrieve mechanistic information. We show that, as an unsupervised feature extraction method, ICA was able to construct signatures with known biological relevance at both the transcriptome and proteome levels. Moreover, proteome and transcriptome signatures can be associated through their respective correlations with patient clinical features, providing an integrated description of phenotype-related biological processes. Our results demonstrate that applying ICA to proteogenomics data could lead to pathway-level knowledge discovery. Extending this approach to other data and cancer types may contribute to pan-cancer integration of multi-omics information.
Affiliation(s)
- Wenke Liu
  - Institute for System Genetics, NYU School of Medicine, New York, New York 10016
  - Department of Biochemistry and Molecular Pharmacology, NYU School of Medicine, New York, New York 10016
- Samuel H Payne
  - Biology Department, Brigham Young University, Provo, Utah 84602
- Sisi Ma
  - Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota 55455
- David Fenyö
  - Institute for System Genetics, NYU School of Medicine, New York, New York 10016
  - Department of Biochemistry and Molecular Pharmacology, NYU School of Medicine, New York, New York 10016
30
The potential of MR-Encephalography for BCI/Neurofeedback applications with high temporal resolution. Neuroimage 2019; 194:228-243. [PMID: 30910728 DOI: 10.1016/j.neuroimage.2019.03.046] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 07/01/2018] [Revised: 03/14/2019] [Accepted: 03/19/2019] [Indexed: 11/20/2022]
Abstract
Real-time functional magnetic resonance imaging (rt-fMRI) allows various brain-activity measures to be updated during an ongoing experiment as soon as a new brain volume is acquired. However, the recorded blood-oxygen-level-dependent (BOLD) signal also contains physiological artifacts such as breathing and heartbeat, which can cause misleading false-positive effects that are especially problematic in brain-computer interface (BCI) and neurofeedback (NF) setups. The low temporal resolution of echo planar imaging (EPI) sequences (in the range of seconds) prevents a proper separation of these artifacts from the BOLD signal. MR-Encephalography (MREG) has been shown to provide the high temporal resolution required to unalias and correct for physiological fluctuations, and it increases specificity and sensitivity for mapping task-based activation and functional connectivity as well as for detecting dynamic changes in connectivity over time. The potential of this novel technique for future BCI and NF applications was investigated by comparing a simultaneous multislice echo planar imaging (SMS-EPI) sequence and an MREG sequence with the same nominal spatial resolution in an offline analysis of three different experimental fMRI paradigms (perception of house and face stimuli, motor imagery, Stroop task). First, an adapted general linear model (GLM) pre-whitening that accounts for the high temporal resolution of MREG was implemented to obtain valid statistical results and allow comparison with the SMS-EPI sequence. Furthermore, respiration- and cardiac pulsation-related signals were successfully separated from the MREG signal using independent component analysis, and these components were then included as regressors in a GLM analysis. Only the MREG sequence allowed cardiac pulsation and respiration components to be clearly separated from the signal time course.
These components correlated highly with the respiration and cardiac pulsation signals recorded with a respiratory belt and a fingertip pulse plethysmograph. Temporal signal-to-noise ratios of SMS-EPI and MREG were comparable. Functional connectivity analysis using partial correlation showed a reduced standard error in MREG compared with SMS-EPI. In addition, direct time-course comparisons after down-sampling the MREG signal to the SMS-EPI temporal resolution showed lower variance in MREG. In general, we show that the higher temporal resolution is beneficial for fMRI time-course modeling; this advantage can be exploited in offline applications and is especially attractive for real-time BCI and NF applications.
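Two ideas in this abstract, using a physiological signal as a nuisance regressor and measuring connectivity with partial correlation, are closely linked: the partial correlation of two time courses given a third equals the correlation of their residuals after regressing that third signal out of both. The sketch below shows this with a single regressor and hypothetical synthetic data (a 1.1 Hz "cardiac" sine sampled at an assumed 10 Hz, added to two regions with independent noise); it is not the paper's GLM or pre-whitening pipeline.

```python
import math
import random


def residualize(y, z):
    """Residuals of y after an OLS fit on a single regressor z plus an intercept."""
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    szz = sum((b - mz) ** 2 for b in z)
    szy = sum((b - mz) * (a - my) for a, b in zip(y, z))
    beta = szy / szz
    return [(a - my) - beta * (b - mz) for a, b in zip(y, z)]


def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    return pearson(residualize(x, z), residualize(y, z))


# Hypothetical 10 Hz recording: a shared cardiac signal leaks into two regions
# whose remaining fluctuations are independent noise.
rng = random.Random(1)
n = 5000
cardiac = [math.sin(2 * math.pi * 1.1 * i / 10) for i in range(n)]
region_a = [c + rng.gauss(0, 0.5) for c in cardiac]
region_b = [c + rng.gauss(0, 0.5) for c in cardiac]

print(round(pearson(region_a, region_b), 2))
print(round(partial_corr(region_a, region_b, cardiac), 2))
```

The raw correlation is inflated by the shared cardiac component, while the partial correlation is close to zero. Sampling fast enough to resolve the cardiac rhythm is what makes such a regressor usable in the first place, which is the abstract's argument for MREG.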
31
Molecular Inverse Comorbidity between Alzheimer's Disease and Lung Cancer: New Insights from Matrix Factorization. Int J Mol Sci 2019; 20:ijms20133114. [PMID: 31247897 PMCID: PMC6650839 DOI: 10.3390/ijms20133114] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/20/2019] [Revised: 06/13/2019] [Accepted: 06/18/2019] [Indexed: 12/23/2022]
Abstract
Matrix factorization (MF) is an established paradigm for large-scale biological data analysis with tremendous potential in computational biology. Here, we challenge MF to depict the molecular bases of epidemiologically described disease–disease (DD) relationships. As a use case, we focus on the inverse comorbidity association between Alzheimer's disease (AD) and lung cancer (LC), described as a lower-than-expected probability of developing LC in AD patients. To this day, the molecular mechanisms underlying DD relationships remain poorly explained, and their better characterization might offer unprecedented clinical opportunities. To this end, we extend our previously designed MF-based framework for the molecular characterization of DD relationships. Considering AD–LC inverse comorbidity as a case study, we highlight multiple molecular mechanisms, among which we confirm the involvement of processes related to the immune system and mitochondrial metabolism. We then distinguish mechanisms specific to LC from those shared with other cancers through a pan-cancer analysis. Additionally, new candidate molecular players, such as estrogen receptor (ER), cadherin 1 (CDH1) and histone deacetylase (HDAC), are pinpointed as factors that might underlie the inverse relationship, opening the way to new investigations. Finally, some lung cancer subtype-specific factors are detected, suggesting heterogeneity across patients in the context of inverse comorbidity.
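For readers unfamiliar with the MF paradigm mentioned here, the sketch below factorizes a small non-negative genes-by-samples matrix V into W (gene loadings) and H (sample activities) using the Lee-Seung multiplicative updates for non-negative matrix factorization. NMF is chosen only as a generic, well-known example; the abstract does not say which factorization the authors' framework uses, and the matrix is hypothetical.

```python
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def sq_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iters=200, seed=0, eps=1e-12):
    """Non-negative V (genes x samples) ~= W (genes x rank) @ H (rank x samples)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # Multiplicative update for H: H *= (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # Multiplicative update for W: W *= (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical expression matrix built from two non-negative "programs".
w_true = [[1, 0], [2, 0], [0, 1], [0, 3], [1, 1], [2, 1]]
h_true = [[1, 2, 0, 1, 3, 0, 1, 2], [0, 1, 2, 1, 0, 3, 2, 1]]
v = matmul(w_true, h_true)

w0, h0 = nmf(v, rank=2, iters=0)  # random starting point only
w, h = nmf(v, rank=2)
print(round(sq_error(v, w0, h0), 3), "->", round(sq_error(v, w, h), 3))
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative, which is what makes each column of W readable as an additive gene program.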
32
Dhifli W, Puig J, Dispot A, Elati M. Latent network-based representations for large-scale gene expression data analysis. BMC Bioinformatics 2019; 19:466. [PMID: 30717663 PMCID: PMC7394327 DOI: 10.1186/s12859-018-2481-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/26/2018] [Accepted: 11/09/2018] [Indexed: 12/12/2022]
Abstract
Background With recent advances in high-throughput experimental procedures, biologists are gathering huge quantities of data. A main priority in bioinformatics and computational biology is to provide system-level analytical tools that can keep pace with the ever-growing production of high-throughput biological data while taking its biological context into account. In gene expression data analysis, genes have widely been treated as independent components. However, a systemic view shows that they act synergistically in living cells, forming functional complexes and, more generally, a biological system. Results In this paper, we propose LatNet, a signal transformation framework that, starting from initial large-scale gene expression data, generates new representations based on latent network-based relationships between the genes. LatNet aims to leverage system-level relations between the genes as an underlying hidden structure from which to derive new, transformed latent signals. We present a concrete implementation of our framework, based on a gene regulatory network structure and two signal transformation approaches, to quantify the latent network-based activity of regulators as well as gene perturbation signals. The new gene/regulator signals are computed for each sample of the input data and thus could be used directly in place of the initial expression signals for major bioinformatics analyses, including diagnosis and personalized medicine. Conclusion Multiple patterns could be hidden or only weakly observed in expression data. LatNet helps uncover latent signals that could emphasize hidden patterns based on the relations between the genes and thus enhance the performance of gene expression-based analysis algorithms. We use LatNet to analyze real-world bladder cancer gene expression data and show the efficiency of our transformation framework compared with using the initial expression data.
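The abstract does not spell out LatNet's two transformations, so the sketch below is only a generic, hypothetical illustration of the idea of replacing gene-level expression with per-sample regulator activity derived from a regulatory network. It z-scores each gene across samples and scores a regulator as the mean z-score of its activated targets minus the mean z-score of its repressed targets. The network format (`"up"`/`"down"` target lists) and all gene names are assumptions.

```python
import statistics


def zscore_genes(expr):
    """Standardise each gene across samples (expr: gene -> list of values)."""
    out = {}
    for gene, values in expr.items():
        mu = statistics.mean(values)
        sd = statistics.pstdev(values) or 1.0  # guard against constant genes
        out[gene] = [(v - mu) / sd for v in values]
    return out


def regulator_activity(expr, network):
    """Per-sample score: mean z of activated targets minus mean z of repressed targets.

    network: regulator -> {"up": [targets], "down": [targets]} (hypothetical format)
    """
    z = zscore_genes(expr)
    n_samples = len(next(iter(expr.values())))
    scores = {}
    for reg, targets in network.items():
        up = [g for g in targets.get("up", []) if g in z]
        down = [g for g in targets.get("down", []) if g in z]
        row = []
        for s in range(n_samples):
            act = statistics.mean([z[g][s] for g in up]) if up else 0.0
            rep = statistics.mean([z[g][s] for g in down]) if down else 0.0
            row.append(act - rep)
        scores[reg] = row
    return scores


# Hypothetical expression (3 samples) and a one-regulator network.
expr = {"G1": [1, 2, 3], "G2": [2, 4, 6], "G3": [3, 2, 1]}
network = {"TF1": {"up": ["G1", "G2"], "down": ["G3"]}}
scores = regulator_activity(expr, network)
print({k: [round(x, 3) for x in v] for k, v in scores.items()})
```

Each regulator then contributes one value per sample, so the resulting table has the same samples-by-features shape as the input and can feed the same downstream analyses.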
Affiliation(s)
- Wajdi Dhifli
  - University of Lille, 42, rue Paul Duez, Lille, 59000, France
- Julia Puig
  - University of Lille, 42, rue Paul Duez, Lille, 59000, France
- Aurélien Dispot
  - University of Lille, 42, rue Paul Duez, Lille, 59000, France
- Mohamed Elati
  - University of Lille, 42, rue Paul Duez, Lille, 59000, France
  - UMR 8030, Génomique Métabolique / Laboratoire iSSB, CEA-CNRS-UEVE, Genopole campus 1, 5 rue Henri Desbruères, Évry, 91030 Cedex, France