Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zhang H, Lee CAA, Li Z, Garbe JR, Eide CR, Petegrosso R, Kuang R, Tolar J. A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa. PLoS Comput Biol 2018;14:e1006053. [PMID: 29630593 PMCID: PMC5908193 DOI: 10.1371/journal.pcbi.1006053] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 09/06/2017] [Revised: 04/19/2018] [Accepted: 02/21/2018] [Indexed: 12/31/2022] Open

Cited by Other Article(s)

Lee CAA, Wu S, Chow YT, Kofman E, Williams V, Riddle M, Eide C, Ebens CL, Frank MH, Tolar J, Hook KP, AlDubayan SH, Frank NY. Accelerated Aging and Microsatellite Instability in Recessive Dystrophic Epidermolysis Bullosa-Associated Cutaneous Squamous Cell Carcinoma. J Invest Dermatol 2024:S0022-202X(24)00022-8. [PMID: 38272206 DOI: 10.1016/j.jid.2023.11.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 10/22/2023] [Accepted: 11/06/2023] [Indexed: 01/27/2024]

Affiliation(s)

Catherine A A Lee Division of Genetics, Department of Medicine, Brigham & Women's Hospital, Boston, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA; Transplant Research Program, Division of Nephrology, Boston Children's Hospital, Boston, Massachusetts, USA
Siyuan Wu Division of Genetics, Department of Medicine, Brigham & Women's Hospital, Boston, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA; Transplant Research Program, Division of Nephrology, Boston Children's Hospital, Boston, Massachusetts, USA
Yuen Ting Chow Division of Genetics, Department of Medicine, Brigham & Women's Hospital, Boston, Massachusetts, USA
Eric Kofman Division of Genetics, Department of Medicine, Brigham & Women's Hospital, Boston, Massachusetts, USA; Broad Institute, Cambridge, Massachusetts, USA
Valencia Williams Division of Pediatric Blood and Marrow Transplantation & Cellular Therapy, Department of Pediatrics, University of Minnesota Twin Cities, Minneapolis, Minnesota, USA
Megan Riddle Division of Pediatric Blood and Marrow Transplantation & Cellular Therapy, Department of Pediatrics, University of Minnesota Twin Cities, Minneapolis, Minnesota, USA
Cindy Eide Division of Pediatric Blood and Marrow Transplantation & Cellular Therapy, Department of Pediatrics, University of Minnesota Twin Cities, Minneapolis, Minnesota, USA
Christen L Ebens Division of Pediatric Blood and Marrow Transplantation & Cellular Therapy, Department of Pediatrics, University of Minnesota Twin Cities, Minneapolis, Minnesota, USA
Markus H Frank Harvard Medical School, Boston, Massachusetts, USA; Transplant Research Program, Division of Nephrology, Boston Children's Hospital, Boston, Massachusetts, USA; Harvard Stem Cell Institute, Harvard University, Cambridge, Massachusetts, USA; Department of Dermatology, Brigham & Women's Hospital, Boston, Massachusetts, USA
Jakub Tolar Division of Pediatric Blood and Marrow Transplantation & Cellular Therapy, Department of Pediatrics, University of Minnesota Twin Cities, Minneapolis, Minnesota, USA; Medical School, University of Minnesota Twin Cities, Minneapolis, Minnesota, USA; Stem Cell Institute, Medical School, University of Minnesota Twin Cities, Minneapolis, Minnesota, USA
Kristen P Hook Department of Dermatology, Medical School, University of Minnesota Twin Cities, Minneapolis, Minnesota, USA
Saud H AlDubayan Division of Genetics, Department of Medicine, Brigham & Women's Hospital, Boston, Massachusetts, USA; Broad Institute, Cambridge, Massachusetts, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA; Department of Medicine, King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
Natasha Y Frank Division of Genetics, Department of Medicine, Brigham & Women's Hospital, Boston, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA; Transplant Research Program, Division of Nephrology, Boston Children's Hospital, Boston, Massachusetts, USA; Department of Medicine, VA Boston Healthcare System, West Roxbury, Massachusetts, USA

Zou X, Liu Y, Wang M, Zou J, Shi Y, Su X, Xu J, Tong HHY, Ji Y, Gui L, Hao J. scCURE identifies cell types responding to immunotherapy and enables outcome prediction. CELL REPORTS METHODS 2023;3:100643. [PMID: 37989083 PMCID: PMC10694528 DOI: 10.1016/j.crmeth.2023.100643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 07/17/2023] [Accepted: 10/23/2023] [Indexed: 11/23/2023]

Multi-Objective Genetic Algorithm for Cluster Analysis of Single-Cell Transcriptomes. J Pers Med 2023;13:jpm13020183. [PMID: 36836417 PMCID: PMC9960600 DOI: 10.3390/jpm13020183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 01/15/2023] [Accepted: 01/16/2023] [Indexed: 01/22/2023] Open

Su M, Pan T, Chen QZ, Zhou WW, Gong Y, Xu G, Yan HY, Li S, Shi QZ, Zhang Y, He X, Jiang CJ, Fan SC, Li X, Cairns MJ, Wang X, Li YS. Data analysis guidelines for single-cell RNA-seq in biomedical studies and clinical applications. Mil Med Res 2022;9:68. [PMID: 36461064 PMCID: PMC9716519 DOI: 10.1186/s40779-022-00434-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/27/2022] [Accepted: 11/18/2022] [Indexed: 12/03/2022] Open

Affiliation(s)

Min Su State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
Tao Pan College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China
Qiu-Zhen Chen State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
Wei-Wei Zhou College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
Yi Gong State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China; Department of Immunology, Nanjing Medical University, Nanjing, 211166, China
Gang Xu College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China
Huan-Yu Yan State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
Si Li College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China
Qiao-Zhen Shi State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
Ya Zhang College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China
Xiao He Department of Laboratory Medicine, Women and Children's Hospital of Chongqing Medical University, Chongqing, 401174, China
Chun-Jie Jiang Baylor College of Medicine, Houston, TX, 77030, USA
Shi-Cai Fan Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, 518110, Guangdong, China
Xia Li College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
Murray J Cairns School of Biomedical Sciences and Pharmacy, Faculty of Health and Medicine, the University of Newcastle, University Drive, Callaghan, NSW, 2308, Australia; Precision Medicine Research Program, Hunter Medical Research Institute, New Lambton Heights, NSW, 2305, Australia
Xi Wang State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
Yong-Sheng Li College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China

Zeng Y, Wei Z, Zhong F, Pan Z, Lu Y, Yang Y. A parameter-free deep embedded clustering method for single-cell RNA-seq data. Brief Bioinform 2022;23:6582003. [PMID: 35524494 DOI: 10.1093/bib/bbac172] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/21/2021] [Revised: 03/25/2022] [Accepted: 04/18/2022] [Indexed: 11/12/2022] Open

Upadhyay P, Ray S. A Regularized Multi-Task Learning Approach for Cell Type Detection in Single-Cell RNA Sequencing Data. Front Genet 2022;13:788832. [PMID: 35495159 PMCID: PMC9043858 DOI: 10.3389/fgene.2022.788832] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/03/2021] [Accepted: 02/16/2022] [Indexed: 11/29/2022] Open

Xie B, Jiang Q, Mora A, Li X. Automatic cell type identification methods for single-cell RNA sequencing. Comput Struct Biotechnol J 2021;19:5874-5887. [PMID: 34815832 PMCID: PMC8572862 DOI: 10.1016/j.csbj.2021.10.027] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/27/2021] [Revised: 09/23/2021] [Accepted: 10/18/2021] [Indexed: 11/24/2022] Open

Kerner J, Dogan A, von Recum H. Machine learning and big data provide crucial insight for future biomaterials discovery and research. Acta Biomater 2021;130:54-65. [PMID: 34087445 DOI: 10.1016/j.actbio.2021.05.053] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 11/30/2020] [Revised: 05/24/2021] [Accepted: 05/25/2021] [Indexed: 02/06/2023]

Abstract

Machine learning methods have been widely adopted in a variety of fields including engineering, science, and medicine, revolutionizing how data is collected, used, and stored. Their implementation has led to a drastic increase in the number of computational models for the prediction of various numerical, categorical, or association events given input variables. We aim to examine recent advances in the use of machine learning when applied to the biomaterial field. Specifically, quantitative structure-property relationships offer the unique ability to correlate microscale molecular descriptors to larger macroscale material properties. These new models can be broken down further into four categories: regression, classification, association, and clustering. We examine recent approaches and new uses of machine learning in the three major categories of biomaterials: metals, polymers, and ceramics for rapid property prediction and trend identification. While current research is promising, limitations in the form of lack of standardized reporting and available databases complicate the implementation of described models. Herein, we hope to provide a snapshot of the current state of the field and a beginner's guide to navigating the intersection of biomaterials research and machine learning. STATEMENT OF SIGNIFICANCE: Machine learning and its methods have found a variety of uses beyond the field of computer science but have largely been neglected by those in the realm of biomaterials. Through the use of more computational methods, biomaterials development can be expedited while reducing the need for standard trial and error methods. Within, we introduce four basic models that readers can potentially apply to their current research as well as current applications within the field. Furthermore, we hope that this article may act as a "call to action" for readers to realize and address the current lack of implementation within the biomaterials field.

coupleCoC+: An information-theoretic co-clustering-based transfer learning framework for the integrative analysis of single-cell genomic data. PLoS Comput Biol 2021;17:e1009064. [PMID: 34077420 PMCID: PMC8202939 DOI: 10.1371/journal.pcbi.1009064] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/06/2021] [Revised: 06/14/2021] [Accepted: 05/11/2021] [Indexed: 12/02/2022] Open

Abstract

Technological advances have enabled us to profile multiple molecular layers at unprecedented single-cell resolution and the available datasets from multiple samples or domains are growing. These datasets, including scRNA-seq data, scATAC-seq data and sc-methylation data, usually have different powers in identifying the unknown cell types through clustering. So, methods that integrate multiple datasets can potentially lead to a better clustering performance. Here we propose coupleCoC+ for the integrative analysis of single-cell genomic data. coupleCoC+ is a transfer learning method based on the information-theoretic co-clustering framework. In coupleCoC+, we utilize the information in one dataset, the source data, to facilitate the analysis of another dataset, the target data. coupleCoC+ uses the linked features in the two datasets for effective knowledge transfer, and it also uses the information of the features in the target data that are unlinked with the source data. In addition, coupleCoC+ matches similar cell types across the source data and the target data. By applying coupleCoC+ to the integrative clustering of mouse cortex scATAC-seq data and scRNA-seq data, mouse and human scRNA-seq data, mouse cortex sc-methylation and scRNA-seq data, and human blood dendritic cells scRNA-seq data from two batches, we demonstrate that coupleCoC+ improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. coupleCoC+ has fast convergence and it is computationally efficient. The software is available at https://github.com/cuhklinlab/coupleCoC_plus.

The recent advances in single-cell technologies have enabled multiple biological layers to be probed and provide unprecedented opportunities to assay cellular heterogeneity. To analyze the complex biological processes varying across cells, we need to obtain and integrate different types of genomic features through flexible but rigorous computational methods. The most important challenge for data integration is to link data from different sources in a way that is biologically meaningful. In this work, we have developed a transfer learning method based on the information-theoretic co-clustering framework for the integrative analysis of single-cell genomic data. This method utilizes the information from one dataset to boost the analysis of another dataset, and it also uses the information of the features that are unlinked in the two datasets. We demonstrate that our transfer learning-based clustering method significantly improves clustering performance in single-cell genomic datasets. Our results show that transfer learning is promising for the integrative analysis of single-cell genomic data.

Nayak R, Hasija Y. A hitchhiker's guide to single-cell transcriptomics and data analysis pipelines. Genomics 2021;113:606-619. [PMID: 33485955 DOI: 10.1016/j.ygeno.2021.01.007] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 08/09/2020] [Revised: 12/30/2020] [Accepted: 01/18/2021] [Indexed: 12/20/2022]

Zeng P, Wangwu J, Lin Z. Coupled co-clustering-based unsupervised transfer learning for the integrative analysis of single-cell genomic data. Brief Bioinform 2020;22:6024740. [PMID: 33279962 DOI: 10.1093/bib/bbaa347] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/16/2020] [Revised: 10/29/2020] [Accepted: 10/30/2020] [Indexed: 12/11/2022] Open

Jiang J, Faiz A, Berg M, Carpaij OA, Vermeulen CJ, Brouwer S, Hesse L, Teichmann SA, ten Hacken NHT, Timens W, van den Berge M, Nawijn MC. Gene signatures from scRNA-seq accurately quantify mast cells in biopsies in asthma. Clin Exp Allergy 2020;50:1428-1431. [PMID: 32935368 PMCID: PMC7756890 DOI: 10.1111/cea.13732] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/14/2020] [Revised: 08/31/2020] [Accepted: 09/07/2020] [Indexed: 01/02/2023]

Affiliation(s)

Jian Jiang Groningen Research Institute for Asthma and COPD (GRIAC), University of Groningen, Groningen, The Netherlands; Department of Pathology and Medical Biology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Alen Faiz Groningen Research Institute for Asthma and COPD (GRIAC), University of Groningen, Groningen, The Netherlands; Department of Pulmonology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands; Respiratory Bioinformatics and Molecular Biology (RBMB), Faculty of Science, University of Technology Sydney, Ultimo, NSW, Australia
Marijn Berg Groningen Research Institute for Asthma and COPD (GRIAC), University of Groningen, Groningen, The Netherlands; Department of Pathology and Medical Biology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Orestes A. Carpaij Groningen Research Institute for Asthma and COPD (GRIAC), University of Groningen, Groningen, The Netherlands; Department of Pulmonology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Corneel J. Vermeulen Groningen Research Institute for Asthma and COPD (GRIAC), University of Groningen, Groningen, The Netherlands; Department of Pulmonology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Sharon Brouwer Groningen Research Institute for Asthma and COPD (GRIAC), University of Groningen, Groningen, The Netherlands; Department of Pathology and Medical Biology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Laura Hesse Groningen Research Institute for Asthma and COPD (GRIAC), University of Groningen, Groningen, The Netherlands; Department of Pathology and Medical Biology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Sarah A. Teichmann Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK; Open Targets, Wellcome Genome Campus, Cambridge, UK; Theory of Condensed Matter Group, Cavendish Laboratory/Dept Physics, University of Cambridge, Cambridge, UK
Nick H. T. ten Hacken Groningen Research Institute for Asthma and COPD (GRIAC), University of Groningen, Groningen, The Netherlands; Department of Pulmonology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Wim Timens Groningen Research Institute for Asthma and COPD (GRIAC), University of Groningen, Groningen, The Netherlands; Department of Pathology and Medical Biology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Maarten van den Berge Groningen Research Institute for Asthma and COPD (GRIAC), University of Groningen, Groningen, The Netherlands; Department of Pulmonology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Martijn C. Nawijn Department of Pulmonology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK

Ye P, Ye W, Ye C, Li S, Ye L, Ji G, Wu X. scHinter: imputing dropout events for single-cell RNA-seq data with limited sample size. Bioinformatics 2020;36:789-797. [PMID: 31392316 DOI: 10.1093/bioinformatics/btz627] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/25/2019] [Revised: 07/18/2019] [Accepted: 08/06/2019] [Indexed: 01/18/2023] Open

Abstract

MOTIVATION

Single-cell RNA-sequencing (scRNA-seq) is fast becoming a powerful technique for studying dynamic gene regulation at unprecedented resolution. However, scRNA-seq data suffer from problems of extremely high dropout rate and cell-to-cell variability, demanding new methods to recover gene expression loss. Despite the availability of various dropout imputation approaches for scRNA-seq, most studies focus on data with a medium or large number of cells, while few studies have explicitly investigated the differential performance across different sample sizes or the applicability of the approach on small or imbalanced data. It is imperative to develop new imputation approaches with higher generalizability for data with various sample sizes.

RESULTS

We proposed a method called scHinter for imputing dropout events for scRNA-seq with special emphasis on data with limited sample size. scHinter incorporates a voting-based ensemble distance and leverages the synthetic minority oversampling technique for random interpolation. A hierarchical framework is also embedded in scHinter to increase the reliability of the imputation for small samples. We demonstrated the ability of scHinter to recover gene expression measurements across a wide spectrum of scRNA-seq datasets with varied sample sizes. We comprehensively examined the impact of sample size and cluster number on imputation. Comprehensive evaluation of scHinter across diverse scRNA-seq datasets with imbalanced or limited sample size showed that scHinter achieved higher and more robust performance than competing approaches, including MAGIC, scImpute, SAVER and netSmooth.

AVAILABILITY AND IMPLEMENTATION

Freely available for download at https://github.com/BMILAB/scHinter.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
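
The RESULTS paragraph above mentions random interpolation in the style of the synthetic minority oversampling technique (SMOTE). The following is a minimal sketch of that general idea only, not the scHinter implementation: it draws a synthetic expression profile at a random point on the segment between a cell and one of its neighbors. The function names `interpolate_cell` and `impute_dropouts`, and the choice to fill only zero entries, are assumptions made for illustration.

```python
import random


def interpolate_cell(cell, neighbor, rng=random):
    """Return a synthetic profile at a random point between two cells.

    SMOTE-style interpolation: new = cell + gap * (neighbor - cell),
    with a single gap drawn uniformly from [0, 1).
    """
    if len(cell) != len(neighbor):
        raise ValueError("profiles must cover the same genes")
    gap = rng.random()
    return [c + gap * (n - c) for c, n in zip(cell, neighbor)]


def impute_dropouts(cell, neighbor, rng=random):
    """Hypothetical helper: fill zero entries with interpolated values.

    Observed (non-zero) values are left untouched. Treating every zero
    as a dropout is an assumption of this sketch, not of the paper.
    """
    synthetic = interpolate_cell(cell, neighbor, rng)
    return [s if c == 0 else c for c, s in zip(cell, synthetic)]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for repeatable runs
    cell = [0.0, 2.0, 0.0]
    neighbor = [4.0, 2.0, 1.0]
    print(impute_dropouts(cell, neighbor, rng))
```

In this sketch the neighbor is passed in directly; according to the abstract, scHinter selects neighbors with a voting-based ensemble distance, which is not reproduced here.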

Peng L, Tian X, Tian G, Xu J, Huang X, Weng Y, Yang J, Zhou L. Single-cell RNA-seq clustering: datasets, models, and algorithms. RNA Biol 2020;17:765-783. [PMID: 32116127 PMCID: PMC7549635 DOI: 10.1080/15476286.2020.1728961] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/03/2019] [Revised: 01/10/2020] [Accepted: 01/11/2020] [Indexed: 12/13/2022] Open

Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, Vallejos CA, Campbell KR, Beerenwinkel N, Mahfouz A, Pinello L, Skums P, Stamatakis A, Attolini CSO, Aparicio S, Baaijens J, Balvert M, Barbanson BD, Cappuccio A, Corleone G, Dutilh BE, Florescu M, Guryev V, Holmer R, Jahn K, Lobo TJ, Keizer EM, Khatri I, Kielbasa SM, Korbel JO, Kozlov AM, Kuo TH, Lelieveldt BP, Mandoiu II, Marioni JC, Marschall T, Mölder F, Niknejad A, Rączkowska A, Reinders M, Ridder JD, Saliba AE, Somarakis A, Stegle O, Theis FJ, Yang H, Zelikovsky A, McHardy AC, Raphael BJ, Shah SP, Schönhuth A. Eleven grand challenges in single-cell data science. Genome Biol 2020;21:31. [PMID: 32033589 PMCID: PMC7007675 DOI: 10.1186/s13059-020-1926-6] [Citation(s) in RCA: 534] [Impact Index Per Article: 133.5] [Received: 08/02/2019] [Accepted: 01/02/2020] [Indexed: 02/08/2023] Open

Affiliation(s)

David Lähnemann Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany; Department of Paediatric Oncology, Haematology and Immunology, Medical Faculty, Heinrich Heine University, University Hospital, Düsseldorf, Germany; Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
Johannes Köster Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany; Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA
Ewa Szczurek Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
Davis J. McCarthy Bioinformatics and Cellular Genomics, St Vincent’s Institute of Medical Research, Fitzroy, Australia; Melbourne Integrative Genomics, School of BioSciences–School of Mathematics & Statistics, Faculty of Science, University of Melbourne, Melbourne, Australia
Stephanie C. Hicks Department of Biostatistics, Johns Hopkins University, Baltimore, MD USA
Mark D. Robinson Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zürich, Zürich, Switzerland
Catalina A. Vallejos MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, UK; The Alan Turing Institute, British Library, London, UK
Kieran R. Campbell Department of Statistics, University of British Columbia, Vancouver, Canada; Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada; Data Science Institute, University of British Columbia, Vancouver, Canada
Niko Beerenwinkel Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
Ahmed Mahfouz Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands; Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
Luca Pinello Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA; Department of Pathology, Harvard Medical School, Boston, USA; Broad Institute of Harvard and MIT, Cambridge, MA USA
Pavel Skums Department of Computer Science, Georgia State University, Atlanta, USA
Alexandros Stamatakis Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany; Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
Camille Stephan-Otto Attolini Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Spain
Samuel Aparicio Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada; Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
Jasmijn Baaijens Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
Marleen Balvert Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands; Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
Buys de Barbanson Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands; Oncode Institute, Utrecht, The Netherlands; Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
Antonio Cappuccio Institute for Advanced Study, University of Amsterdam, Amsterdam, The Netherlands
Giacomo Corleone Department of Surgery and Cancer, The Imperial Centre for Translational and Experimental Medicine, Imperial College London, London, UK
Bas E. Dutilh Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands; Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Nijmegen, The Netherlands
Maria Florescu Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands; Oncode Institute, Utrecht, The Netherlands; Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
Victor Guryev European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Rens Holmer Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
Katharina Jahn Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
Thamar Jessurun Lobo European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Emma M. Keizer Biometris, Wageningen University & Research, Wageningen, The Netherlands
Indu Khatri Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
Szymon M. Kielbasa Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
Jan O. Korbel Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
Alexey M. Kozlov Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
Tzu-Hao Kuo Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
Boudewijn P.F. Lelieveldt PRB lab, Delft University of Technology, Delft, The Netherlands; Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
Ion I. Mandoiu Computer Science & Engineering Department, University of Connecticut, Storrs, USA
John C. Marioni Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge, UK; Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK; European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
Tobias Marschall Center for Bioinformatics, Saarland University, Saarbrücken, Germany; Max Planck Institute for Informatics, Saarbrücken, Germany
Felix Mölder Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany; Institute of Pathology, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
Amir Niknejad Computation molecular design, Zuse Institute Berlin, Berlin, Germany; Mathematics Department, Mount Saint Vincent, New York, USA
Alicja Rączkowska Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
Marcel Reinders Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands; Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
Jeroen de Ridder Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands; Oncode Institute, Utrecht, The Netherlands
Antoine-Emmanuel Saliba Helmholtz Institute for RNA-based Infection Research, Helmholtz-Center for Infection Research, Würzburg, Germany
Antonios Somarakis Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
Oliver Stegle Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK; Division of Computational Genomics and Systems Genetics, German Cancer Research Center–DKFZ, Heidelberg, Germany
Fabian J. Theis Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
Huan Yang Division of Drug Discovery and Safety, Leiden Academic Center for Drug Research–LACDR–Leiden University, Leiden, The Netherlands
Alex Zelikovsky Department of Computer Science, Georgia State University, Atlanta, USA; The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
Alice C. McHardy Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
Benjamin J. Raphael Department of Computer Science, Princeton University, Princeton, USA
Sohrab P. Shah Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
Alexander Schönhuth Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands; Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands

Mieth B, Hockley JRF, Görnitz N, Vidovic MMC, Müller KR, Gutteridge A, Ziemek D. Using transfer learning from prior reference knowledge to improve the clustering of single-cell RNA-Seq data. Sci Rep 2019;9:20353. [PMID: 31889137 PMCID: PMC6937257 DOI: 10.1038/s41598-019-56911-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 04/29/2019] [Accepted: 12/13/2019] [Indexed: 01/21/2023] Open

Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. AN INTERNATIONAL JOURNAL ON INFORMATION FUSION 2019;50:71-91. [PMID: 30467459 PMCID: PMC6242341 DOI: 10.1016/j.inffus.2018.09.012] [Citation(s) in RCA: 210] [Impact Index Per Article: 42.0] [Indexed: 05/10/2023]

Zeng T, Dai H. Single-Cell RNA Sequencing-Based Computational Analysis to Describe Disease Heterogeneity. Front Genet 2019;10:629. [PMID: 31354786 PMCID: PMC6640157 DOI: 10.3389/fgene.2019.00629] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 01/17/2019] [Accepted: 06/17/2019] [Indexed: 12/25/2022] Open

Petegrosso R, Li Z, Kuang R. Machine learning and statistical methods for clustering single-cell RNA-sequencing data. Brief Bioinform 2019;21:1209-1223. [DOI: 10.1093/bib/bbz063] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 01/25/2019] [Revised: 04/04/2019] [Accepted: 04/29/2019] [Indexed: 01/08/2023] Open

Abstract

Single-cell RNA sequencing (scRNA-seq) technologies have enabled large-scale whole-transcriptome profiling of individual cells in a cell population. A core analysis of scRNA-seq transcriptome profiles is to cluster the single cells to reveal cell subtypes and infer cell lineages based on the relations among the cells. This article reviews the machine learning and statistical methods for clustering scRNA-seq transcriptomes developed in the past few years. The review focuses on how conventional clustering techniques such as hierarchical clustering, graph-based clustering, mixture models, k-means, ensemble learning, neural networks and density-based clustering are modified or customized to tackle the unique challenges in scRNA-seq data analysis, such as the dropout of low-expression genes, low and uneven read coverage of transcripts, highly variable total mRNAs from single cells and ambiguous cell markers in the presence of technical biases and irrelevant confounding biological variations. We review how cell-specific normalization, the imputation of dropouts and dimension reduction methods can be applied with new statistical or optimization strategies to improve the clustering of single cells. We also introduce more advanced approaches for clustering scRNA-seq transcriptomes in time series data and multiple cell populations and for detecting rare cell types. Several software packages developed to support the cluster analysis of scRNA-seq data are also reviewed and experimentally compared to evaluate their performance and efficiency. Finally, we conclude with useful observations and possible future directions in scRNA-seq data analytics.

Availability

All the source code and data are available at https://github.com/kuanglab/single-cell-review.

Ye W, Ji G, Ye P, Long Y, Xiao X, Li S, Su Y, Wu X. scNPF: an integrative framework assisted by network propagation and network fusion for preprocessing of single-cell RNA-seq data. BMC Genomics 2019;20:347. [PMID: 31068142 PMCID: PMC6505295 DOI: 10.1186/s12864-019-5747-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/21/2018] [Accepted: 04/29/2019] [Indexed: 12/15/2022] Open

Abstract

Background

Single-cell RNA-sequencing (scRNA-seq) is fast becoming a powerful tool for profiling genome-scale transcriptomes of individual cells and capturing transcriptome-wide cell-to-cell variability. However, scRNA-seq technologies suffer from high levels of technical noise and variability, hindering reliable quantification of lowly and moderately expressed genes. Since most downstream analyses, such as cell type clustering and differential expression analysis, rely on the gene-cell expression matrix, preprocessing is a critical preliminary step in scRNA-seq data analysis.

Results

We present scNPF, an integrative scRNA-seq preprocessing framework assisted by network propagation and network fusion, for recovering gene expression loss, correcting gene expression measurements, and learning similarities between cells. scNPF leverages the context-specific topology inherent in the given data and the prior knowledge derived from publicly available molecular gene-gene interaction networks to augment gene-gene relationships in a data-driven manner. We have demonstrated the great potential of scNPF in scRNA-seq preprocessing for accurately recovering gene expression values and learning cell similarity networks. Comprehensive evaluation of scNPF across a wide spectrum of scRNA-seq data sets showed that scNPF achieved comparable or higher performance than the competing approaches according to various metrics of internal validation and clustering accuracy. We have made scNPF an easy-to-use R package, which can be used as a versatile preprocessing plug-in for most existing scRNA-seq analysis pipelines or tools.

Conclusions

scNPF is a universal tool for preprocessing of scRNA-seq data, which jointly incorporates the global topology of prior interaction networks and the context-specific information encapsulated in the scRNA-seq data to capture both shared and complementary knowledge from diverse data sources. scNPF could be used to recover gene signatures and learn cell-to-cell similarities from emerging scRNA-seq data to facilitate downstream analyses such as dimension reduction, cell type clustering, and visualization.

Electronic supplementary material

The online version of this article (10.1186/s12864-019-5747-5) contains supplementary material, which is available to authorized users.
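
Illustration (not from the article)

The abstract above names network propagation as the way scNPF recovers lost expression values but does not state the propagation scheme. The sketch below assumes a random walk with restart over a gene-gene network, one common form of network propagation; the function name `propagate`, the `restart` parameter, and the toy three-gene network are hypothetical and are not taken from the scNPF R package.

```python
def propagate(adjacency, expression, restart=0.5, max_iter=200, tol=1e-10):
    """Smooth one cell's expression vector over a gene-gene network.

    Assumed scheme (random walk with restart):
        F <- (1 - restart) * W @ F + restart * F0
    where W is the column-normalized adjacency matrix and F0 the observed values.
    """
    n = len(expression)
    # Column sums of the adjacency matrix (node degrees for an unweighted graph).
    degree = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    current = list(expression)
    for _ in range(max_iter):
        updated = []
        for i in range(n):
            spread = 0.0
            for j in range(n):
                if degree[j] > 0:
                    spread += adjacency[i][j] / degree[j] * current[j]
                elif i == j:
                    spread += current[j]  # an isolated gene keeps its own value
            updated.append((1 - restart) * spread + restart * expression[i])
        change = max(abs(u - c) for u, c in zip(updated, current))
        current = updated
        if change < tol:
            break
    return current


# Toy network: gene 0 - gene 1 - gene 2 (a chain); gene 1 is a dropout (observed 0).
network = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
observed = [4.0, 0.0, 2.0]
smoothed = propagate(network, observed)
```

In this toy case the dropout gene receives a non-zero value from its two expressed neighbours, while the total expression across the three genes is unchanged because each column of the normalized matrix sums to one.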

Majority Voting Based Multi-Task Clustering of Air Quality Monitoring Network in Turkey. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9081610] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/16/2022]

Li X, Zhang S, Wong KC. Single-cell RNA-seq interpretations using evolutionary multiobjective ensemble pruning. Bioinformatics 2018;35:2809-2817. [DOI: 10.1093/bioinformatics/bty1056] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/26/2018] [Revised: 10/31/2018] [Accepted: 12/21/2018] [Indexed: 11/14/2022] Open

Abstract

Motivation

In recent years, single-cell RNA sequencing has enabled us to discover cell types or even subtypes. Its increasing availability provides opportunities to identify cell populations from single-cell RNA-seq data. Computational methods have been employed to reveal the gene expression variations among multiple cell populations. Unfortunately, the existing ones can suffer from realistic restrictions such as experimental noise, numerical instability, high dimensionality and computational scalability.

Results

We propose an evolutionary multiobjective ensemble pruning algorithm (EMEP) that addresses those realistic restrictions. Our EMEP algorithm first applies unsupervised dimensionality reduction to project data from the original high dimensions to low-dimensional subspaces; basic clustering algorithms are then applied in those new subspaces to generate different clustering results that form cluster ensembles. However, most of those cluster ensembles are unnecessarily bulky, at the expense of extra time costs and memory consumption. To overcome that problem, EMEP is designed to dynamically select the suitable clustering results from the ensembles. Moreover, to guide the multiobjective ensemble evolution, three cluster validity indices, including the overall cluster deviation, the within-cluster compactness and the number of basic partition clusters, are formulated as the objective functions to unleash its cell type discovery performance using evolutionary multiobjective optimization. We applied EMEP to 55 simulated datasets and seven real single-cell RNA-seq datasets, including six single-cell RNA-seq datasets and one large-scale dataset with 3005 cells and 4412 genes. Two case studies are also conducted to reveal mechanistic insights into the biological relevance of EMEP. We found that EMEP can achieve superior performance over the other clustering algorithms, demonstrating that EMEP can identify cell populations clearly.

Availability and implementation

EMEP is written in Matlab and available at https://github.com/lixt314/EMEP

Supplementary information

Supplementary data are available at Bioinformatics online.
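
Illustration (not from the article)

The abstract describes selecting suitable clustering results from an ensemble under several objectives. The sketch below shows only the non-dominated (Pareto) filtering idea behind multiobjective selection, using two objectives loosely modelled on those named above. It is not EMEP's evolutionary search: the definitions of `deviation` and `pareto_front`, the one-dimensional toy data, and the candidate partitions are all assumptions made for illustration.

```python
def deviation(points, labels):
    """Sum of absolute distances from each point to its cluster mean (1-D data)."""
    clusters = {}
    for x, label in zip(points, labels):
        clusters.setdefault(label, []).append(x)
    total = 0.0
    for members in clusters.values():
        mean = sum(members) / len(members)
        total += sum(abs(x - mean) for x in members)
    return total


def pareto_front(candidates, objectives):
    """Keep candidates that no other candidate dominates (all objectives minimized)."""
    scores = [tuple(f(c) for f in objectives) for c in candidates]
    front = []
    for i, si in enumerate(scores):
        dominated = any(
            all(a <= b for a, b in zip(sj, si)) and any(a < b for a, b in zip(sj, si))
            for j, sj in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append(candidates[i])
    return front


# Toy data: two obvious groups on a line.
points = [0.0, 1.0, 10.0, 11.0]
# Hypothetical base partitions, as an ensemble of label vectors.
candidates = [
    [0, 0, 0, 0],  # everything in one cluster
    [0, 0, 1, 1],  # the natural split
    [0, 1, 0, 1],  # a poor two-cluster split
    [0, 1, 2, 3],  # every point in its own cluster
]
objectives = [
    lambda labels: deviation(points, labels),  # lower is tighter
    lambda labels: len(set(labels)),           # fewer clusters is simpler
]
kept = pareto_front(candidates, objectives)
```

Here the poor two-cluster split is discarded because the natural split is tighter with the same number of clusters; the other three partitions are kept because each represents a different trade-off between tightness and cluster count.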