51
Abstract
Epigenome regulation has emerged as an important mechanism for the maintenance of organ function in health and disease. Dissecting epigenomic alterations and resultant gene expression changes in single cells provides unprecedented resolution and insight into cellular diversity, modes of gene regulation, transcription factor dynamics and 3D genome organization. In this chapter, we summarize the transformative single-cell epigenomic technologies that have deepened our understanding of the fundamental principles of gene regulation. We provide a historical perspective on these methods and a brief procedural outline, with emphasis on the computational tools used to extract meaningful information from the data. Our overall goal is to aid scientists in applying these technologies to their system of interest.
Affiliation(s)
- Krystyna Mazan-Mamczarz
- Laboratory of Genetics and Genomics, National Institute on Aging (NIA), Intramural Research Program (IRP), National Institutes of Health (NIH), Baltimore, MD, USA
- Jisu Ha
- Laboratory of Genetics and Genomics, National Institute on Aging (NIA), Intramural Research Program (IRP), National Institutes of Health (NIH), Baltimore, MD, USA
- Supriyo De
- Laboratory of Genetics and Genomics, National Institute on Aging (NIA), Intramural Research Program (IRP), National Institutes of Health (NIH), Baltimore, MD, USA
- Laboratory of Genetics and Genomics, and Computational Biology and Genomics Core, National Institute on Aging-Intramural Research Program, National Institutes of Health, Baltimore, MD, USA
- Payel Sen
- Laboratory of Genetics and Genomics, National Institute on Aging (NIA), Intramural Research Program (IRP), National Institutes of Health (NIH), Baltimore, MD, USA
52
Predictive Biomarkers for Postmyocardial Infarction Heart Failure Using Machine Learning: A Secondary Analysis of a Cohort Study. EVIDENCE-BASED COMPLEMENTARY AND ALTERNATIVE MEDICINE 2021; 2021:2903543. [PMID: 34938340 PMCID: PMC8687817 DOI: 10.1155/2021/2903543] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/22/2021] [Accepted: 11/17/2021] [Indexed: 12/13/2022]
Abstract
Background: There are few biomarkers with excellent predictive value for heart failure (HF) developing after acute myocardial infarction (MI). This study aimed to screen candidate biomarkers to predict post-MI HF. Methods: This is a secondary analysis of a single-center cohort study including nine post-MI HF patients and eight post-MI patients who remained HF-free over a 6-month follow-up. Transcriptional profiles were obtained from whole blood samples collected at admission, at discharge, and at 1-month follow-up. We screened for differentially expressed genes and identified key modules using weighted gene coexpression network analysis. We confirmed the candidate biomarkers in external post-MI HF datasets. Receiver operating characteristic (ROC) curves were constructed to evaluate the predictive value of these candidate biomarkers. Results: A total of 6,778, 1,136, and 1,974 genes (dataset 1) were differentially expressed at admission, discharge, and 1-month follow-up, respectively. The white and royal blue modules were most significantly correlated with post-MI HF (dataset 2). After overlapping dataset 1, dataset 2, and the external datasets (dataset 3), we identified five candidate biomarkers: FCGR2A, GSDMB, MIR330, MED1, and SQSTM1. When GSDMB and SQSTM1 were combined, the area under the curve (AUC) reached 1.00, 0.85, and 0.89 at admission, discharge, and 1-month follow-up, respectively. Conclusions: This study demonstrates that FCGR2A, GSDMB, MIR330, MED1, and SQSTM1 are candidate predictive biomarker genes for post-MI HF and that the combination of GSDMB and SQSTM1 has high predictive value.
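The ROC analysis in this abstract rests on a simple quantity. As a minimal Python sketch (NumPy assumed), the AUC of any per-patient score equals the probability that a randomly chosen HF patient scores higher than a randomly chosen HF-free patient. The abstract does not say how GSDMB and SQSTM1 were combined, so `combine_zscores` below is a hypothetical rule (averaging per-gene z-scores) used only for illustration; it is not the authors' model.

```python
import numpy as np


def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly chosen
    positive case (label 1) scores higher than a randomly chosen negative case.
    Ties count as one half (Mann-Whitney U formulation)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative case")
    # Compare every positive score with every negative score.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (pos.size * neg.size))


def combine_zscores(*markers):
    """Hypothetical combination rule: average of per-marker z-scores across patients."""
    arrays = (np.asarray(x, dtype=float) for x in markers)
    zs = [(m - m.mean()) / m.std() for m in arrays]
    return np.mean(zs, axis=0)
```

With hypothetical per-patient arrays `gsdmb`, `sqstm1` and a 0/1 outcome vector `hf`, the combined AUC would be `roc_auc(combine_zscores(gsdmb, sqstm1), hf)`.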
53
Sarkar S, Dey U, Khohliwe TB, Yella VR, Kumar A. Analysis of nucleoid-associated protein-binding regions reveals DNA structural features influencing genome organization in Mycobacterium tuberculosis. FEBS Lett 2021; 595:2504-2521. [PMID: 34387867 DOI: 10.1002/1873-3468.14178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2021] [Revised: 08/01/2021] [Accepted: 08/11/2021] [Indexed: 11/10/2022]
Abstract
Nucleoid-associated proteins (NAPs) maintain bacterial nucleoid configuration through their architectural properties of DNA bending, wrapping, and bridging. However, the contribution of DNA structural alterations to DNA-NAP recognition at the genomic scale remains unresolved. The present work dissects the DNA sequence, shape, and altered structural preferences of six NAPs in Mycobacterium tuberculosis at a genomic scale. The results suggest that narrower minor groove width (MGW) and higher DNA rigidity mark the binding sites of EspR and Lsr2, while mIHF, MtHU and NapM have heterogeneous DNA structural predilections. In contrast, WhiB4 DNA-binding sites were characterized by wider MGW and highly deformable, less curved DNA. This work provides systematic insight into NAP-mediated genome organization as a function of DNA structural features.
Affiliation(s)
- Sharmilee Sarkar
- Department of Molecular Biology and Biotechnology, Tezpur University, India
- Upalabdha Dey
- Department of Molecular Biology and Biotechnology, Tezpur University, India
- Venkata Rajesh Yella
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur, India
- Aditya Kumar
- Department of Molecular Biology and Biotechnology, Tezpur University, India
54
Chen Y, Zhang Y, Li JYH, Ouyang Z. LISA2: Learning Complex Single-Cell Trajectory and Expression Trends. Front Genet 2021; 12:681206. [PMID: 34512717 PMCID: PMC8428276 DOI: 10.3389/fgene.2021.681206] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/16/2021] [Accepted: 06/01/2021] [Indexed: 12/20/2022]
Abstract
Single-cell transcriptional and epigenomics profiles have been applied in a variety of tissues and diseases for discovering new cell types, differentiation trajectories, and gene regulatory networks. Many methods such as Monocle 2/3, URD, and STREAM have been developed for tree-based trajectory building. Here, we propose a fast and flexible trajectory learning method, LISA2, for single-cell data analysis. This new method has two distinctive features: (1) LISA2 uses specified leaves and root to reduce the complexity of building the developmental trajectory, especially in special cases such as rare cell populations and adjacent terminal cell states; and (2) LISA2 is applicable to both transcriptomics and epigenomics data. LISA2 visualizes complex trajectories using 3D Landmark ISOmetric feature MAPping (L-ISOMAP). We apply LISA2 to simulated and real datasets from cerebellum, diencephalon, and hematopoietic stem cells, including both single-cell transcriptomics data and single-cell assay for transposase-accessible chromatin (scATAC-seq) data. LISA2 is efficient in estimating single-cell trajectories and expression trends for different kinds of molecular states of cells.
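L-ISOMAP, which LISA2 uses for 3D visualization, can be sketched in a few steps: build a k-nearest-neighbour graph, compute geodesic (shortest-path) distances from a small set of landmark cells, embed the landmarks by classical multidimensional scaling, and place every other cell by distance-based triangulation. The NumPy-only sketch below is a generic Landmark Isomap, not LISA2's code; random landmark selection, the value of `k`, and all function names are assumptions for illustration.

```python
import heapq

import numpy as np


def knn_graph(X, k):
    """Symmetric k-nearest-neighbour graph as a list of {neighbour: distance} dicts."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbour
    adj = [dict() for _ in range(len(X))]
    for i in range(len(X)):
        for j in np.argsort(D[i])[:k]:
            j = int(j)
            adj[i][j] = adj[j][i] = float(D[i, j])  # symmetrise the graph
    return adj


def dijkstra(adj, source):
    """Shortest-path (geodesic) distances from `source` to every node."""
    dist = np.full(len(adj), np.inf)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: a shorter path was already found
        for v, w in adj[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def landmark_isomap(X, n_components=3, n_landmarks=20, k=8, seed=0):
    """Embed the rows of X with Landmark Isomap; returns (embedding, landmark indices)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    landmarks = rng.choice(len(X), size=n_landmarks, replace=False)  # random landmarks (assumption)
    adj = knn_graph(X, k)
    # Squared geodesic distances, shape (n_landmarks, n_points).
    G2 = np.array([dijkstra(adj, int(l)) for l in landmarks]) ** 2
    if not np.isfinite(G2).all():
        raise ValueError("kNN graph is disconnected; try a larger k")
    D2 = G2[:, landmarks]                     # landmark-to-landmark block
    m = n_landmarks
    H = np.eye(m) - np.full((m, m), 1.0 / m)  # centring matrix
    B = -0.5 * H @ D2 @ H                     # classical MDS on the landmarks
    evals, evecs = np.linalg.eigh(B)
    top = np.argsort(evals)[::-1][:n_components]
    evals, evecs = evals[top], evecs[:, top]
    if np.any(evals <= 0):
        raise ValueError("not enough positive eigenvalues for n_components")
    # Distance-based triangulation places every point (landmarks included).
    mean_d2 = D2.mean(axis=1, keepdims=True)
    Y = -0.5 * (G2 - mean_d2).T @ (evecs / np.sqrt(evals))
    return Y, landmarks
```

For example, `Y, lm = landmark_isomap(X, n_components=3)` on a hypothetical cells x features matrix `X` gives a 3D layout. Shortest paths are computed only from the landmarks rather than from every cell, which is what makes the landmark variant cheaper than full Isomap; the dense pairwise-distance step in `knn_graph` is kept only for brevity.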
Affiliation(s)
- Yang Chen
- Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst, MA, United States
- Yuping Zhang
- Department of Statistics, University of Connecticut, Storrs, CT, United States
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, United States
- James Y. H. Li
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, United States
- Department of Genetics and Genome Sciences, School of Medicine, University of Connecticut, Farmington, CT, United States
- Zhengqing Ouyang
- Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst, MA, United States
55
Koch FC, Sutton GJ, Voineagu I, Vafaee F. Supervised application of internal validation measures to benchmark dimensionality reduction methods in scRNA-seq data. Brief Bioinform 2021; 22:6347204. [PMID: 34374742 DOI: 10.1093/bib/bbab304] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/31/2021] [Revised: 07/15/2021] [Accepted: 07/17/2021] [Indexed: 12/24/2022]
Abstract
A typical single-cell RNA sequencing (scRNA-seq) experiment will measure on the order of 20 000 transcripts and thousands, if not millions, of cells. The high dimensionality of such data presents serious complications for traditional data analysis methods and, as such, methods to reduce dimensionality play an integral role in many analysis pipelines. However, few studies have benchmarked the performance of these methods on scRNA-seq data, and existing comparisons assess performance via downstream analysis accuracy measures, which may confound the interpretation of their results. Here, we present the most comprehensive benchmark of dimensionality reduction methods in scRNA-seq data to date, utilizing over 300 000 compute hours to assess the performance of over 25 000 low-dimension embeddings across 33 dimensionality reduction methods and 55 scRNA-seq datasets. We employ a simple yet novel approach that does not rely on the results of downstream analyses. Internal validation measures (IVMs), traditionally used as an unsupervised method to assess clustering performance, are repurposed to measure how well-formed biological clusters are after dimensionality reduction. Performance was further evaluated over nearly 200 000 000 iterations of DBSCAN, a density-based clustering algorithm, showing that hyperparameter optimization using IVMs as the objective function leads to near-optimal clustering. Methods were also assessed on the extent to which they preserve the global structure of the data, and on their computational memory and time requirements across a large range of sample sizes. Our comprehensive benchmarking analysis provides a valuable resource for researchers and aims to guide best practice for dimensionality reduction in scRNA-seq analyses. We highlight Latent Dirichlet Allocation and Potential of Heat-diffusion for Affinity-based Transition Embedding (PHATE) as high-performing algorithms.
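The "supervised" use of an internal validation measure can be illustrated with the silhouette width, one classic IVM: instead of scoring cluster assignments, it scores known cell-type labels in a low-dimensional embedding, so higher values mean the biological groups are better separated. Which IVMs the benchmark actually used is not listed in the abstract; the silhouette here is an illustrative choice, and the NumPy function below is a minimal sketch with a hypothetical name.

```python
import numpy as np


def silhouette_score(X, labels):
    """Mean silhouette width of labelled cells in an embedding X (n_cells x n_dims).

    For cell i: a = mean distance to the other cells with the same label,
    b = lowest mean distance to the cells of any other label,
    s_i = (b - a) / max(a, b). Cells that are alone in their label get s_i = 0.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    uniq = np.unique(labels)
    if uniq.size < 2:
        raise ValueError("silhouette needs at least two distinct labels")
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue
        a = D[i, same].sum() / (n_same - 1)  # self-distance is 0, so exclude it from the count
        b = min(D[i, labels == u].mean() for u in uniq if u != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())
```

Comparing `silhouette_score(embedding, cell_types)` across embeddings of the same cells would rank dimensionality reduction methods; with cluster labels in place of cell types, the same function could serve as an objective when scanning DBSCAN's neighbourhood radius, in the spirit of the hyperparameter search described above.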
Affiliation(s)
- Forrest C Koch
- School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Sydney, NSW, Australia
- Gavin J Sutton
- School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Sydney, NSW, Australia
- Irina Voineagu
- School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Sydney, NSW, Australia; UNSW Data Science Hub, University of New South Wales (UNSW Sydney), Sydney, NSW, Australia
- Fatemeh Vafaee
- School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Sydney, NSW, Australia; UNSW Data Science Hub, University of New South Wales (UNSW Sydney), Sydney, NSW, Australia
56
Mondal PK, Saha US, Mukhopadhyay I. PseudoGA: cell pseudotime reconstruction based on genetic algorithm. Nucleic Acids Res 2021; 49:7909-7924. [PMID: 34244782 PMCID: PMC8661435 DOI: 10.1093/nar/gkab457] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/19/2020] [Revised: 05/03/2021] [Accepted: 07/07/2021] [Indexed: 01/05/2023]
Abstract
Dynamic regulation of gene expression is often governed by progression through transient cell states. Bulk RNA-seq analysis can only detect average changes in expression levels and cannot resolve these dynamics. Single-cell RNA-seq offers an unprecedented opportunity to place cells on a hypothetical time trajectory that reflects the gradual transition of their transcriptomes. This continuous trajectory, or ‘pseudotime’, may reveal the developmental pathway and provide information on dynamic transcriptomic changes and other biological processes. Existing approaches to building pseudotime depend heavily on reducing the high-dimensional data to extremely low-dimensional subspaces, which may lead to loss of information. We propose PseudoGA, a genetic algorithm-based approach that orders cells under the assumption that gene expression varies along a smooth curve over the pseudotime trajectory. We observe superior accuracy of our method on simulated as well as real benchmarking datasets. The generality of the assumption behind PseudoGA and its independence from any dimensionality reduction technique make it a robust choice for pseudotime estimation from single-cell transcriptome data. PseudoGA is also time-efficient when applied to large single-cell RNA-seq datasets and is adaptable to parallel computing. R code for PseudoGA is freely available at https://github.com/indranillab/pseudoga.
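The core idea, ordering cells so that every gene follows a smooth curve along pseudotime, maps naturally onto a genetic algorithm over permutations. The Python sketch below illustrates that idea under stated assumptions and is not PseudoGA's R implementation: the fitness (residual sum of squares of a cubic polynomial fit per gene), the order-crossover and swap-mutation operators, tournament selection, and all parameter values are hypothetical choices.

```python
import numpy as np


def smoothness_fitness(Y, degree=3):
    """Build a fitness function for cell orderings (hypothetical objective).

    The cell at rank r gets pseudotime t_r in [0, 1]; every gene is fitted by a
    polynomial of degree `degree` in t, and fitness = -(total residual sum of squares).
    """
    n = Y.shape[0]
    t = np.linspace(0.0, 1.0, n)
    V = np.vander(t, degree + 1)
    resid_maker = np.eye(n) - V @ np.linalg.pinv(V)  # I minus projection onto polynomials

    def fitness(order):
        resid = resid_maker @ Y[order]  # fits all genes (columns) at once
        return -float((resid ** 2).sum())

    return fitness


def order_crossover(p1, p2, rng):
    """Order-crossover variant: keep a slice of p1, fill the other slots in p2's order."""
    n = len(p1)
    i, j = sorted(rng.choice(n, size=2, replace=False))
    child = np.full(n, -1)
    child[i:j + 1] = p1[i:j + 1]
    kept = set(child[i:j + 1].tolist())
    child[child == -1] = [g for g in p2.tolist() if g not in kept]
    return child


def pseudo_ga(Y, pop_size=60, generations=300, mutation_rate=0.3, seed=0):
    """Search for a cell ordering with smooth expression profiles using a simple GA."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    rng = np.random.default_rng(seed)
    fitness = smoothness_fitness(Y)
    pop = [np.arange(n)] + [rng.permutation(n) for _ in range(pop_size - 1)]
    scores = np.array([fitness(p) for p in pop])

    def tournament():
        idx = rng.choice(pop_size, size=3, replace=False)
        return pop[idx[np.argmax(scores[idx])]]

    for _ in range(generations):
        children = [pop[int(np.argmax(scores))].copy()]  # elitism: keep the best ordering
        while len(children) < pop_size:
            child = order_crossover(tournament(), tournament(), rng)
            if rng.random() < mutation_rate:  # swap mutation
                a, b = rng.choice(n, size=2, replace=False)
                child[a], child[b] = child[b], child[a]
            children.append(child)
        pop = children
        scores = np.array([fitness(p) for p in pop])
    best = int(np.argmax(scores))
    return pop[best], float(scores[best])
```

Because a polynomial in t is also a polynomial in 1 - t, an ordering and its reverse score identically, so the direction of pseudotime has to be fixed separately (for example from a known start cell). Elitism guarantees that the best fitness never decreases between generations.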
Affiliation(s)
- Pronoy Kanti Mondal
- Human Genetics Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700108, West Bengal, India
- Udit Surya Saha
- Human Genetics Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700108, West Bengal, India
- Indranil Mukhopadhyay
- Human Genetics Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700108, West Bengal, India
57
Cui Y, Zhang S, Liang Y, Wang X, Ferraro TN, Chen Y. Consensus clustering of single-cell RNA-seq data by enhancing network affinity. Brief Bioinform 2021; 22:6308199. [PMID: 34160582 PMCID: PMC8574980 DOI: 10.1093/bib/bbab236] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 02/04/2021] [Revised: 05/29/2021] [Accepted: 06/01/2021] [Indexed: 12/18/2022]
Abstract
Elucidation of cell subpopulations at high resolution is a key and challenging goal of single-cell ribonucleic acid (RNA) sequencing (scRNA-seq) data analysis. Although unsupervised clustering methods have been proposed for de novo identification of cell populations, their performance and robustness suffer from the high variability, low capture efficiency and high dropout rates characteristic of scRNA-seq experiments. Here, we present a novel unsupervised method, Single-cell Clustering by Enhancing Network Affinity (SCENA), which employs three main strategies: selecting multiple gene sets, enhancing local affinity among cells, and clustering consensus matrices. Large-scale validation on 13 real scRNA-seq datasets shows that SCENA detects cell populations with high accuracy and is robust against dropout noise. When we applied SCENA to large-scale scRNA-seq data of mouse brain cells, known cell types were successfully detected, and novel interneuron types were identified with differential expression of gamma-aminobutyric acid receptor subunits and transporters. SCENA uses heterogeneous CPU + GPU (central processing unit + graphics processing unit) parallel computing to achieve high running speed. Together, its accuracy and speed make SCENA an efficient platform for biological discovery through clustering analysis of large and diverse scRNA-seq datasets.
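Consensus clustering of the kind SCENA performs starts from many candidate partitions (for SCENA, obtained from multiple gene sets) and summarizes them in a consensus matrix. The NumPy sketch below shows only the generic co-association step and a simple threshold-and-connected-components final clustering; SCENA's gene-set selection, affinity enhancement, and its actual final clustering step are not reproduced, and the function names and the 0.5 threshold are assumptions.

```python
import numpy as np


def coassociation(labelings):
    """Consensus matrix: fraction of clusterings in which each pair of cells shares a cluster."""
    L = np.asarray(labelings)  # shape (n_clusterings, n_cells)
    n = L.shape[1]
    C = np.zeros((n, n))
    for lab in L:
        C += lab[:, None] == lab[None, :]
    return C / L.shape[0]


def consensus_clusters(labelings, threshold=0.5):
    """Final labels = connected components of the graph linking cell pairs whose
    co-clustering frequency exceeds `threshold`."""
    C = coassociation(labelings)
    n = C.shape[0]
    out = np.full(n, -1)
    next_label = 0
    for start in range(n):
        if out[start] != -1:
            continue
        out[start] = next_label
        stack = [start]
        while stack:  # depth-first search over the thresholded consensus graph
            u = stack.pop()
            for v in np.flatnonzero(C[u] > threshold):
                if out[v] == -1:
                    out[v] = next_label
                    stack.append(v)
        next_label += 1
    return out
```

Each row of `labelings` could come from clustering the same cells with a different gene set; label permutations between runs do not matter because only co-membership is counted.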
Affiliation(s)
- Yaxuan Cui
- College of Computer and Information Engineering, Tianjin Normal University, China
- Shaoqiang Zhang
- College of Computer and Information Engineering, Tianjin Normal University, China
- Ying Liang
- College of Computer and Information Engineering, Tianjin Normal University, China
- Xiangyun Wang
- College of Computer and Information Engineering, Tianjin Normal University, China
- Thomas N Ferraro
- Department of Biomedical Sciences at CMSRU, Rowan University, NJ 08028, USA
- Yong Chen
- Department of Molecular and Cellular Biosciences at Rowan University, Rowan University, NJ 08028, USA
58
Zhao X, Li H, Lyu S, Zhai J, Ji Z, Zhang Z, Zhang X, Liu Z, Wang H, Xu J, Fan H, Kou J, Li L, Lang R, He Q. Single-cell transcriptomics reveals heterogeneous progression and EGFR activation in pancreatic adenosquamous carcinoma. Int J Biol Sci 2021; 17:2590-2605. [PMID: 34326696 PMCID: PMC8315026 DOI: 10.7150/ijbs.58886] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 03/17/2021] [Accepted: 06/08/2021] [Indexed: 02/07/2023]
Abstract
Pancreatic adenosquamous carcinoma (PASC), a rare pathological type of pancreatic cancer (PC), has a poor prognosis owing to its high malignancy. To examine the heterogeneity of PASC, we performed single-cell RNA sequencing (scRNA-seq) on tissue samples from a healthy donor pancreas, an intraductal papillary mucinous neoplasm, and a patient with PASC. Across 9,887 individual cells, ten cell subpopulations were identified, including myeloid, immune, ductal, fibroblast, acinar, stellate, endothelial, and cancer cells. Cancer cells were divided into five clusters. Notably, cluster 1 exhibited stem-like phenotypes, expressing UBE2C, ASPM, and TOP2A. We found that S100A2 is a potential biomarker for cancer cells. LGALS1, NPM1, RACK1, and PERP were upregulated from ductal to cancer cells. Furthermore, copy number variations in ductal and cancer cells were greater than in the reference cells. The expression of EREG, FCGR2A, CCL4L2, and CTSC in myeloid cells increased from the normal pancreas to PASC. The gene sets expressed by cancer-associated fibroblasts were enriched in immunosuppressive pathways. We demonstrate that EGFR-associated ligand-receptor pairs are activated in ductal-stromal cell communication. Hence, this study revealed the heterogeneous variations of ductal and stromal cells, defined cancer-associated signaling pathways, and deciphered intercellular interactions during PASC progression.
Affiliation(s)
- Xin Zhao
- Department of Hepatobiliary Surgery, Beijing Chaoyang Hospital affiliated to Capital Medical University, Beijing 100020, China
- Han Li
- Department of Head and Neck Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Shaocheng Lyu
- Department of Hepatobiliary Surgery, Beijing Chaoyang Hospital affiliated to Capital Medical University, Beijing 100020, China
- Jialei Zhai
- Department of Pathology, Beijing Chaoyang Hospital affiliated to Capital Medical University, Beijing 100020, China
- Zhiwei Ji
- College of Artificial Intelligence, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhigang Zhang
- School of Information Management and Statistics, Hubei University of Economics, Wuhan 430205, Hubei, China
- Xinxue Zhang
- Department of Hepatobiliary Surgery, Beijing Chaoyang Hospital affiliated to Capital Medical University, Beijing 100020, China
- Zhe Liu
- Department of Hepatobiliary Surgery, Beijing Chaoyang Hospital affiliated to Capital Medical University, Beijing 100020, China
- Huaguang Wang
- Department of Pharmacology, Beijing Chaoyang Hospital affiliated to Capital Medical University, Beijing 100020, China
- Junming Xu
- Department of Hepatobiliary Surgery, Beijing Chaoyang Hospital affiliated to Capital Medical University, Beijing 100020, China
- Hua Fan
- Department of Hepatobiliary Surgery, Beijing Chaoyang Hospital affiliated to Capital Medical University, Beijing 100020, China
- Jiantao Kou
- Department of Hepatobiliary Surgery, Beijing Chaoyang Hospital affiliated to Capital Medical University, Beijing 100020, China
- Lixin Li
- Department of Hepatobiliary Surgery, Beijing Chaoyang Hospital affiliated to Capital Medical University, Beijing 100020, China
- Ren Lang
- Department of Hepatobiliary Surgery, Beijing Chaoyang Hospital affiliated to Capital Medical University, Beijing 100020, China
- Qiang He
- Department of Hepatobiliary Surgery, Beijing Chaoyang Hospital affiliated to Capital Medical University, Beijing 100020, China
59
Yang F, Debatosh D, Song T, Zhang JH. Light Harvesting-like Protein 3 Interacts with Phytoene Synthase and Is Necessary for Carotenoid and Chlorophyll Biosynthesis in Rice. RICE (NEW YORK, N.Y.) 2021; 14:32. [PMID: 33745012 PMCID: PMC7981378 DOI: 10.1186/s12284-021-00474-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2021] [Accepted: 03/10/2021] [Indexed: 06/12/2023]
Abstract
BACKGROUND: Carotenoid biosynthesis is essential for the generation of photosynthetic pigments, phytohormone production, and flower color development. The light-harvesting-like 3 (LIL3) protein, which belongs to the light-harvesting complex protein family in photosystems, interacts with geranylgeranyl reductase (GGR) and protochlorophyllide oxidoreductase (POR), which are known to regulate terpenoid and chlorophyll biosynthesis, respectively, in both rice and Arabidopsis. RESULTS: In our study, a CRISPR-Cas9-generated 4-bp deletion mutant, oslil3, showed aberrant chloroplast development, growth defects, low fertility rates and reduced pigment contents. A comparative transcriptomic analysis of oslil3 suggested that genes involved in photosynthesis, cell wall modification, and primary and secondary metabolism are differentially regulated in the mutant. Protein-protein interaction assays indicated that LIL3 interacts with phytoene synthase (PSY); in addition, expression of the PSY genes is regulated by LIL3. Subcellular localization of LIL3 and PSY suggested that both are thylakoid membrane-anchored proteins in the chloroplast. We suggest that LIL3 directly interacts with PSY to regulate carotenoid biosynthesis. CONCLUSION: This study reveals a new role of LIL3 in regulating pigment biosynthesis in rice through its interaction with PSY, the rate-limiting enzyme in carotenoid biosynthesis, presenting LIL3 as a putative target for genetic manipulation of pigment biosynthesis pathways in crop plants.
Affiliation(s)
- Feng Yang
- Co-Innovation Center for Sustainable Forestry in Southern China, College of Biology and the Environment, Nanjing Forestry University, Nanjing, 210037, China
- Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, 518057, Guangdong, China
- Das Debatosh
- Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, 518057, Guangdong, China
- Tao Song
- Co-Innovation Center for Sustainable Forestry in Southern China, College of Biology and the Environment, Nanjing Forestry University, Nanjing, 210037, China
- Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, 518057, Guangdong, China
- Jian-Hua Zhang
- Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, 518057, Guangdong, China
- Department of Biology, Hong Kong Baptist University and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong, China
60
Zhang Y, Ma Y, Huang Y, Zhang Y, Jiang Q, Zhou M, Su J. Benchmarking algorithms for pathway activity transformation of single-cell RNA-seq data. Comput Struct Biotechnol J 2020; 18:2953-2961. [PMID: 33209207 PMCID: PMC7642725 DOI: 10.1016/j.csbj.2020.10.007] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 07/27/2020] [Revised: 09/29/2020] [Accepted: 10/02/2020] [Indexed: 12/16/2022]
Abstract
Biological pathway analysis provides new insights for cell clustering and functional annotation from single-cell RNA sequencing (scRNA-seq) data. Many pathway analysis algorithms have been developed to transform gene-level scRNA-seq data into functional gene sets representing pathways or biological processes. Here, we collected seven widely used pathway activity transformation algorithms and 32 available datasets based on 16 scRNA-seq techniques. We proposed a comprehensive framework to evaluate their accuracy, stability and scalability. The assessment of scRNA-seq preprocessing showed that cell filtering had less impact on scRNA-seq pathway analysis, while normalization with sctransform and scran had a consistently positive impact across all tools. We found that Pagoda2 yielded the best overall performance, with the highest accuracy, scalability, and stability. Meanwhile, PLAGE exhibited the highest stability, as well as moderate accuracy and scalability.
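PLAGE, one of the tools benchmarked here, is commonly described as scoring a pathway by standardizing the expression of its member genes and taking the first right singular vector of that submatrix as the per-cell activity. A minimal NumPy sketch under that description follows; preprocessing details of the implementation actually benchmarked may differ, and the function name and arguments are hypothetical.

```python
import numpy as np


def plage_scores(expr, genes, pathway):
    """PLAGE-style pathway activity per cell (sketch).

    expr:    (n_genes, n_cells) expression matrix
    genes:   row names of `expr`
    pathway: set of gene names belonging to the pathway
    """
    rows = [i for i, g in enumerate(genes) if g in pathway]
    if len(rows) < 2:
        raise ValueError("need at least two pathway genes in the matrix")
    sub = np.asarray(expr, dtype=float)[rows]
    sd = sub.std(axis=1, keepdims=True)
    # Standardize each gene across cells (constant genes are only centred).
    sub = (sub - sub.mean(axis=1, keepdims=True)) / np.where(sd > 0, sd, 1.0)
    _, _, vt = np.linalg.svd(sub, full_matrices=False)
    return vt[0]  # one activity value per cell; the sign of a singular vector is arbitrary
```

Because the sign of a singular vector is arbitrary, a practical wrapper would usually orient the score, for example so that it correlates positively with the mean expression of the pathway genes.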
Affiliation(s)
- Yaru Zhang
- Institute of Biomedical Big Data, School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Yunlong Ma
- Institute of Biomedical Big Data, School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Yukuan Huang
- Institute of Biomedical Big Data, School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Yan Zhang
- Institute of Biomedical Big Data, School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Qi Jiang
- Institute of Biomedical Big Data, School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Meng Zhou
- Institute of Biomedical Big Data, School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Jianzhong Su
- Institute of Biomedical Big Data, School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325011, China
61
Germain PL, Sonrel A, Robinson MD. pipeComp, a general framework for the evaluation of computational pipelines, reveals performant single cell RNA-seq preprocessing tools. Genome Biol 2020; 21:227. [PMID: 32873325 PMCID: PMC7465801 DOI: 10.1186/s13059-020-02136-7] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 03/02/2020] [Accepted: 08/06/2020] [Indexed: 11/13/2022]
Abstract
We present pipeComp (https://github.com/plger/pipeComp), a flexible R framework for pipeline comparison that handles interactions between analysis steps and relies on multi-level evaluation metrics. We apply it to benchmark single-cell RNA-sequencing analysis pipelines using simulated and real datasets with known cell identities, covering common methods of filtering, doublet detection, normalization, feature selection, denoising, dimensionality reduction, and clustering. pipeComp can easily integrate any other step, tool, or evaluation metric, allowing extensible benchmarks and easy application to other fields, as we demonstrate through a study of the impact of removing unwanted variation on differential expression analysis.
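A framework that handles interactions between analysis steps essentially has to run every combination of alternative methods and evaluate intermediate outputs, not only the final one. The Python sketch below illustrates that idea with a naive grid over steps; it is not pipeComp's R API, and `run_grid`, the step-dictionary layout, and the `evaluate` callback are hypothetical.

```python
from itertools import product


def run_grid(steps, data, evaluate):
    """Run every combination of alternative methods (one per step) on `data`.

    steps:    {step_name: {method_name: function}}, in execution order
    evaluate: function(step_name, output) -> float, applied after every step
    Returns a list of (chosen methods, per-step scores) pairs.
    """
    names = list(steps)
    results = []
    for combo in product(*(list(steps[s].items()) for s in names)):
        x = data
        scores = {}
        for step, (_, fn) in zip(names, combo):
            x = fn(x)
            scores[step] = evaluate(step, x)  # multi-level: score intermediate outputs too
        results.append(({s: m for s, (m, _) in zip(names, combo)}, scores))
    return results
```

The number of runs is the product of the number of alternatives per step, so this naive version grows quickly and recomputes shared upstream steps; caching intermediate outputs by method prefix would avoid that recomputation.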
Affiliation(s)
- Pierre-Luc Germain
- Department of Molecular Life Sciences, University of Zürich, Winterthurerstrasse 190, Zürich, 8057 Switzerland
- SIB Swiss Institute of Bioinformatics, Zürich, Switzerland
- D-HEST Institute for Neurosciences, ETH Zürich, Winterthurerstrasse 190, Zürich, 8057 Switzerland
- Anthony Sonrel
- Department of Molecular Life Sciences, University of Zürich, Winterthurerstrasse 190, Zürich, 8057 Switzerland
- SIB Swiss Institute of Bioinformatics, Zürich, Switzerland
- Mark D. Robinson
- Department of Molecular Life Sciences, University of Zürich, Winterthurerstrasse 190, Zürich, 8057 Switzerland
- SIB Swiss Institute of Bioinformatics, Zürich, Switzerland
62
Hsu LL, Culhane AC. Impact of Data Preprocessing on Integrative Matrix Factorization of Single Cell Data. Front Oncol 2020; 10:973. [PMID: 32656082 PMCID: PMC7324639 DOI: 10.3389/fonc.2020.00973] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/20/2020] [Accepted: 05/18/2020] [Indexed: 01/04/2023]
Abstract
Integrative single-cell analyses may provide unprecedented insights into the cellular and spatial diversity of the tumor microenvironment. The sparsity, noise, and high dimensionality of these data present unique challenges. Whilst approaches for integrating single-cell data are emerging and are far from being standardized, most data integration, cell clustering, cell trajectory, and analysis pipelines employ a dimension reduction step, frequently principal component analysis (PCA), a matrix factorization method that is relatively fast and can easily scale to large datasets when used with sparse-matrix representations. In this review, we provide a guide to PCA and related methods. We describe the relationship between PCA and singular value decomposition, the difference between PCA of a correlation matrix and PCA of a covariance matrix, the impact of scaling, log-transforming, and standardization, and how to recognize a horseshoe or arch effect in a PCA. We describe canonical correlation analysis (CCA), a popular matrix factorization approach for the integration of single-cell data from different platforms or studies. We discuss alternatives to CCA and why additional preprocessing or weighting of datasets within the joint decomposition should be considered.
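The relationships the review describes can be checked numerically: PCA of a column-centred cells x genes matrix is its singular value decomposition, the component variances are the squared singular values divided by n - 1 (the eigenvalues of the covariance matrix), and standardizing each gene first turns this into PCA of the correlation matrix. This is a minimal NumPy sketch; the function name is hypothetical, and any log-transformation would be applied to `X` beforehand.

```python
import numpy as np


def pca_svd(X, n_components=2, scale=False):
    """PCA of a cells x genes matrix via SVD of the column-centred data.

    scale=False -> PCA of the covariance matrix;
    scale=True  -> each gene standardized first, i.e. PCA of the correlation matrix.
    Returns scores (cells x k), loadings (genes x k) and the variance of each component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    if scale:
        Xc = Xc / Xc.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # equals Xc @ loadings
    loadings = Vt[:n_components].T
    variances = s[:n_components] ** 2 / (X.shape[0] - 1)  # covariance/correlation eigenvalues
    return scores, loadings, variances
```

Because the correlation-matrix variances sum to the number of genes, `scale=True` gives every gene equal weight, whereas the covariance version lets high-variance genes dominate the leading components.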
Affiliation(s)
- Lauren L Hsu
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States; Division of Biostatistics and Computational Biology, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, United States
- Aedin C Culhane
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States; Division of Biostatistics and Computational Biology, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, United States
63
Heiser CN, Lau KS. A Quantitative Framework for Evaluating Single-Cell Data Structure Preservation by Dimensionality Reduction Techniques. Cell Rep 2020; 31:107576. [PMID: 32375029 PMCID: PMC7305633 DOI: 10.1016/j.celrep.2020.107576] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 06/22/2019] [Revised: 12/12/2019] [Accepted: 04/06/2020] [Indexed: 01/07/2023]
Abstract
High-dimensional data, such as those generated by single-cell RNA sequencing (scRNA-seq), present challenges in interpretation and visualization. Numerical and computational methods for dimensionality reduction allow for low-dimensional representation of genome-scale expression data for downstream clustering, trajectory reconstruction, and biological interpretation. However, a comprehensive and quantitative evaluation of the performance of these techniques has not been established. We present an unbiased framework that defines metrics of global and local structure preservation in dimensionality reduction transformations. Using discrete and continuous real-world and synthetic scRNA-seq datasets, we show that input cell distribution and method parameters largely determine the global, local, and organizational data structure preservation achieved by 11 common dimensionality reduction methods.
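Local and global structure preservation can be quantified in several ways. As an illustrative assumption (the paper's exact metric definitions may differ), the NumPy sketch below uses the fraction of each cell's k nearest neighbours retained after reduction for local structure, and the Pearson correlation of all unique cell-cell distances for global structure; the function names are hypothetical.

```python
import numpy as np


def _pairwise(X):
    """Dense Euclidean distance matrix between the rows of X."""
    X = np.asarray(X, dtype=float)
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))


def knn_preservation(X_high, X_low, k=10):
    """Local structure: mean fraction of each cell's k nearest neighbours kept after reduction."""
    Dh, Dl = _pairwise(X_high), _pairwise(X_low)
    np.fill_diagonal(Dh, np.inf)  # a cell is not its own neighbour
    np.fill_diagonal(Dl, np.inf)
    nh = np.argsort(Dh, axis=1)[:, :k]
    nl = np.argsort(Dl, axis=1)[:, :k]
    return float(np.mean([len(set(a) & set(b)) / k for a, b in zip(nh, nl)]))


def distance_correlation(X_high, X_low):
    """Global structure: Pearson correlation of all unique cell-cell distances."""
    Dh, Dl = _pairwise(X_high), _pairwise(X_low)
    iu = np.triu_indices_from(Dh, k=1)
    return float(np.corrcoef(Dh[iu], Dl[iu])[0, 1])
```

For example, `knn_preservation(X, coords, k=10)` and `distance_correlation(X, coords)` on hypothetical arrays compare a reduction against its input; both metrics equal 1 for an embedding that is a uniform rescaling of the input.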
Affiliation(s)
- Cody N Heiser
- Epithelial Biology Center, Vanderbilt University Medical Center, 2213 Garland Avenue, 10475 MRB IV, Nashville, TN 37232, USA; Program in Chemical and Physical Biology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Ken S Lau
- Epithelial Biology Center, Vanderbilt University Medical Center, 2213 Garland Avenue, 10475 MRB IV, Nashville, TN 37232, USA; Program in Chemical and Physical Biology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA; Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA; Center for Quantitative Sciences, Vanderbilt University School of Medicine, Nashville, TN 37232, USA