1
García-Pérez R, Ramirez JM, Ripoll-Cladellas A, Chazarra-Gil R, Oliveros W, Soldatkina O, Bosio M, Rognon PJ, Capella-Gutierrez S, Calvo M, Reverter F, Guigó R, Aguet F, Ferreira PG, Ardlie KG, Melé M. The landscape of expression and alternative splicing variation across human traits. Cell Genom 2022; 3:100244. [PMID: 36777183 PMCID: PMC9903719 DOI: 10.1016/j.xgen.2022.100244] [Received: 07/11/2022] [Revised: 11/08/2022] [Accepted: 12/07/2022] [Indexed: 12/31/2022]
Abstract
Understanding the consequences of individual transcriptome variation is fundamental to deciphering human biology and disease. We implement a statistical framework to quantify the contributions of 21 individual traits as drivers of gene expression and alternative splicing variation across 46 human tissues and 781 individuals from the Genotype-Tissue Expression project. We demonstrate that ancestry, sex, age, and BMI make additive and tissue-specific contributions to expression variability, whereas interactions are rare. Variation in splicing is dominated by ancestry and is under genetic control in most tissues, with ribosomal proteins showing a strong enrichment of tissue-shared splicing events. Our analyses reveal a systemic contribution of types 1 and 2 diabetes to tissue transcriptome variation with the strongest signal in the nerve, where histopathology image analysis identifies novel genes related to diabetic neuropathy. Our multi-tissue and multi-trait approach provides an extensive characterization of the main drivers of human transcriptome variation in health and disease.
Affiliation(s)
- Raquel García-Pérez
  - Department of Life Sciences, Barcelona Supercomputing Center (BSC-CNS), Barcelona, Catalonia 08034, Spain
- Jose Miguel Ramirez
  - Department of Life Sciences, Barcelona Supercomputing Center (BSC-CNS), Barcelona, Catalonia 08034, Spain
- Aida Ripoll-Cladellas
  - Department of Life Sciences, Barcelona Supercomputing Center (BSC-CNS), Barcelona, Catalonia 08034, Spain
- Ruben Chazarra-Gil
  - Department of Life Sciences, Barcelona Supercomputing Center (BSC-CNS), Barcelona, Catalonia 08034, Spain
- Winona Oliveros
  - Department of Life Sciences, Barcelona Supercomputing Center (BSC-CNS), Barcelona, Catalonia 08034, Spain
- Oleksandra Soldatkina
  - Department of Life Sciences, Barcelona Supercomputing Center (BSC-CNS), Barcelona, Catalonia 08034, Spain
- Mattia Bosio
  - Department of Life Sciences, Barcelona Supercomputing Center (BSC-CNS), Barcelona, Catalonia 08034, Spain
- Paul Joris Rognon
  - Department of Life Sciences, Barcelona Supercomputing Center (BSC-CNS), Barcelona, Catalonia 08034, Spain
  - Department of Economics and Business, Universitat Pompeu Fabra, Barcelona, Catalonia 08005, Spain
  - Department of Statistics and Operations Research, Universitat Politècnica de Catalunya, Barcelona, Catalonia 08034, Spain
- Salvador Capella-Gutierrez
  - Department of Life Sciences, Barcelona Supercomputing Center (BSC-CNS), Barcelona, Catalonia 08034, Spain
- Miquel Calvo
  - Statistics Section, Faculty of Biology, Universitat de Barcelona (UB), Barcelona, Catalonia 08028, Spain
- Ferran Reverter
  - Statistics Section, Faculty of Biology, Universitat de Barcelona (UB), Barcelona, Catalonia 08028, Spain
- Roderic Guigó
  - Bioinformatics and Genomics, Center for Genomic Regulation, Barcelona, Catalonia 08003, Spain
- Pedro G. Ferreira
  - Department of Computer Science, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
  - Laboratory of Artificial Intelligence and Decision Support, INESC TEC, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal
  - Institute of Molecular Pathology and Immunology of the University of Porto, Institute for Research and Innovation in Health (i3s), R. Alfredo Allen 208, 4200-135 Porto, Portugal
- Marta Melé
  - Department of Life Sciences, Barcelona Supercomputing Center (BSC-CNS), Barcelona, Catalonia 08034, Spain
  - Corresponding author
2
Williams EC, Chazarra-Gil R, Shahsavari A, Mohorianu I. The Sum of Two Halves May Be Different from the Whole-Effects of Splitting Sequencing Samples Across Lanes. Genes (Basel) 2022; 13:genes13122265. [PMID: 36553532 PMCID: PMC9777937 DOI: 10.3390/genes13122265] [Received: 11/07/2022] [Revised: 11/23/2022] [Accepted: 11/25/2022] [Indexed: 12/03/2022]
Abstract
The advances in high-throughput sequencing (HTS) have enabled the characterisation of biological processes at an unprecedented level of detail; most hypotheses in molecular biology rely on analyses of HTS data. However, achieving increased robustness and reproducibility of results remains a main challenge. Although variability in results may be introduced at various stages, e.g., alignment, summarisation or detection of differential expression, one source of variability was systematically omitted: the sequencing design, which propagates through analyses and may introduce an additional layer of technical variation. We illustrate qualitative and quantitative differences arising from splitting samples across lanes on bulk and single-cell sequencing. For bulk mRNAseq data, we focus on differential expression and enrichment analyses; for bulk ChIPseq data, we investigate the effect on peak calling and the peaks' properties. At the single-cell level, we concentrate on identifying cell subpopulations. We rely on markers used for assigning cell identities; both smartSeq and 10× data are presented. The observed reduction in the number of unique sequenced fragments limits the level of detail on which the different prediction approaches depend. Furthermore, the sequencing stochasticity adds in a weighting bias corroborated with variable sequencing depths and (yet unexplained) sequencing bias. Subsequently, we observe an overall reduction in sequencing complexity and a distortion in the biological signal across technologies, experimental contexts, organisms and tissues.
Affiliation(s)
- Eleanor C. Williams
  - Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 0AW, UK
- Ruben Chazarra-Gil
  - Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 0AW, UK
  - Life Sciences-Transcriptomics and Functional Genomics Lab, Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain
- Arash Shahsavari
  - Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 0AW, UK
- Irina Mohorianu
  - Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 0AW, UK
  - Correspondence:
3
Chazarra-Gil R, van Dongen S, Kiselev VY, Hemberg M. Flexible comparison of batch correction methods for single-cell RNA-seq using BatchBench. Nucleic Acids Res 2021; 49:e42. [PMID: 33524142 PMCID: PMC8053088 DOI: 10.1093/nar/gkab004] [Received: 05/28/2020] [Revised: 12/11/2020] [Accepted: 01/29/2021] [Indexed: 01/02/2023]
Abstract
As the cost of single-cell RNA-seq experiments has decreased, an increasing number of datasets are now available. Combining newly generated and publicly accessible datasets is challenging due to non-biological signals, commonly known as batch effects. Although there are several computational methods available that can remove batch effects, evaluating which method performs best is not straightforward. Here, we present BatchBench (https://github.com/cellgeni/batchbench), a modular and flexible pipeline for comparing batch correction methods for single-cell RNA-seq data. We apply BatchBench to eight methods, highlighting their methodological differences and assess their performance and computational requirements through a compendium of well-studied datasets. This systematic comparison guides users in the choice of batch correction tool, and the pipeline makes it easy to evaluate other datasets.
Affiliation(s)
- Stijn van Dongen
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Martin Hemberg
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK