1
Razi A, Lo CC, Wang S, Leek JT, Hansen KD. Genotype prediction of 336,463 samples from public expression data. bioRxiv 2024:2023.10.21.562237. [PMID: 38559266 PMCID: PMC10979922 DOI: 10.1101/2023.10.21.562237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Tens of thousands of RNA-sequencing experiments comprising hundreds of thousands of individual samples have now been performed. These data represent a broad range of experimental conditions, sequencing technologies, and hypotheses under study. The Recount project has aggregated and uniformly processed hundreds of thousands of publicly available RNA-seq samples. Most of these samples only include RNA expression measurements; genotype data for these same samples would enable a wide range of analyses including variant prioritization, eQTL analysis, and studies of allele-specific expression. Here, we developed a statistical model based on the existing reference and alternative read counts from the RNA-seq experiments available through Recount3 to predict genotypes at autosomal biallelic loci in coding regions. We demonstrate the accuracy of our model using large-scale studies that measured both gene expression and genotype genome-wide. We show that our predictive model is highly accurate with 99.5% overall accuracy, 99.6% major allele accuracy, and 90.4% minor allele accuracy. Our model is robust to tissue and study effects, provided the coverage is high enough. We applied this model to genotype all the samples in Recount3 and provide the largest ready-to-use expression repository containing genotype information. We illustrate that the predicted genotype from RNA-seq data is sufficient to unravel the underlying population structure of samples in Recount3 using Principal Component Analysis.
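As a rough illustration of how reference and alternative read counts can inform a genotype call at a biallelic locus, the sketch below scores the three genotypes with a plain binomial likelihood. This is not the model from the paper, whose details the abstract does not give. The function names, the `error_rate` and `min_coverage` parameters, and the assumed alternative-read fractions are all hypothetical, and real RNA-seq data can deviate from the 0.5 heterozygous fraction because of allele-specific expression.

```python
import math

def genotype_log_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Binomial log-likelihood of the observed counts under each genotype.

    Assumption (illustrative only): the expected fraction of alternative
    reads is error_rate for 0/0, 0.5 for 0/1, and 1 - error_rate for 1/1.
    """
    alt_fraction = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    n = ref_count + alt_count
    # log of the binomial coefficient C(n, alt_count)
    log_choose = (math.lgamma(n + 1)
                  - math.lgamma(alt_count + 1)
                  - math.lgamma(ref_count + 1))
    return {
        gt: log_choose + alt_count * math.log(p) + ref_count * math.log(1.0 - p)
        for gt, p in alt_fraction.items()
    }

def call_genotype(ref_count, alt_count, min_coverage=8, error_rate=0.01):
    """Return the maximum-likelihood genotype, or None if coverage is too low."""
    if ref_count + alt_count < min_coverage:
        return None  # hypothetical coverage cutoff; no call is made
    scores = genotype_log_likelihoods(ref_count, alt_count, error_rate)
    return max(scores, key=scores.get)

if __name__ == "__main__":
    for ref, alt in [(20, 0), (10, 10), (0, 20), (2, 1)]:
        print(ref, alt, call_genotype(ref, alt))
```

The coverage cutoff reflects the abstract's remark that accuracy depends on coverage being high enough; the value 8 is arbitrary.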
Affiliation(s)
- Afrooz Razi
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine
- Christopher C. Lo
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
- Siruo Wang
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
- Jeffrey T. Leek
  - Biostatistics Program, Division of Public Health Sciences, Fred Hutchinson Cancer Center
- Kasper D. Hansen
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
  - Department of Biomedical Engineering, Johns Hopkins University School of Medicine
2
Iqbal MA, Hadlich F, Reyer H, Oster M, Trakooljul N, Murani E, Perdomo-Sabogal A, Wimmers K, Ponsuksili S. RNA-Seq-based discovery of genetic variants and allele-specific expression of two layer lines and broiler chicken. Evol Appl 2023; 16:1135-1153. [PMID: 37360029 PMCID: PMC10286233 DOI: 10.1111/eva.13557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 04/21/2023] [Accepted: 04/22/2023] [Indexed: 06/28/2023]
Abstract
Recent advances in the selective breeding of broilers and layers have made poultry production one of the fastest-growing industries. In this study, a transcriptome variant calling approach from RNA-seq data was used to determine population diversity between broilers and layers. In total, 200 individuals were analyzed from three different chicken populations: Lohmann Brown (LB, n = 90), Lohmann Selected Leghorn (LSL, n = 89), and Broiler (BR, n = 21). The raw RNA-sequencing reads were pre-processed, quality control checked, mapped to the reference genome, and made compatible with Genome Analysis ToolKit for variant detection. Subsequently, pairwise fixation index (FST) analysis was performed between broilers and layers. Numerous candidate genes were identified that were associated with growth, development, metabolism, immunity, and other economically significant traits. Finally, allele-specific expression (ASE) analysis was performed in the gut mucosa of LB and LSL strains at 10, 16, 24, 30, and 60 weeks of age. At different ages, the two layer strains showed significantly different allele-specific expression in the gut mucosa, and changes in allelic imbalance were observed across the entire lifespan. Most ASE genes are involved in energy metabolism, including sirtuin signaling pathways, oxidative phosphorylation, and mitochondrial dysfunction. A high number of ASE genes were found during the peak of laying, which were particularly enriched in cholesterol biosynthesis. These findings indicate that genetic architecture, as well as biological processes driving particular demands related to metabolic and nutritional requirements during the laying period, shape allelic heterogeneity. These processes are considerably affected by breeding and management, whereby elucidating allele-specific gene regulation is an essential step towards deciphering the genotype to phenotype map or functional diversity between the chicken populations. Additionally, we observed that several genes showing significant allelic imbalance also colocalized with the top 1% of genes identified by the FST approach, suggesting a fixation of genes in cis-regulatory elements.
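For readers unfamiliar with the fixation index, the following sketch computes a per-locus FST between two populations from alternative-allele frequencies using the classical heterozygosity form, FST = (H_T - H_S) / H_T. The abstract does not say which estimator the authors used, so this is an assumed textbook definition with equal population weights, and the function name `fst_two_populations` is hypothetical.

```python
def fst_two_populations(p1, p2):
    """Wright-style F_ST at one biallelic locus for two populations.

    p1, p2: alternative-allele frequencies in each population (0..1).
    Assumption: both populations are weighted equally, and no
    sample-size correction is applied.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    # Mean expected heterozygosity within populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # locus is monomorphic overall; no differentiation to measure
    return (h_t - h_s) / h_t

if __name__ == "__main__":
    print(fst_two_populations(0.2, 0.8))  # moderately differentiated locus
    print(fst_two_populations(0.0, 1.0))  # alleles fixed differently in each population
```

Ranking genes by a statistic of this kind is one way a "top 1%" set, as mentioned in the abstract, could be defined; sample-size-corrected estimators are usually preferred in practice.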
Affiliation(s)
- Frieder Hadlich
  - Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
- Henry Reyer
  - Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
- Michael Oster
  - Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
- Nares Trakooljul
  - Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
- Eduard Murani
  - Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
- Klaus Wimmers
  - Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
  - Faculty of Agricultural and Environmental Sciences, University Rostock, Rostock, Germany
- Siriluck Ponsuksili
  - Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
3
Jehl F, Degalez F, Bernard M, Lecerf F, Lagoutte L, Désert C, Coulée M, Bouchez O, Leroux S, Abasht B, Tixier-Boichard M, Bed'hom B, Burlot T, Gourichon D, Bardou P, Acloque H, Foissac S, Djebali S, Giuffra E, Zerjal T, Pitel F, Klopp C, Lagarrigue S. RNA-Seq Data for Reliable SNP Detection and Genotype Calling: Interest for Coding Variant Characterization and Cis-Regulation Analysis by Allele-Specific Expression in Livestock Species. Front Genet 2021; 12:655707. [PMID: 34262593 PMCID: PMC8273700 DOI: 10.3389/fgene.2021.655707] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/19/2021] [Accepted: 06/01/2021] [Indexed: 12/19/2022]
Abstract
In addition to their common use for studying gene expression, RNA-seq data accumulated over the last 10 years are a yet-unexploited resource of SNPs in numerous individuals from different populations. SNP detection by RNA-seq is particularly interesting for livestock species since whole genome sequencing is expensive and exome sequencing tools are unavailable. These SNPs detected in expressed regions can be used to characterize variants affecting protein functions, and to study cis-regulated genes by analyzing allele-specific expression (ASE) in the tissue of interest. However, gene expression can be highly variable, and filters for SNP detection using the popular GATK toolkit are not yet standardized, making SNP detection and genotype calling by RNA-seq a challenging endeavor. We compared SNP calling results using GATK suggested filters, on two chicken populations for which both RNA-seq and DNA-seq data were available for the same samples of the same tissue. We showed, in expressed regions, an RNA-seq precision of 91% (SNPs detected by RNA-seq and shared by DNA-seq) and we characterized the remaining 9% of SNPs. We then studied the genotype (GT) obtained by RNA-seq and the impact of two factors (GT call-rate and read number per GT) on the concordance of GT with DNA-seq; we proposed thresholds for them leading to a 95% concordance. Applying these thresholds to 767 multi-tissue RNA-seq samples from 382 birds of 11 chicken populations, we found 9.5 M SNPs in total, of which ∼550,000 SNPs per tissue and population had a reliable GT (call rate ≥ 50%) and, among them, ∼340,000 had a MAF ≥ 10%. We showed that such RNA-seq data from one tissue can be used to (i) detect SNPs with a strong predicted impact on proteins, despite their scarcity in each population (16,307 SIFT deleterious missenses and 590 stop-gained), (ii) study, on a large scale, cis-regulation of gene expression, with ∼81% of protein-coding and 68% of long non-coding genes (TPM ≥ 1) that can be analyzed for ASE, and with ∼29% of them that were cis-regulated, and (iii) analyze population genetics using such SNPs located in expressed regions. This work shows that RNA-seq data can be used with good confidence to detect SNPs and associated GT within various populations and to use them for different analyses, as in GTEx studies.
Affiliation(s)
- Frédéric Jehl
  - INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
- Fabien Degalez
  - INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
- Maria Bernard
  - INRAE, SIGENAE, Genotoul Bioinfo MIAT, Castanet-Tolosan, France
  - INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
- Colette Désert
  - INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
- Manon Coulée
  - INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
- Olivier Bouchez
  - INRAE, US 1426, GeT-PlaGe, Genotoul, Castanet-Tolosan, France
- Sophie Leroux
  - INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France
- Behnam Abasht
  - Department of Animal and Food Sciences, University of Delaware, Newark, DE, United States
- Bertrand Bed'hom
  - INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
- Philippe Bardou
  - INRAE, SIGENAE, Genotoul Bioinfo MIAT, Castanet-Tolosan, France
- Hervé Acloque
  - INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
- Sylvain Foissac
  - INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France
- Sarah Djebali
  - INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France
- Elisabetta Giuffra
  - INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
- Tatiana Zerjal
  - INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
- Frédérique Pitel
  - INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France
4
Variant Calling in Next Generation Sequencing Data. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11285-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
5
Quaglieri A, Flensburg C, Speed TP, Majewski IJ. Finding a suitable library size to call variants in RNA-Seq. BMC Bioinformatics 2020; 21:553. [PMID: 33261552 PMCID: PMC7708150 DOI: 10.1186/s12859-020-03860-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/19/2020] [Accepted: 11/03/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND: RNA sequencing allows the study of both gene expression changes and transcribed mutations, providing a highly effective way to gain insight into cancer biology. When planning the sequencing of a large cohort of samples, library size is a fundamental factor affecting both the overall cost and the quality of the results. Here we specifically address how overall library size influences the detection of somatic mutations in RNA-seq data in two acute myeloid leukaemia datasets. RESULTS: We simulated shallower sequencing depths by downsampling 45 acute myeloid leukaemia samples (100 bp PE) that are part of the Leucegene project, which were originally sequenced at high depth. We compared the sensitivity of six methods of recovering validated mutations on the same samples. The methods compared are a combination of three popular callers (MuTect, VarScan, and VarDict) and two filtering strategies. We observed an incremental loss in sensitivity when simulating libraries of 80M, 50M, 40M, 30M and 20M fragments, with the largest loss detected with fewer than 30M fragments (below 90%, average loss of 7%). The sensitivity in recovering insertions and deletions varied markedly between callers, with VarDict showing the highest sensitivity (60%). Single nucleotide variant sensitivity is relatively consistent across methods, apart from MuTect, whose default filters need adjustment when using RNA-Seq. We also analysed 136 RNA-Seq samples from the TCGA-LAML cohort (50 bp PE) and assessed the change in sensitivity between the initial libraries (average 59M fragments) and after downsampling to 40M fragments. When considering single nucleotide variants in recurrently mutated myeloid genes we found a comparable performance, with a 6% average loss in sensitivity using 40M fragments. CONCLUSIONS: Between 30M and 40M 100 bp PE reads are needed to recover 90-95% of the initial variants on recurrently mutated myeloid genes. To extend this result to another cancer type, an exploration of the characteristics of its mutations and gene expression patterns is suggested.
Affiliation(s)
- Anna Quaglieri
  - Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, 3052, Australia
  - Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Grattan St, Melbourne, 3010, Australia
- Christoffer Flensburg
  - Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, 3052, Australia
- Terence P Speed
  - Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, 3052, Australia
  - Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Grattan St, Melbourne, 3010, Australia
  - Department of Mathematics and Statistics, The University of Melbourne, 813 Swanston Street, Melbourne, 3010, Australia
- Ian J Majewski
  - Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, 3052, Australia
  - Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Grattan St, Melbourne, 3010, Australia
6
Hagiwara K, Ding L, Edmonson MN, Rice SV, Newman S, Easton J, Dai J, Meshinchi S, Ries RE, Rusch M, Zhang J. RNAIndel: discovering somatic coding indels from tumor RNA-Seq data. Bioinformatics 2020; 36:1382-1390. [PMID: 31593214 DOI: 10.1093/bioinformatics/btz753] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/17/2019] [Revised: 08/29/2019] [Accepted: 10/01/2019] [Indexed: 12/23/2022]
Abstract
MOTIVATION: Reliable identification of expressed somatic insertions/deletions (indels) is an unmet need due to artifacts generated in PCR-based RNA-Seq library preparation and the lack of normal RNA-Seq data, presenting analytical challenges for discovery of somatic indels in tumor transcriptome. RESULTS: We present RNAIndel, a tool for predicting somatic, germline and artifact indels from tumor RNA-Seq data. RNAIndel leverages features derived from indel sequence context and biological effect in a machine-learning framework. Except for tumor samples with microsatellite instability, RNAIndel robustly predicts 88-100% of somatic indels in five diverse test datasets of pediatric and adult cancers, even recovering subclonal (VAF range 0.01-0.15) driver indels missed by targeted deep-sequencing, outperforming the current best-practice for RNA-Seq variant calling which had 57% sensitivity but with 14 times more false positives. AVAILABILITY AND IMPLEMENTATION: RNAIndel is freely available at https://github.com/stjude/RNAIndel. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kohei Hagiwara
  - Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA
- Liang Ding
  - Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA
- Michael N Edmonson
  - Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA
- Stephen V Rice
  - Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA
- Scott Newman
  - Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA
- John Easton
  - Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA
- Juncheng Dai
  - Department of Epidemiology, Nanjing Medical University School of Public Health, Jiangning District, Nanjing, 211166, People's Republic of China
- Soheil Meshinchi
  - Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Rhonda E Ries
  - Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Michael Rusch
  - Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA
- Jinghui Zhang
  - Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA
7
Vu TN, Nguyen HN, Calza S, Kalari KR, Wang L, Pawitan Y. Cell-level somatic mutation detection from single-cell RNA sequencing. Bioinformatics 2020; 35:4679-4687. [PMID: 31028395 PMCID: PMC6853710 DOI: 10.1093/bioinformatics/btz288] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 07/03/2018] [Revised: 03/19/2019] [Accepted: 04/17/2019] [Indexed: 01/07/2023]
Abstract
MOTIVATION: Both single-cell RNA sequencing (scRNA-seq) and DNA sequencing (scDNA-seq) have been applied for cell-level genomic profiling. For mutation profiling, the latter seems more natural. However, the task is highly challenging due to the limited input materials from only two copies of DNA molecules, while whole-genome amplification generates biases and other technical noise. ScRNA-seq starts with a higher input amount, so generally has better data quality. Various methods exist for mutation detection from DNA sequencing, but it is not clear whether these methods work for scRNA-seq data. RESULTS: Mutation detection methods developed for either bulk-cell sequencing data or scDNA-seq data do not work well for the scRNA-seq data, as they produce substantial numbers of false positives. We develop a novel and robust statistical method, called SCmut, to identify specific cells that harbor mutations discovered in bulk-cell data. Statistically, SCmut controls the false positives using the 2D local false discovery rate method. We apply SCmut to several scRNA-seq datasets. In scRNA-seq breast cancer datasets, SCmut identifies a number of highly confident cell-level mutations that are recurrent in many cells and consistent in different samples. In a scRNA-seq glioblastoma dataset, we discover a recurrent cell-level mutation in the PDGFRA gene that is highly correlated with a well-known in-frame deletion in the gene. To conclude, this study contributes a novel method to discover cell-level mutation information from scRNA-seq that can facilitate investigation of cell-to-cell heterogeneity. AVAILABILITY AND IMPLEMENTATION: The source codes and bioinformatics pipeline of SCmut are available at https://github.com/nghiavtr/SCmut. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Trung Nghia Vu
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm 17177, Sweden
- Ha-Nam Nguyen
  - Information Technology Institute, Vietnam National University in Hanoi, Hanoi 84024, Vietnam
- Stefano Calza
  - Department of Molecular and Translational Medicine, University of Brescia, Brescia 25125, Italy
- Krishna R Kalari
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
- Liewei Wang
  - Department of Molecular Pharmacology & Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA
- Yudi Pawitan
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm 17177, Sweden
8
Cheung PPH, Jiang B, Booth GT, Chong TH, Unarta IC, Wang Y, Suarez GD, Wang J, Lis JT, Huang X. Identifying Transcription Error-Enriched Genomic Loci Using Nuclear Run-on Circular-Sequencing Coupled with Background Error Modeling. J Mol Biol 2020; 432:3933-3949. [PMID: 32325070 DOI: 10.1016/j.jmb.2020.04.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/07/2019] [Revised: 04/08/2020] [Accepted: 04/08/2020] [Indexed: 01/30/2023]
Abstract
RNA polymerase transcribes certain genomic loci with higher error rates. These transcription error-enriched genomic loci (TEELs) have implications in disease. Current deep-sequencing methods cannot distinguish TEELs from post-transcriptional modifications, stochastic transcription errors, and technical noise, impeding efforts to elucidate the mechanisms linking TEELs to disease. Here, we describe background error model-coupled precision nuclear run-on circular-sequencing (EmPC-seq) to discern genomic regions enriched for transcription misincorporations. EmPC-seq innovatively combines a nuclear run-on assay for capturing nascent RNA before post-transcriptional modifications, a circular-sequencing step that sequences the same nascent RNA molecules multiple times to improve accuracy, and a statistical model for distinguishing error-enriched regions among stochastic polymerase errors. Applying EmPC-seq to the ribosomal RNA transcriptome, we show that TEELs of RNA polymerase I are not randomly distributed but clustered together, with higher error frequencies at nascent transcript 3' ends. Our study establishes a reliable method of identifying TEELs with nucleotide precision, which can help elucidate their molecular origins.
Affiliation(s)
- Peter Pak-Hang Cheung
  - The Hong Kong University of Science and Technology-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China
  - Department of Chemistry, Centre of Systems Biology and Human Health, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Biaobin Jiang
  - Division of Life Science, Department of Chemical and Biological Engineering, Centre of Systems Biology and Human Health, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
  - The HKUST Jockey Club Institute for Advanced Study (IAS), The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Gregory T Booth
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Tin Hang Chong
  - The Hong Kong University of Science and Technology-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China
- Ilona Christy Unarta
  - Bioengineering Graduate Program, Department of Biological and Chemical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Yuqing Wang
  - Bioengineering Graduate Program, Department of Biological and Chemical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Gianmarco D Suarez
  - Bioengineering Graduate Program, Department of Biological and Chemical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Jiguang Wang
  - Division of Life Science, Department of Chemical and Biological Engineering, Centre of Systems Biology and Human Health, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- John T Lis
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
  - The HKUST Jockey Club Institute for Advanced Study (IAS), The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Xuhui Huang
  - The Hong Kong University of Science and Technology-Shenzhen Research Institute, Hi-Tech Park, Nanshan, Shenzhen 518057, China
  - Department of Chemistry, Centre of Systems Biology and Human Health, State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
  - Bioengineering Graduate Program, Department of Biological and Chemical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
9
Yizhak K, Aguet F, Kim J, Hess JM, Kübler K, Grimsby J, Frazer R, Zhang H, Haradhvala NJ, Rosebrock D, Livitz D, Li X, Arich-Landkof E, Shoresh N, Stewart C, Segrè AV, Branton PA, Polak P, Ardlie KG, Getz G. RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science 2019; 364:eaaw0726. [PMID: 31171663 DOI: 10.1126/science.aaw0726] [Citation(s) in RCA: 312] [Impact Index Per Article: 62.4] [Received: 11/15/2018] [Accepted: 05/02/2019] [Indexed: 02/06/2023]
Abstract
How somatic mutations accumulate in normal cells is poorly understood. A comprehensive analysis of RNA sequencing data from ~6700 samples across 29 normal tissues revealed multiple somatic variants, demonstrating that macroscopic clones can be found in many normal tissues. We found that sun-exposed skin, esophagus, and lung have a higher mutation burden than other tested tissues, which suggests that environmental factors can promote somatic mosaicism. Mutation burden was associated with both age and tissue-specific cell proliferation rate, highlighting that mutations accumulate over both time and number of cell divisions. Finally, normal tissues were found to harbor mutations in known cancer genes and hotspots. This study provides a broad view of macroscopic clonal expansion in human tissues, thus serving as a foundation for associating clonal expansion with environmental factors, aging, and risk of disease.
Affiliation(s)
- Keren Yizhak
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jaegil Kim
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Julian M Hess
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kirsten Kübler
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Jonna Grimsby
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Hailei Zhang
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nicholas J Haradhvala
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Xiao Li
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Eila Arich-Landkof
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Noam Shoresh
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Chip Stewart
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ayellet V Segrè
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Boston, MA, USA
- Philip A Branton
  - Biorepositories and Biospecimen Research Branch, Cancer Diagnosis Program, National Cancer Institute, Bethesda, MD, USA
- Paz Polak
  - Oncological Sciences, Icahn School of Medicine at Mount Sinai Hospital, New York, NY, USA
- Gad Getz
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
10
Adetunji MO, Lamont SJ, Abasht B, Schmidt CJ. Variant analysis pipeline for accurate detection of genomic variants from transcriptome sequencing data. PLoS One 2019; 14:e0216838. [PMID: 31545812 PMCID: PMC6756534 DOI: 10.1371/journal.pone.0216838] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 05/20/2019] [Accepted: 09/10/2019] [Indexed: 12/27/2022]
Abstract
The wealth of information deliverable from transcriptome sequencing (RNA-seq) is significant; however, variant detection with current applications remains a challenge due to the complexity of the transcriptome. Given the ability of RNA-seq to reveal active regions of the genome, detection of RNA-seq SNPs can prove valuable in understanding the phenotypic diversity between populations. Thus, we present a novel computational workflow named VAP (Variant Analysis Pipeline) that takes advantage of multiple RNA-seq splice-aware aligners to call SNPs in non-human models using RNA-seq data only. We applied VAP to RNA-seq from a highly inbred chicken line and achieved high accuracy when compared with the matching whole genome sequencing (WGS) data. Over 65% of WGS coding variants were identified from RNA-seq. Further, our results discovered SNPs resulting from post-transcriptional modifications, such as RNA editing, which may reveal potentially functional variation that would have otherwise been missed in genomic data. Even with the limitation in detecting variants in expressed regions only, our method proves to be a reliable alternative for SNP identification using RNA-seq data. The source code and user manuals are available at https://modupeore.github.io/VAP/.
Affiliation(s)
- Modupeore O. Adetunji
  - Department of Animal and Food Sciences, University of Delaware, Newark, Delaware, United States of America
- Susan J. Lamont
  - Department of Animal Science, Iowa State University, Ames, Iowa, United States of America
- Behnam Abasht
  - Department of Animal and Food Sciences, University of Delaware, Newark, Delaware, United States of America
- Carl J. Schmidt
  - Department of Animal and Food Sciences, University of Delaware, Newark, Delaware, United States of America
11
Shameer K, Naika MB, Shafi KM, Sowdhamini R. Decoding systems biology of plant stress for sustainable agriculture development and optimized food production. Progress in Biophysics and Molecular Biology 2019; 145:19-39. [DOI: 10.1016/j.pbiomolbio.2018.12.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/01/2017] [Revised: 10/23/2018] [Accepted: 12/06/2018] [Indexed: 12/13/2022]
12
|
Xiang Y, Ye Y, Zhang Z, Han L. Maximizing the Utility of Cancer Transcriptomic Data. Trends Cancer 2018; 4:823-837. [PMID: 30470304 DOI: 10.1016/j.trecan.2018.09.009] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 07/23/2018] [Revised: 09/23/2018] [Accepted: 09/24/2018] [Indexed: 12/13/2022]
Abstract
Transcriptomic profiling has been applied to large numbers of cancer samples, by large-scale consortia, including The Cancer Genome Atlas, International Cancer Genome Consortium, and Cancer Cell Line Encyclopedia. Advances in mining cancer transcriptomic data enable us to understand the endless complexity of the cancer transcriptome and thereby to discover new biomarkers and therapeutic targets. In this paper, we review computational resources for deep mining of transcriptomic data to identify, quantify, and determine the functional effects and clinical utility of transcriptomic events, including noncoding RNAs, post-transcriptional regulation, exogenous RNAs, and transcribed genetic variants. These approaches can be applied to other complex diseases, thereby greatly leveraging the impact of this work.
Affiliation(s)
- Yu Xiang
- Department of Biochemistry and Molecular Biology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; These authors contributed equally
- Youqiong Ye
- Department of Biochemistry and Molecular Biology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; These authors contributed equally
- Zhao Zhang
- Department of Biochemistry and Molecular Biology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Leng Han
- Department of Biochemistry and Molecular Biology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Center for Precision Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
|
13
|
Wolff A, Bayerlová M, Gaedcke J, Kube D, Beißbarth T. A comparative study of RNA-Seq and microarray data analysis on the two examples of rectal-cancer patients and Burkitt Lymphoma cells. PLoS One 2018; 13:e0197162. [PMID: 29768462 PMCID: PMC5955523 DOI: 10.1371/journal.pone.0197162] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 06/26/2017] [Accepted: 04/27/2018] [Indexed: 12/17/2022]
Abstract
Background: Pipeline comparisons for gene expression data are highly valuable for applied real data analyses, as they enable the selection of suitable analysis strategies for the dataset at hand. Such pipelines for RNA-Seq data should include mapping of reads, counting and differential gene expression analysis or preprocessing, normalization and differential gene expression in case of microarray analysis, in order to give a global insight into pipeline performances. Methods: Four commonly used RNA-Seq pipelines (STAR/HTSeq-Count/edgeR, STAR/RSEM/edgeR, Sailfish/edgeR, TopHat2/Cufflinks/CuffDiff) were investigated on multiple levels (alignment and counting) and cross-compared with the microarray counterpart on the level of gene expression and gene ontology enrichment. For these comparisons we generated two matched microarray and RNA-Seq datasets: Burkitt Lymphoma cell line data and rectal cancer patient data. Results: The overall mapping rate of STAR was 98.98% for the cell line dataset and 98.49% for the patient dataset. TopHat2's overall mapping rate was 97.02% and 96.73%, respectively, while Sailfish had an overall mapping rate of only 84.81% and 54.44%. The correlation of gene expression in microarray and RNA-Seq data was moderately worse for the patient dataset (ρ = 0.67–0.69) than for the cell line dataset (ρ = 0.87–0.88). The exception was Cufflinks, whose correlation results were substantially lower (ρ = 0.21–0.29 and 0.34–0.53). For both datasets we identified very low numbers of differentially expressed genes using the microarray platform. For RNA-Seq we checked the agreement of differentially expressed genes identified in the different pipelines and of GO-term enrichment results. Conclusion: The combination of the STAR aligner with HTSeq-Count, followed by the STAR aligner with RSEM and by Sailfish, generated differentially expressed genes best suited for the dataset at hand and in agreement with most of the other transcriptomics pipelines.
Affiliation(s)
- Alexander Wolff
- Dept. of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- Michaela Bayerlová
- Dept. of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- Jochen Gaedcke
- Dept. of General-, Visceral- and Pediatric Surgery, University Medical Center Göttingen, Göttingen, Germany
- Dieter Kube
- Dept. of Hematology and Oncology, University Medical Center Göttingen, Göttingen, Germany
- Tim Beißbarth
- Dept. of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
|
14
|
HLA class I loss in metachronous metastases prevents continuous T cell recognition of mutated neoantigens in a human melanoma model. Oncotarget 2018; 8:28312-28327. [PMID: 28423700 PMCID: PMC5438652 DOI: 10.18632/oncotarget.16048] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 07/05/2016] [Accepted: 02/27/2017] [Indexed: 12/19/2022]
Abstract
T lymphocytes against tumor-specific mutated neoantigens can induce tumor regression. Also, the size of the immunogenic cancer mutanome is supposed to correlate with the clinical efficacy of checkpoint inhibition. Herein, we studied the susceptibility of tumor cell lines from lymph node metastases occurring in a melanoma patient over several years towards blood-derived, neoantigen-specific CD8+ T cells. In contrast to a cell line established during early stage III disease, all cell lines generated at later time points from stage IV metastases exhibited partial or complete loss of HLA class I expression. Whole exome and transcriptome sequencing of the four tumor lines and a germline control were applied to identify expressed somatic single nucleotide substitutions (SNS), insertions and deletions (indels). Candidate peptides encoded by these variants and predicted to bind to the patient's HLA class I alleles were synthesized and tested for recognition by autologous mixed lymphocyte-tumor cell cultures (MLTCs). Peptides from four mutated proteins, HERPUD1(G161S), INSIG1(S238F), MMS22L(S437F) and PRDM10(S1050F), were recognized by MLTC responders and MLTC-derived T cell clones restricted by HLA-A*24:02 or HLA-B*15:01. Intracellular peptide processing was verified with transfectants. All four neoantigens could only be targeted on the cell line generated during early stage III disease. HLA loss variants of any kind were uniformly resistant. These findings corroborate that, although neoantigens represent attractive therapeutic targets, they also contribute to the process of cancer immunoediting as a serious limitation to specific T cell immunotherapy.
|
15
|
Xu C. A review of somatic single nucleotide variant calling algorithms for next-generation sequencing data. Comput Struct Biotechnol J 2018; 16:15-24. [PMID: 29552334 PMCID: PMC5852328 DOI: 10.1016/j.csbj.2018.01.003] [Citation(s) in RCA: 149] [Impact Index Per Article: 24.8] [Received: 09/08/2017] [Revised: 01/20/2018] [Accepted: 01/28/2018] [Indexed: 02/06/2023]
Abstract
Detection of somatic mutations holds great potential in cancer treatment and has been a very active research field in the past few years, especially since the breakthrough of the next-generation sequencing technology. A collection of variant calling pipelines have been developed with different underlying models, filters, input data requirements, and targeted applications. This review aims to enumerate these unique features of the state-of-the-art variant callers, in the hope to provide a practical guide for selecting the appropriate pipeline for specific applications. We will focus on the detection of somatic single nucleotide variants, ranging from traditional variant callers based on whole genome or exome sequencing of paired tumor-normal samples to recent low-frequency variant callers designed for targeted sequencing protocols with unique molecular identifiers. The variant callers have been extensively benchmarked with inconsistent performances across these studies. We will review the reference materials, datasets, and performance metrics that have been used in the benchmarking studies. In the end, we will discuss emerging trends and future directions of the variant calling algorithms.
Affiliation(s)
- Chang Xu
- Life Science Research and Foundation, Qiagen Sciences, Inc., 6951 Executive Way, Frederick, Maryland 21703, USA
|
16
|
Kotelnikova EA, Pyatnitskiy M, Paleeva A, Kremenetskaya O, Vinogradov D. Practical aspects of NGS-based pathways analysis for personalized cancer science and medicine. Oncotarget 2018; 7:52493-52516. [PMID: 27191992 PMCID: PMC5239569 DOI: 10.18632/oncotarget.9370] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 10/02/2015] [Accepted: 04/18/2016] [Indexed: 12/17/2022]
Abstract
Nowadays, the personalized approach to health care and cancer care in particular is becoming more and more popular and is taking an important place in the translational medicine paradigm. In some cases, detection of the patient-specific individual mutations that point to a targeted therapy has already become a routine practice for clinical oncologists. Wider panels of genetic markers are also on the market which cover a greater number of possible oncogenes including those with lower reliability of resulting medical conclusions. In light of the large availability of high-throughput technologies, it is very tempting to use complete patient-specific New Generation Sequencing (NGS) or other "omics" data for cancer treatment guidance. However, there are still no gold standard methods and protocols to evaluate them. Here we will discuss the clinical utility of each of the data types and describe a systems biology approach adapted for single patient measurements. We will try to summarize the current state of the field focusing on the clinically relevant case-studies and practical aspects of data processing.
Affiliation(s)
- Ekaterina A Kotelnikova
- Personal Biomedicine, Moscow, Russia; A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia; Institute Biomedical Research August Pi Sunyer (IDIBAPS), Hospital Clinic of Barcelona, Barcelona, Spain
- Mikhail Pyatnitskiy
- Personal Biomedicine, Moscow, Russia; Orekhovich Institute of Biomedical Chemistry, Moscow, Russia; Pirogov Russian National Research Medical University, Moscow, Russia
- Olga Kremenetskaya
- Personal Biomedicine, Moscow, Russia; Center for Theoretical Problems of Physicochemical Pharmacology, Russian Academy of Sciences, Moscow, Russia
- Dmitriy Vinogradov
- Personal Biomedicine, Moscow, Russia; A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia; Lomonosov Moscow State University, Moscow, Russia
|
17
|
Ching T, Garmire LX. Pan-cancer analysis of expressed somatic nucleotide variants in long intergenic non-coding RNA. Pacific Symposium on Biocomputing 2018; 23:512-523. [PMID: 29218910 PMCID: PMC6068290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Long intergenic non-coding RNAs have been shown to play important roles in cancer. However, because lincRNAs are a relatively new class of RNAs compared to protein-coding mRNAs, the mutational landscape of lincRNAs has not been as extensively studied. Here we characterize expressed somatic nucleotide variants within lincRNAs using 12 cancer RNA-Seq datasets in TCGA. We build machine-learning models to discriminate somatic variants from germline variants within lincRNA regions (AUC 0.987). We build another model to differentiate lincRNA somatic mutations from background regions (AUC 0.72) and find several molecular features that are strongly associated with lincRNA mutations, including copy number variation, conservation, substitution type and histone marker features.
Affiliation(s)
- Travers Ching
- Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI 96822, USA; Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI 96813, USA
|
18
|
Cieślik M, Chinnaiyan AM. Cancer transcriptome profiling at the juncture of clinical translation. Nat Rev Genet 2017; 19:93-109. [PMID: 29279605 DOI: 10.1038/nrg.2017.96] [Citation(s) in RCA: 158] [Impact Index Per Article: 22.6] [Indexed: 12/13/2022]
Abstract
Methodological breakthroughs over the past four decades have repeatedly revolutionized transcriptome profiling. Using RNA sequencing (RNA-seq), it has now become possible to sequence and quantify the transcriptional outputs of individual cells or thousands of samples. These transcriptomes provide a link between cellular phenotypes and their molecular underpinnings, such as mutations. In the context of cancer, this link represents an opportunity to dissect the complexity and heterogeneity of tumours and to discover new biomarkers or therapeutic strategies. Here, we review the rationale, methodology and translational impact of transcriptome profiling in cancer.
Affiliation(s)
- Marcin Cieślik
- Michigan Center for Translational Pathology, University of Michigan; Department of Pathology, University of Michigan
- Arul M Chinnaiyan
- Michigan Center for Translational Pathology, University of Michigan; Department of Pathology, University of Michigan; Comprehensive Cancer Center, University of Michigan; Department of Urology, University of Michigan; Howard Hughes Medical Institute, University of Michigan, Ann Arbor, Michigan 48109, USA
|
19
|
Zhao D, Lin M, Pedrosa E, Lachman HM, Zheng D. Characteristics of allelic gene expression in human brain cells from single-cell RNA-seq data analysis. BMC Genomics 2017; 18:860. [PMID: 29126398 PMCID: PMC5681780 DOI: 10.1186/s12864-017-4261-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 05/17/2017] [Accepted: 11/01/2017] [Indexed: 12/24/2022]
Abstract
Background: Monoallelic expression of autosomal genes has been implicated in human psychiatric disorders. However, there is a paucity of allelic expression studies in human brain cells at the single cell and genome wide levels. Results: In this report, we reanalyzed a previously published single-cell RNA-seq dataset from several postmortem human brains and observed pervasive monoallelic expression in individual cells, largely in a random manner. Examining single nucleotide variants with a predicted functional disruption, we found that the “damaged” alleles were overall expressed in fewer brain cells than their counterparts, and at a lower level in cells where their expression was detected. We also identified many brain cell type-specific monoallelically expressed genes. Interestingly, many of these cell type-specific monoallelically expressed genes were enriched for functions important for those brain cell types. In addition, function analysis showed that genes displaying monoallelic expression and correlated expression across neuronal cells from different individual brains were implicated in the regulation of synaptic function. Conclusions: Our findings suggest that monoallelic gene expression is prevalent in human brain cells, which may play a role in generating cellular identity and neuronal diversity and thus increasing the complexity and diversity of brain cell functions.
Affiliation(s)
- Dejian Zhao
- Department of Neurology, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY, USA; Department of Genetics, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY, USA
- Mingyan Lin
- Department of Genetics, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY, USA; Present address: Department of Neuroscience, School of Basic Medical Science, Nanjing Medical University, Nanjing, Jiangsu, 21166, China
- Erika Pedrosa
- Department of Psychiatry and Behavioral Sciences, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY, USA
- Herbert M Lachman
- Department of Genetics, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY, USA; Department of Psychiatry and Behavioral Sciences, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY, USA; Department of Neuroscience, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY, USA; Department of Medicine, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY, USA
- Deyou Zheng
- Department of Neurology, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY, USA; Department of Genetics, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY, USA; Department of Neuroscience, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY, USA.
|
20
|
Johnson KW, Shameer K, Glicksberg BS, Readhead B, Sengupta PP, Björkegren JLM, Kovacic JC, Dudley JT. Enabling Precision Cardiology Through Multiscale Biology and Systems Medicine. JACC Basic Transl Sci 2017; 2:311-327. [PMID: 30062151 PMCID: PMC6034501 DOI: 10.1016/j.jacbts.2016.11.010] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 11/01/2016] [Revised: 11/29/2016] [Accepted: 11/30/2016] [Indexed: 12/20/2022]
Abstract
The traditional paradigm of cardiovascular disease research derives insight from large-scale, broadly inclusive clinical studies of well-characterized pathologies. These insights are then put into practice according to standardized clinical guidelines. However, stagnation in the development of new cardiovascular therapies and variability in therapeutic response implies that this paradigm is insufficient for reducing the cardiovascular disease burden. In this state-of-the-art review, we examine 3 interconnected ideas we put forth as key concepts for enabling a transition to precision cardiology: 1) precision characterization of cardiovascular disease with machine learning methods; 2) the application of network models of disease to embrace disease complexity; and 3) using insights from the previous 2 ideas to enable pharmacology and polypharmacology systems for more precise drug-to-patient matching and patient-disease stratification. We conclude by exploring the challenges of applying a precision approach to cardiology, which arise from a deficit of the required resources and infrastructure, and emerging evidence for the clinical effectiveness of this nascent approach.
Affiliation(s)
- Kipp W Johnson
- Institute for Next Generation Healthcare, Mount Sinai Health System, New York, New York; Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York
- Khader Shameer
- Institute for Next Generation Healthcare, Mount Sinai Health System, New York, New York; Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York
- Benjamin S Glicksberg
- Institute for Next Generation Healthcare, Mount Sinai Health System, New York, New York; Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York
- Ben Readhead
- Institute for Next Generation Healthcare, Mount Sinai Health System, New York, New York; Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York
- Partho P Sengupta
- The Zena and Michael A. Wiener Cardiovascular Institute, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, New York
- Johan L M Björkegren
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Medical Biochemistry and Biophysics Vascular Biology Unit, Karolinska Institutet, Stockholm, Sweden
- Jason C Kovacic
- The Zena and Michael A. Wiener Cardiovascular Institute, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, New York
- Joel T Dudley
- Institute for Next Generation Healthcare, Mount Sinai Health System, New York, New York; Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, New York
|
21
|
Oikkonen L, Lise S. Making the most of RNA-seq: Pre-processing sequencing data with Opossum for reliable SNP variant detection. Wellcome Open Res 2017; 2:6. [PMID: 28239666 PMCID: PMC5322827 DOI: 10.12688/wellcomeopenres.10501.2] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 01/14/2023]
Abstract
RNA-seq (transcriptome sequencing) is primarily considered a method of gene expression analysis but it can also be used to detect DNA variants in expressed regions of the genome. However, current variant callers do not generally behave well with RNA-seq data due to reads encompassing intronic regions. We have developed a software programme called Opossum to address this problem. Opossum pre-processes RNA-seq reads prior to variant calling, and although it has been designed to work specifically with Platypus, it can be used equally well with other variant callers such as GATK HaplotypeCaller. In this work, we show that using Opossum in conjunction with either Platypus or GATK HaplotypeCaller maintains precision and improves the sensitivity for SNP detection compared to the GATK Best Practices pipeline. In addition, using it in combination with Platypus offers a substantial reduction in run times compared to the GATK pipeline so it is ideal when there are only limited time or computational resources available.
Affiliation(s)
- Laura Oikkonen
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Stefano Lise
- Centre for Evolution and Cancer, The Institute of Cancer Research, Sutton, UK
|
22
|
Oikkonen L, Lise S. Making the most of RNA-seq: Pre-processing sequencing data with Opossum for reliable SNP variant detection. Wellcome Open Res 2017. [PMID: 28239666 DOI: 10.12688/wellcomeopenres.10501.1] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 11/20/2022]
Abstract
Identifying variants from RNA-seq (transcriptome sequencing) data is a cost-effective and versatile alternative to whole-genome sequencing. However, current variant callers do not generally behave well with RNA-seq data due to reads encompassing intronic regions. We have developed a software programme called Opossum to address this problem. Opossum pre-processes RNA-seq reads prior to variant calling, and although it has been designed to work specifically with Platypus, it can be used equally well with other variant callers such as GATK HaplotypeCaller. In this work, we show that using Opossum in conjunction with either Platypus or GATK HaplotypeCaller maintains precision and improves the sensitivity for SNP detection compared to the GATK Best Practices pipeline. In addition, using it in combination with Platypus offers a substantial reduction in run times compared to the GATK pipeline so it is ideal when there are only limited time or computational resources available.
Affiliation(s)
- Laura Oikkonen
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Stefano Lise
- Centre for Evolution and Cancer, The Institute of Cancer Research, Sutton, UK
|
23
|
Martin DP, Miya J, Reeser JW, Roychowdhury S. Targeted RNA Sequencing Assay to Characterize Gene Expression and Genomic Alterations. J Vis Exp 2016:54090. [PMID: 27585245 PMCID: PMC5091715 DOI: 10.3791/54090] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 01/02/2023]
Abstract
RNA sequencing (RNAseq) is a versatile method that can be utilized to detect and characterize gene expression, mutations, gene fusions, and noncoding RNAs. Standard RNAseq requires 30 - 100 million sequencing reads and can include multiple RNA products such as mRNA and noncoding RNAs. We demonstrate how targeted RNAseq (capture) permits a focused study on selected RNA products using a desktop sequencer. RNAseq capture can characterize unannotated, low, or transiently expressed transcripts that may otherwise be missed using traditional RNAseq methods. Here we describe the extraction of RNA from cell lines, ribosomal RNA depletion, cDNA synthesis, preparation of barcoded libraries, hybridization and capture of targeted transcripts and multiplex sequencing on a desktop sequencer. We also outline the computational analysis pipeline, which includes quality control assessment, alignment, fusion detection, gene expression quantification and identification of single nucleotide variants. This assay allows for targeted transcript sequencing to characterize gene expression, gene fusions, and mutations.
Affiliation(s)
- Dorrelyn P Martin
- Department of Internal Medicine, Division of Medical Oncology, Comprehensive Cancer Center, The Ohio State University
- Jharna Miya
- Department of Internal Medicine, Division of Medical Oncology, Comprehensive Cancer Center, The Ohio State University
- Julie W Reeser
- Department of Internal Medicine, Division of Medical Oncology, Comprehensive Cancer Center, The Ohio State University
- Sameek Roychowdhury
- Department of Internal Medicine, Division of Medical Oncology, Comprehensive Cancer Center, The Ohio State University; Department of Pharmacology, The Ohio State University
|
24
|
Verma M. Genome-wide association studies and epigenome-wide association studies go together in cancer control. Future Oncol 2016; 12:1645-64. [PMID: 27079684 PMCID: PMC5551540 DOI: 10.2217/fon-2015-0035] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 11/26/2015] [Accepted: 03/22/2016] [Indexed: 02/07/2023]
Abstract
Completion of the human genome a decade ago laid the foundation for: using genetic information in assessing risk to identify individuals and populations that are likely to develop cancer, and designing treatments based on a person's genetic profiling (precision medicine). Genome-wide association studies (GWAS) completed during the past few years have identified risk-associated single nucleotide polymorphisms that can be used as screening tools in epidemiologic studies of a variety of tumor types. This led to the conduct of epigenome-wide association studies (EWAS). This article discusses the current status, challenges and research opportunities in GWAS and EWAS. Information gained from GWAS and EWAS has potential applications in cancer control and treatment.
Affiliation(s)
- Mukesh Verma
- Methods & Technologies Branch, Epidemiology & Genomics Research Program, Division of Cancer Control & Population Sciences, National Cancer Institute (NCI), NIH, 9609 Medical Center Drive, Suite 4E102, Rockville, MD 20850, USA
|
25
|
Kalari KR, Thompson KJ, Nair AA, Tang X, Bockol MA, Jhawar N, Swaminathan SK, Lowe VJ, Kandimalla KK. BBBomics-Human Blood Brain Barrier Transcriptomics Hub. Front Neurosci 2016; 10:71. [PMID: 26973449 PMCID: PMC4771746 DOI: 10.3389/fnins.2016.00071] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 12/04/2015] [Accepted: 02/15/2016] [Indexed: 11/28/2022]
Affiliation(s)
- Krishna R Kalari
- Division of Biostatistics and Bioinformatics, Department of Health Sciences Research, Mayo Clinic Rochester, MN, USA
- Kevin J Thompson
- Division of Biostatistics and Bioinformatics, Department of Health Sciences Research, Mayo Clinic Rochester, MN, USA
- Asha A Nair
- Division of Biostatistics and Bioinformatics, Department of Health Sciences Research, Mayo Clinic Rochester, MN, USA
- Xiaojia Tang
- Division of Biostatistics and Bioinformatics, Department of Health Sciences Research, Mayo Clinic Rochester, MN, USA
- Matthew A Bockol
- Division of Biostatistics and Bioinformatics, Department of Health Sciences Research, Mayo Clinic Rochester, MN, USA
- Navya Jhawar
- Division of Biostatistics and Bioinformatics, Department of Health Sciences Research, Mayo Clinic Rochester, MN, USA
- Suresh K Swaminathan
- Department of Pharmaceutics and Brain Barriers Research Center, University of Minnesota Minneapolis, MN, USA
- Val J Lowe
- Department of Radiology, Mayo Clinic Rochester, MN, USA
- Karunya K Kandimalla
- Department of Pharmaceutics and Brain Barriers Research Center, University of Minnesota Minneapolis, MN, USA
|
26
|
Shameer K, Tripathi LP, Kalari KR, Dudley JT, Sowdhamini R. Interpreting functional effects of coding variants: challenges in proteome-scale prediction, annotation and assessment. Brief Bioinform 2015; 17:841-62. [PMID: 26494363 DOI: 10.1093/bib/bbv084] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 05/14/2015] [Indexed: 12/20/2022]
Abstract
Accurate assessment of genetic variation in human DNA sequencing studies remains a nontrivial challenge in clinical genomics and genome informatics. Ascribing functional roles and/or clinical significances to single nucleotide variants identified from a next-generation sequencing study is an important step in genome interpretation. Experimental characterization of all the observed functional variants is yet impractical; thus, the prediction of functional and/or regulatory impacts of the various mutations using in silico approaches is an important step toward the identification of functionally significant or clinically actionable variants. The relationships between genotypes and the expressed phenotypes are multilayered and biologically complex; such relationships present numerous challenges and at the same time offer various opportunities for the design of in silico variant assessment strategies. Over the past decade, many bioinformatics algorithms have been developed to predict functional consequences of single nucleotide variants in the protein coding regions. In this review, we provide an overview of the bioinformatics resources for the prediction, annotation and visualization of coding single nucleotide variants. We discuss the currently available approaches and major challenges from the perspective of protein sequence, structure, function and interactions that require consideration when interpreting the impact of putatively functional variants. We also discuss the relevance of incorporating integrated workflows for predicting the biomedical impact of the functionally important variations encoded in a genome, exome or transcriptome. Finally, we propose a framework to classify variant assessment approaches and strategies for incorporation of variant assessment within electronic health records.
|
27
|
Saal LH, Vallon-Christersson J, Häkkinen J, Hegardt C, Grabau D, Winter C, Brueffer C, Tang MHE, Reuterswärd C, Schulz R, Karlsson A, Ehinger A, Malina J, Manjer J, Malmberg M, Larsson C, Rydén L, Loman N, Borg Å. The Sweden Cancerome Analysis Network - Breast (SCAN-B) Initiative: a large-scale multicenter infrastructure towards implementation of breast cancer genomic analyses in the clinical routine. Genome Med 2015; 7:20. [PMID: 25722745 PMCID: PMC4341872 DOI: 10.1186/s13073-015-0131-9] [Citation(s) in RCA: 120] [Impact Index Per Article: 13.3] [Received: 09/18/2014] [Accepted: 01/15/2015] [Indexed: 01/17/2023]
Abstract
BACKGROUND: Breast cancer exhibits significant molecular, pathological, and clinical heterogeneity. Current clinicopathological evaluation is imperfect for predicting outcome, which results in overtreatment for many patients, and for others, leads to death from recurrent disease. Therefore, additional criteria are needed to better personalize care and maximize treatment effectiveness and survival.
METHODS: To address these challenges, the Sweden Cancerome Analysis Network - Breast (SCAN-B) consortium was initiated in 2010 as a multicenter prospective study with longsighted aims to analyze breast cancers with next-generation genomic technologies for translational research in a population-based manner and integrated with healthcare; decipher fundamental tumor biology from these analyses; utilize genomic data to develop and validate new clinically-actionable biomarker assays; and establish real-time clinical implementation of molecular diagnostic, prognostic, and predictive tests. In the first phase, we focus on molecular profiling by next-generation RNA-sequencing on the Illumina platform.
RESULTS: In the first 3 years from 30 August 2010 through 31 August 2013, we have consented and enrolled 3,979 patients with primary breast cancer at the seven hospital sites in South Sweden, representing approximately 85% of eligible patients in the catchment area. Preoperative blood samples have been collected for 3,942 (99%) patients and primary tumor specimens collected for 2,929 (74%) patients. Herein we describe the study infrastructure and protocols and present initial proof of concept results from prospective RNA sequencing including tumor molecular subtyping and detection of driver gene mutations. Prospective patient enrollment is ongoing.
CONCLUSIONS: We demonstrate that large-scale population-based collection and RNA-sequencing analysis of breast cancer is feasible. The SCAN-B Initiative should significantly reduce the time to discovery, validation, and clinical implementation of novel molecular diagnostic and predictive tests. We welcome the participation of additional comprehensive cancer treatment centers.
TRIAL REGISTRATION: ClinicalTrials.gov identifier NCT02306096.
Affiliation(s)
- Lao H Saal
  - Department of Clinical Sciences, Division of Oncology and Pathology, Lund University, Medicon Village 404-A2, SE-22381 Lund, Sweden
  - Lund University Cancer Center, SE-22381 Lund, Sweden
  - CREATE Health Strategic Centre for Translational Cancer Research, Lund University, SE-22381 Lund, Sweden
- Johan Vallon-Christersson
  - Department of Clinical Sciences, Division of Oncology and Pathology, Lund University, Medicon Village 404-A2, SE-22381 Lund, Sweden
  - Lund University Cancer Center, SE-22381 Lund, Sweden
  - CREATE Health Strategic Centre for Translational Cancer Research, Lund University, SE-22381 Lund, Sweden
- Jari Häkkinen
  - Department of Clinical Sciences, Division of Oncology and Pathology, Lund University, Medicon Village 404-A2, SE-22381 Lund, Sweden
  - Lund University Cancer Center, SE-22381 Lund, Sweden
  - CREATE Health Strategic Centre for Translational Cancer Research, Lund University, SE-22381 Lund, Sweden
- Cecilia Hegardt
  - Department of Clinical Sciences, Division of Oncology and Pathology, Lund University, Medicon Village 404-A2, SE-22381 Lund, Sweden
  - Lund University Cancer Center, SE-22381 Lund, Sweden
  - CREATE Health Strategic Centre for Translational Cancer Research, Lund University, SE-22381 Lund, Sweden
- Dorthe Grabau
  - Department of Pathology, Skåne University Hospital, SE-22185 Lund, Sweden
- Christof Winter
  - Department of Clinical Sciences, Division of Oncology and Pathology, Lund University, Medicon Village 404-A2, SE-22381 Lund, Sweden
  - Lund University Cancer Center, SE-22381 Lund, Sweden
- Christian Brueffer
  - Department of Clinical Sciences, Division of Oncology and Pathology, Lund University, Medicon Village 404-A2, SE-22381 Lund, Sweden
  - Lund University Cancer Center, SE-22381 Lund, Sweden
- Man-Hung Eric Tang
  - Department of Clinical Sciences, Division of Oncology and Pathology, Lund University, Medicon Village 404-A2, SE-22381 Lund, Sweden
  - Lund University Cancer Center, SE-22381 Lund, Sweden
- Christel Reuterswärd
  - Department of Clinical Sciences, Division of Oncology and Pathology, Lund University, Medicon Village 404-A2, SE-22381 Lund, Sweden
  - Lund University Cancer Center, SE-22381 Lund, Sweden
  - Department of Clinical Sciences, SCIBLU Genomics, Lund University, SE-22381 Lund, Sweden
- Ralph Schulz
  - Department of Clinical Sciences, Division of Oncology and Pathology, Lund University, Medicon Village 404-A2, SE-22381 Lund, Sweden
  - Lund University Cancer Center, SE-22381 Lund, Sweden
  - Department of Clinical Sciences, SCIBLU Genomics, Lund University, SE-22381 Lund, Sweden
- Anna Karlsson
  - Department of Clinical Sciences, Division of Oncology and Pathology, Lund University, Medicon Village 404-A2, SE-22381 Lund, Sweden
  - Lund University Cancer Center, SE-22381 Lund, Sweden
  - Department of Clinical Sciences, SCIBLU Genomics, Lund University, SE-22381 Lund, Sweden
- Anna Ehinger
  - Department of Clinical Sciences, Division of Oncology and Pathology, Lund University, Medicon Village 404-A2, SE-22381 Lund, Sweden
  - Lund University Cancer Center, SE-22381 Lund, Sweden
  - Department of Pathology and Cytology, Blekinge County Hospital, SE-37185 Karlskrona, Sweden
- Janne Malina
  - Department of Pathology, Skåne University Hospital, SE-20502 Malmö, Sweden
- Jonas Manjer
  - Department of Surgery, Lund University and Skåne University Hospital, SE-20502 Malmö, Sweden
- Martin Malmberg
  - Department of Oncology, Skåne University Hospital, SE-22185 Lund, Sweden
- Christer Larsson
  - Lund University Cancer Center, SE-22381 Lund, Sweden
  - Department of Laboratory Medicine, Division of Molecular Pathology, Lund University, SE-22185 Lund, Sweden
- Lisa Rydén
  - Lund University Cancer Center, SE-22381 Lund, Sweden
  - Department of Surgery, Lund University and Skåne University Hospital, SE-22185 Lund, Sweden
- Niklas Loman
  - Department of Clinical Sciences, Division of Oncology and Pathology, Lund University, Medicon Village 404-A2, SE-22381 Lund, Sweden
  - Lund University Cancer Center, SE-22381 Lund, Sweden
  - Department of Oncology, Skåne University Hospital, SE-22185 Lund, Sweden
- Åke Borg
  - Department of Clinical Sciences, Division of Oncology and Pathology, Lund University, Medicon Village 404-A2, SE-22381 Lund, Sweden
  - Lund University Cancer Center, SE-22381 Lund, Sweden
  - CREATE Health Strategic Centre for Translational Cancer Research, Lund University, SE-22381 Lund, Sweden
  - Department of Clinical Sciences, SCIBLU Genomics, Lund University, SE-22381 Lund, Sweden
|