1
Carrara L, Hall D. Noninvasive Prenatal Paternity Testing: A Review on Genetic Markers. Int J Mol Sci 2025; 26:4518. [PMID: 40429663 PMCID: PMC12111050 DOI: 10.3390/ijms26104518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2025] [Revised: 05/06/2025] [Accepted: 05/08/2025] [Indexed: 05/29/2025]
Abstract
Noninvasive prenatal paternity testing (NIPPT) is a crucial tool in forensic contexts, particularly in cases involving post-rape pregnancies. It enables judicial authorities and victims to promptly address these situations by determining the paternity of the fetus within a few weeks of pregnancy. NIPPT relies on the analysis of cell-free fetal DNA (cffDNA) found in the maternal bloodstream. However, the abundance of maternal DNA presents a significant challenge in detecting fetal DNA. As a result, research has focused on improving methods for isolating or enriching fetal DNA and, specifically in the context of forensic genetics, on developing suitable genetic markers. The use of single nucleotide polymorphisms (SNPs), along with novel compound markers or composite multiplexes, has shown promising results. Despite significant advances, partly driven by the increased use of massively parallel sequencing (MPS), challenges remain in validating marker-based NIPPT assays for forensic casework. Further studies are required to enhance the sensitivity of these tests, particularly during the early stages of pregnancy, such as the first trimester. Additionally, improving and standardizing statistical frameworks for result evaluation and interpretation is essential to ensure compatibility with forensic standards.
Affiliation(s)
- Laura Carrara
- School of Criminal Justice, Faculty of Law, Criminal Justice and Public Administration, University of Lausanne, Batochime, 1015 Lausanne, Switzerland;
| | - Diana Hall
- Forensic Genetics Unit, University Center of Legal Medicine, Lausanne-Geneva, Lausanne University Hospital and University of Lausanne, 1000 Lausanne, Switzerland
2
Sirica R, Ottaiano A, D’Amore L, Ianniello M, Petrillo N, Ruggiero R, Castiello R, Mori A, Evangelista E, De Falco L, Santorsola M, Misasi M, Savarese G, Fico A. Advancing Non-Invasive Prenatal Screening: A Targeted 1069-Gene Panel for Comprehensive Detection of Monogenic Disorders and Copy Number Variations. Genes (Basel) 2025; 16:427. [PMID: 40282387 PMCID: PMC12026569 DOI: 10.3390/genes16040427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2025] [Revised: 03/21/2025] [Accepted: 03/27/2025] [Indexed: 04/29/2025]
Abstract
We introduce an innovative, non-invasive prenatal screening approach for detecting fetal monogenic alterations and copy number variations (CNVs) from maternal blood. METHOD Circulating free DNA (cfDNA) was extracted from maternal peripheral blood and processed using the VeriSeq NIPT Solution (Illumina, San Diego, CA, USA), with shallow whole-genome sequencing (sWGS) performed on a NextSeq550Dx (Illumina). A customized gene panel and bioinformatics tool, named the "VERA Revolution", were developed to detect variants and CNVs in cfDNA samples. Results were compared with genomic DNA (gDNA) extracted from fetal samples, including amniotic fluid, chorionic villus samples, and buccal swabs. RESULTS The study included pregnant women with gestational ages from 10 + 3 to 15 + 2 weeks (mean: 12.1 weeks). The fetal fraction (FF), a crucial measure of cfDNA test reliability, ranged from 5% to 20%, ensuring an adequate amount of DNA for analysis. Among 36 families tested, 14 showed a wild-type genotype. Identified variants included two deletions (22q11.2 and 4p16.3), two duplications (16p13 and 5p15), and eighteen single-nucleotide variants (one in CFTR, three in GJB2, three in PAH, one in RIT1, one in DHCR7, one in TCOF1, one in ABCA4, one in MYBPC3, one in MCCC2, two in GBA1, and three in PTPN11). Significant concordance was found between our panel results and prenatal/postnatal genetic profiles. CONCLUSIONS The "VERA Revolution" test highlights advancements in prenatal genomic screening, offering potential improvements in prenatal care.
Affiliation(s)
- Roberto Sirica
- Centro AMES, 80013 Casalnuovo di Napoli, Italy; (L.D.); (M.I.); (N.P.); (R.R.); (R.C.); (G.S.); (A.F.)
| | - Alessandro Ottaiano
- Istituto Nazionale Tumori di Napoli, IRCCS Fondazione Pascale, Via M. Semmola, 80131 Naples, Italy; (A.O.); (M.S.)
| | - Luigi D’Amore
- Centro AMES, 80013 Casalnuovo di Napoli, Italy; (L.D.); (M.I.); (N.P.); (R.R.); (R.C.); (G.S.); (A.F.)
| | - Monica Ianniello
- Centro AMES, 80013 Casalnuovo di Napoli, Italy; (L.D.); (M.I.); (N.P.); (R.R.); (R.C.); (G.S.); (A.F.)
| | - Nadia Petrillo
- Centro AMES, 80013 Casalnuovo di Napoli, Italy; (L.D.); (M.I.); (N.P.); (R.R.); (R.C.); (G.S.); (A.F.)
| | - Raffaella Ruggiero
- Centro AMES, 80013 Casalnuovo di Napoli, Italy; (L.D.); (M.I.); (N.P.); (R.R.); (R.C.); (G.S.); (A.F.)
| | - Rosa Castiello
- Centro AMES, 80013 Casalnuovo di Napoli, Italy; (L.D.); (M.I.); (N.P.); (R.R.); (R.C.); (G.S.); (A.F.)
| | - Alessio Mori
- Centro AMES, 80013 Casalnuovo di Napoli, Italy; (L.D.); (M.I.); (N.P.); (R.R.); (R.C.); (G.S.); (A.F.)
| | - Eloisa Evangelista
- Centro AMES, 80013 Casalnuovo di Napoli, Italy; (L.D.); (M.I.); (N.P.); (R.R.); (R.C.); (G.S.); (A.F.)
| | - Luigia De Falco
- Centro AMES, 80013 Casalnuovo di Napoli, Italy; (L.D.); (M.I.); (N.P.); (R.R.); (R.C.); (G.S.); (A.F.)
| | - Mariachiara Santorsola
- Istituto Nazionale Tumori di Napoli, IRCCS Fondazione Pascale, Via M. Semmola, 80131 Naples, Italy; (A.O.); (M.S.)
| | - Michele Misasi
- Department of Gynecology and Obstetrics, Universiteti Katolik Zoja e Këshillit të Mirë, Rr. Dritan Hoxha, 1057 Tirane, Albania
| | - Giovanni Savarese
- Centro AMES, 80013 Casalnuovo di Napoli, Italy; (L.D.); (M.I.); (N.P.); (R.R.); (R.C.); (G.S.); (A.F.)
| | - Antonio Fico
- Centro AMES, 80013 Casalnuovo di Napoli, Italy; (L.D.); (M.I.); (N.P.); (R.R.); (R.C.); (G.S.); (A.F.)
3
Hosseini II, Hamidi SV, Capaldi X, Liu Z, Silva Pessoa MA, Mahshid S, Reisner W. Tunable nanofluidic device for digital nucleic acid analysis. NANOSCALE 2024. [PMID: 38682564 DOI: 10.1039/d3nr05553a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2024]
Abstract
Nano/microfluidic-based nucleic acid tests have been proposed as a rapid and reliable diagnostic technology. Two key steps for many of these tests are target nucleic acid (NA) immobilization followed by an enzymatic reaction on the captured NAs to detect the presence of a disease-associated sequence. NA capture within a geometrically confined volume is an attractive alternative to NA surface immobilization that eliminates the need for sample pre-treatment (e.g. label-based methods such as lateral flow assays) or the use of external actuators (e.g. dielectrophoresis) that are required for most nano/microfluidic-based NA tests. However, geometrically confined spaces hinder sample loading and make it challenging to capture, subsequently retain, and simultaneously expose target NAs to the required enzymes. Here, using a nanofluidic device that features real-time confinement control via pneumatic actuation of a thin membrane lid, we demonstrate the loading of digital nanocavities with target NAs and the exposure of target NAs to the required enzymes/co-factors while the NAs are retained. In particular, as proof of principle, we amplified single-stranded DNAs (M13mp18 plasmid vector) in an array of nanocavities via two isothermal amplification approaches (loop-mediated isothermal amplification and rolling circle amplification).
Affiliation(s)
- Imman I Hosseini
- Department of Biomedical Engineering, McGill University, 3775 Rue University, Montreal, Quebec H3A 2B4, Canada.
| | - Seyed Vahid Hamidi
- Department of Biomedical Engineering, McGill University, 3775 Rue University, Montreal, Quebec H3A 2B4, Canada.
| | - Xavier Capaldi
- Department of Physics, McGill University, 3600 Rue University, Montreal, Quebec H3A 2T8, Canada.
| | - Zezhou Liu
- Department of Physics, McGill University, 3600 Rue University, Montreal, Quebec H3A 2T8, Canada.
| | | | - Sara Mahshid
- Department of Biomedical Engineering, McGill University, 3775 Rue University, Montreal, Quebec H3A 2B4, Canada.
| | - Walter Reisner
- Department of Physics, McGill University, 3600 Rue University, Montreal, Quebec H3A 2T8, Canada.
4
Shirai Y, Ueno T, Kojima S, Ikeuchi H, Kitada R, Koyama T, Takahashi F, Takahashi K, Ichimura K, Yoshida A, Sugino H, Mano H, Narita Y, Takahashi M, Kohsaka S. The development of a custom RNA-sequencing panel for the identification of predictive and diagnostic biomarkers in glioma. J Neurooncol 2024; 167:75-88. [PMID: 38363490 PMCID: PMC10978676 DOI: 10.1007/s11060-024-04563-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 01/02/2024] [Indexed: 02/17/2024]
Abstract
PURPOSE Various molecular profiles are needed to classify malignant brain tumors, including gliomas, based on the latest classification criteria of the World Health Organization, and their poor prognosis necessitates new therapeutic targets. The Todai OncoPanel 2 RNA Panel (TOP2-RNA) is a custom-target RNA-sequencing (RNA-seq) assay that uses the junction capture method to maximize the sensitivity of detecting 455 fusion gene transcripts and analyzes the expression profiles of 1,390 genes. This study aimed to classify gliomas and identify their molecular targets using TOP2-RNA. METHODS A total of 124 frozen samples of malignant gliomas were subjected to TOP2-RNA for classification based on their molecular profiles and the identification of molecular targets. RESULTS Among 55 glioblastoma cases, gene fusions were detected in 11 cases (20%), including novel MET fusions. Seven tyrosine kinase genes were found to be overexpressed in 15 cases (27.3%). In contrast to isocitrate dehydrogenase (IDH) wild-type glioblastoma, IDH-mutant tumors, including astrocytomas and oligodendrogliomas, rarely harbor fusion genes or gene overexpression. Among the 34 overexpressed tyrosine kinase genes, MDM2, and CDK4 observed in glioblastoma, 22 (64.7%) showed copy number amplification. When comparing astrocytomas and oligodendrogliomas in gene set enrichment analysis, the gene sets related to 1p36 and 19q were highly enriched in astrocytomas, suggesting that regional genomic DNA copy number alterations can be evaluated by gene expression analysis. CONCLUSIONS TOP2-RNA is a highly sensitive assay for detecting fusion genes, exon skipping, and aberrant gene expression. Alterations in targetable driver genes were identified in more than 50% of glioblastomas. Molecular profiling by TOP2-RNA provides ample predictive, prognostic, and diagnostic biomarkers that may not be identified by conventional assays and is therefore expected to increase treatment options for individual patients with glioma.
Affiliation(s)
- Yukina Shirai
- Division of Cellular Signaling, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
- Department of Respiratory Medicine, Graduate School of Medicine, Juntendo University, 2-1-1 Hongo, Bunkyo-Ku, Tokyo, 113-8431, Japan
| | - Toshihide Ueno
- Division of Cellular Signaling, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
| | - Shinya Kojima
- Division of Cellular Signaling, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
| | - Hiroshi Ikeuchi
- Division of Cellular Signaling, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
- Department of General Thoracic Surgery, Juntendo University School of Medicine, 2-1-1 Hongo, Bunkyo-Ku, Tokyo, 113-8431, Japan
| | - Rina Kitada
- Division of Cellular Signaling, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
| | - Takafumi Koyama
- Department of Experimental Therapeutics, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
| | - Fumiyuki Takahashi
- Department of Respiratory Medicine, Graduate School of Medicine, Juntendo University, 2-1-1 Hongo, Bunkyo-Ku, Tokyo, 113-8431, Japan
| | - Kazuhisa Takahashi
- Department of Respiratory Medicine, Graduate School of Medicine, Juntendo University, 2-1-1 Hongo, Bunkyo-Ku, Tokyo, 113-8431, Japan
| | - Koichi Ichimura
- Department of Brain Disease Translational Research, Graduate School of Medicine, Juntendo University, 2-1-1 Hongo, Bunkyo-Ku, Tokyo, 113-8431, Japan
| | - Akihiko Yoshida
- Department of Diagnostic Pathology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
| | - Hirokazu Sugino
- Department of Diagnostic Pathology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
| | - Hiroyuki Mano
- Division of Cellular Signaling, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
| | - Yoshitaka Narita
- Department of Neurosurgery and Neuro-Oncology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan
| | - Masamichi Takahashi
- Department of Neurosurgery and Neuro-Oncology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan.
| | - Shinji Kohsaka
- Division of Cellular Signaling, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-Ku, Tokyo, 104-0045, Japan.
5
Pidon H, Ruge-Wehling B, Will T, Habekuß A, Wendler N, Oldach K, Maasberg-Prelle A, Korzun V, Stein N. High-resolution mapping of Ryd4Hb, a major resistance gene to Barley yellow dwarf virus from Hordeum bulbosum. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2024; 137:60. [PMID: 38409375 PMCID: PMC10896957 DOI: 10.1007/s00122-024-04542-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 01/05/2024] [Indexed: 02/28/2024]
Abstract
KEY MESSAGE We mapped Ryd4Hb to a 66.5 kbp interval in barley and dissociated it from a sublethality factor. These results will enable a targeted selection of the resistance in barley breeding. Viral diseases cause high yield losses in crops worldwide. The Barley yellow dwarf virus (BYDV) complex is responsible for one of the most widespread and economically important viral diseases of cereals. While no gene conferring complete resistance (immunity) has been uncovered in the primary gene pool of barley, sources of resistance were sought and identified in the wild relative Hordeum bulbosum, which represents the secondary gene pool of barley. One such locus, Ryd4Hb, was previously introgressed into barley and assigned to chromosome 3H, but it is tightly linked to a sublethality factor that prevents the incorporation and utilization of Ryd4Hb in barley varieties. To solve this problem, we fine-mapped Ryd4Hb and separated it from this negative factor. We narrowed the Ryd4Hb locus to a corresponding 66.5 kbp physical interval in the barley 'Morex' reference genome. The region comprises a gene from the nucleotide-binding and leucine-rich repeat immune receptor family, typical of dominant virus resistance genes. The closest homolog to this Ryd4Hb candidate gene is the wheat Sr35 stem rust resistance gene. In addition to the fine mapping, we reduced the interval bearing the sublethality factor to 600 kbp in barley. Aphid feeding experiments demonstrated that Ryd4Hb confers resistance to BYDV rather than to its vector. The presented results, including the high-throughput molecular markers, will permit a more targeted selection of the resistance in breeding, enabling the use of Ryd4Hb in barley varieties.
Affiliation(s)
- Hélène Pidon
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany.
- IPSiM, Univ Montpellier, CNRS, INRAE, Institut Agro, Montpellier, France.
| | - Brigitte Ruge-Wehling
- Julius Kühn Institute (JKI)-Federal Research Centre for Cultivated Plants, Institute for Breeding Research on Agricultural Crops, Sanitz, Germany
| | - Torsten Will
- Julius Kühn Institute (JKI)-Federal Research Centre for Cultivated Plants, Institute for Resistance Research and Stress Tolerance, Quedlinburg, Germany
| | - Antje Habekuß
- Julius Kühn Institute (JKI)-Federal Research Centre for Cultivated Plants, Institute for Resistance Research and Stress Tolerance, Quedlinburg, Germany
| | | | | | | | | | - Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany.
- Center for Integrated Breeding Research (CiBreed), Georg-August University, Göttingen, Germany.
6
Sudan J, Sharma S, Salgotra RK, Pandey RK, Neelam D, Singh R. Elucidating the process of SNPs identification in non-reference genome crops. J Biomol Struct Dyn 2023; 41:15682-15690. [PMID: 37021361 DOI: 10.1080/07391102.2023.2194002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 02/28/2023] [Indexed: 04/07/2023]
Abstract
Advances in next-generation sequencing (NGS) technologies, genome reduction techniques, and bioinformatics tools have given a strong impetus to the identification of genome-wide single nucleotide polymorphisms (SNPs) in crops. NGS technologies can make a large amount of sequence data available in a short span of time. These data require detailed bioinformatics analysis steps, including preprocessing, mapping, and identification of sequence variants. A plethora of software is available for the different sequence analysis steps. However, SNP identification is far more challenging for orphan crops, that is, crops with no reference genome. The current article reports the steps for in silico SNP identification in a sequential manner and proposes some mapping approaches using CLC Genomics software that could provide an alternative method for SNP identification in orphan crops having no reference genome. The three mapping approaches, common reference map from progenitor genomes (CRMPG), step-wise use of progenitor genomes (SWPG), and de novo assembly of sequence reads (DASR), were validated with the dd-RAD sequenced data of two genotypes from Brassica juncea. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Jebi Sudan
- Department of Biotechnology, JECRC University, Jaipur, Rajasthan, India
| | - Susheel Sharma
- School of Biotechnology, Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu (J&K), Jammu, India
| | - Romesh K Salgotra
- School of Biotechnology, Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu (J&K), Jammu, India
| | - Rajan Kumar Pandey
- Department of Medical Biochemistry and Biophysics, Karolinska Institute, Stockholm, Sweden
| | - Deepesh Neelam
- Department of Microbiology, JECRC University, Jaipur, Rajasthan, India
| | - Ravinder Singh
- School of Biotechnology, Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu (J&K), Jammu, India
7
Coutinho de Almeida R, Tuerlings M, Ramos Y, Den Hollander W, Suchiman E, Lakenberg N, Nelissen RGHH, Mei H, Meulenbelt I. Allelic expression imbalance in articular cartilage and subchondral bone refined genome-wide association signals in osteoarthritis. Rheumatology (Oxford) 2022; 62:1669-1676. [PMID: 36040165 PMCID: PMC10070069 DOI: 10.1093/rheumatology/keac498] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/19/2022] [Revised: 08/12/2022] [Accepted: 08/18/2022] [Indexed: 11/14/2022]
Abstract
OBJECTIVES To present an unbiased approach to identify positional transcript single nucleotide polymorphisms (SNPs) of osteoarthritis (OA) risk loci by allelic expression imbalance (AEI) analyses using RNA sequencing of articular cartilage and subchondral bone from OA patients. METHODS RNA sequencing of 65 articular cartilage and 24 subchondral bone samples from OA patients was used for AEI analysis. AEI was determined for all genes present in the 100 regions reported by the GWAS catalog that were also expressed in cartilage or bone. The count fraction of the alternative allele (φ) was calculated for each individual heterozygous for the risk-SNP or for a SNP in linkage disequilibrium (LD) with it (r² > 0.6). Furthermore, a meta-analysis was performed to generate a meta-φ (null hypothesis median φ = 0.49) and P-value for each SNP. RESULTS We identified 30 transcript SNPs (28 in cartilage and 2 in subchondral bone) subject to AEI in 29 genes. Notably, 10 transcript SNPs were located in genes not previously reported in the GWAS catalog, including two long intergenic non-coding RNAs (lincRNAs), MALAT1 (meta-φ = 0.54, FDR = 1.7 × 10⁻⁴) and ILF3-DT (meta-φ = 0.6, FDR = 1.75 × 10⁻⁵). Moreover, 12 drugs were found to interact with 7 genes displaying AEI, of which 7 drugs have already been approved. CONCLUSIONS By prioritizing proxy transcript SNPs that mark AEI in cartilage and/or subchondral bone at loci harboring GWAS signals, we present an unbiased approach to identify the most likely functional OA risk-SNP and gene. We identified 10 new potential OA risk genes ready for further translation towards underlying biological mechanisms.
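The abstract above describes φ as the count fraction of the alternative allele in heterozygous individuals. As a minimal sketch, assuming φ is simply the number of alternative-allele reads divided by all reads covering the SNP (the abstract does not spell out the formula), and using hypothetical function names and made-up read counts, the per-individual values and their median could be computed as follows:

```python
from statistics import median


def alt_allele_fraction(ref_count: int, alt_count: int) -> float:
    """Fraction of reads carrying the alternative allele (assumed definition of phi)."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return alt_count / total


def median_phi(read_counts):
    """Median phi across heterozygous individuals.

    read_counts is an iterable of (ref_count, alt_count) pairs, one per individual.
    """
    return median(alt_allele_fraction(ref, alt) for ref, alt in read_counts)


# Illustrative read counts only; these are not data from the study.
example_counts = [(50, 50), (40, 60), (30, 70)]
print(median_phi(example_counts))
```

A median φ far from the null value quoted in the abstract (0.49) would hint at allelic imbalance; the study's meta-analysis and P-value calculation are not reproduced in this sketch.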
Affiliation(s)
- Rodrigo Coutinho de Almeida
- Department of Biomedical Data Sciences, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Margo Tuerlings
- Department of Biomedical Data Sciences, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Yolande Ramos
- Department of Biomedical Data Sciences, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Wouter Den Hollander
- Department of Biomedical Data Sciences, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Eka Suchiman
- Department of Biomedical Data Sciences, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Nico Lakenberg
- Department of Biomedical Data Sciences, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Rob G H H Nelissen
- Department of Orthopaedics, Leiden University Medical Center, Leiden, The Netherlands
| | - Hailiang Mei
- Sequencing Analysis Support Core, Dept. of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| | - Ingrid Meulenbelt
- Department of Biomedical Data Sciences, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
8
Yuan X, Ma C, Zhao H, Yang L, Wang S, Xi J. STIC: Predicting Single Nucleotide Variants and Tumor Purity in Cancer Genome. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2692-2701. [PMID: 32086221 DOI: 10.1109/tcbb.2020.2975181] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/10/2023]
Abstract
Single nucleotide variants (SNVs) play an important role in cellular proliferation and tumorigenesis in various types of human cancer. Next-generation sequencing (NGS) has provided high-throughput data at an unprecedented resolution to predict SNVs. Many computational methods currently exist for either germline or somatic SNV discovery from NGS data, but very few of them are versatile enough to adapt to every situation. In the absence of matched normal samples, the prediction of somatic SNVs from single-tumor samples becomes considerably challenging, especially when the tumor purity is unknown. Here, we propose a new approach, STIC, to predict somatic SNVs and estimate tumor purity from NGS data without matched normal samples. The main features of STIC include: (1) extracting a set of SNV-relevant features at each site and training a BP neural network on these features to predict SNVs; (2) creating an iterative process to distinguish somatic SNVs from germline ones by disturbing allele frequency; and (3) establishing a reasonable relationship between tumor purity and allele frequencies of somatic SNVs to accurately estimate the purity. We quantitatively evaluate the performance of STIC on both simulated and real sequencing datasets, and the results indicate that STIC outperforms competing methods.
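The abstract does not state which relationship STIC uses between tumor purity and somatic allele frequencies. As a rough illustration only, the sketch below uses a common textbook simplification, not STIC's method: for clonal, heterozygous somatic SNVs in copy-neutral diploid regions, the expected variant allele frequency (VAF) is half the purity, so purity is approximately twice the typical VAF. The function name and the input values are hypothetical.

```python
from statistics import median


def purity_from_vafs(vafs):
    """Estimate tumor purity as 2 x median VAF of somatic SNVs.

    Assumes clonal, heterozygous somatic SNVs in copy-neutral diploid
    regions (a simplification; this is not the STIC model).
    """
    if not vafs:
        raise ValueError("at least one VAF is required")
    # Cap at 1.0 because purity is a proportion of tumor cells.
    return min(1.0, 2.0 * median(vafs))


# Illustrative VAFs only; these are not data from the paper.
print(purity_from_vafs([0.20, 0.25, 0.30]))
```

Using the median rather than the mean makes the estimate less sensitive to a few subclonal or amplified sites, which violate the stated assumption.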
9
Variant Calling in Next Generation Sequencing Data. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11285-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
10
Hwang WL, Wolfson RL, Niemierko A, Marcus KJ, DuBois SG, Haas-Kogan D. Clinical Impact of Tumor Mutational Burden in Neuroblastoma. J Natl Cancer Inst 2020; 111:695-699. [PMID: 30307503 DOI: 10.1093/jnci/djy157] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 05/08/2018] [Revised: 06/25/2018] [Accepted: 08/08/2018] [Indexed: 01/07/2023]
Abstract
BACKGROUND Neuroblastoma is the most common pediatric extracranial solid tumor. Within conventional risk groups, there is considerable heterogeneity in outcomes, indicating the need for improved risk stratification. METHODS In this study we analyzed the somatic mutational burden of 515 primary, untreated neuroblastoma tumors from three independent cohorts. Mutations in coding regions were determined by whole-exome/genome sequencing of tumor samples compared to matched blood leukocytes. Survival data for 459 patients were available for analysis of 5-year overall survival using the Kaplan-Meier method and log-rank test. All statistical tests were two-sided. RESULTS Despite a low overall somatic mutational burden (mean = 3, range = 0-56), 107 patients were considered to have high mutational burden (>3 mutations). Unfavorable histology and age 18 months and older were associated with high mutational burden. Patients with high mutational burden had inferior 5-year overall survival (29.0%, 95% confidence interval [CI] = 17.2 to 41.8%) vs those with three or fewer somatic mutations (76.2%, 95% CI = 71.5 to 80.3%) (log-rank P < .001) and this association persisted when limiting the analysis to genes included on a 447-gene panel commonly used in clinical practice. On multivariable analysis, mutational burden remained prognostic independent of age, stage, histology and MYCN status. CONCLUSIONS This study demonstrates that mutational burden of primary neuroblastoma may be useful in combination with conventional risk factors to optimize risk stratification and guide treatment decisions, pending prospective validation.
Affiliation(s)
- William L Hwang
- Harvard Radiation Oncology Program, Boston, MA.,Harvard Medical School, Boston, MA
| | | | - Andrzej Niemierko
- Harvard Medical School, Boston, MA.,Department of Radiation Oncology, Massachusetts General Hospital, Boston, MA
| | - Karen J Marcus
- Harvard Medical School, Boston, MA.,Department of Radiation Oncology, Dana-Farber Cancer Institute, Boston, MA.,Department of Radiation Oncology, Brigham & Women's Hospital, Boston, MA
| | - Steven G DuBois
- Harvard Medical School, Boston, MA.,Dana-Farber/Boston Children's Cancer and Blood Disorders Center, Boston, MA
| | - Daphne Haas-Kogan
- Harvard Medical School, Boston, MA.,Department of Radiation Oncology, Dana-Farber Cancer Institute, Boston, MA.,Department of Radiation Oncology, Brigham & Women's Hospital, Boston, MA
11
Vu TN, Nguyen HN, Calza S, Kalari KR, Wang L, Pawitan Y. Cell-level somatic mutation detection from single-cell RNA sequencing. Bioinformatics 2020; 35:4679-4687. [PMID: 31028395 PMCID: PMC6853710 DOI: 10.1093/bioinformatics/btz288] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 07/03/2018] [Revised: 03/19/2019] [Accepted: 04/17/2019] [Indexed: 01/07/2023]
Abstract
MOTIVATION Both single-cell RNA sequencing (scRNA-seq) and DNA sequencing (scDNA-seq) have been applied for cell-level genomic profiling. For mutation profiling, the latter seems more natural. However, the task is highly challenging due to the limited input material from only two copies of DNA molecules, while whole-genome amplification generates biases and other technical noise. ScRNA-seq starts with a higher input amount, so it generally has better data quality. Various methods exist for mutation detection from DNA sequencing, but it is not clear whether these methods work for scRNA-seq data. RESULTS Mutation detection methods developed for either bulk-cell sequencing data or scDNA-seq data do not work well for scRNA-seq data, as they produce substantial numbers of false positives. We develop a novel and robust statistical method, called SCmut, to identify specific cells that harbor mutations discovered in bulk-cell data. Statistically, SCmut controls false positives using the 2D local false discovery rate method. We apply SCmut to several scRNA-seq datasets. In scRNA-seq breast cancer datasets, SCmut identifies a number of highly confident cell-level mutations that are recurrent in many cells and consistent across samples. In a scRNA-seq glioblastoma dataset, we discover a recurrent cell-level mutation in the PDGFRA gene that is highly correlated with a well-known in-frame deletion in the gene. To conclude, this study contributes a novel method to discover cell-level mutation information from scRNA-seq that can facilitate investigation of cell-to-cell heterogeneity. AVAILABILITY AND IMPLEMENTATION The source code and bioinformatics pipeline of SCmut are available at https://github.com/nghiavtr/SCmut. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Trung Nghia Vu
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm 17177, Sweden
| | - Ha-Nam Nguyen
- Information Technology Institute, Vietnam National University in Hanoi, Hanoi 84024, Vietnam
| | - Stefano Calza
- Department of Molecular and Translational Medicine, University of Brescia, Brescia 25125, Italy
| | - Krishna R Kalari
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
| | - Liewei Wang
- Department of Molecular Pharmacology & Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA
| | - Yudi Pawitan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm 17177, Sweden
12
Cao C, Mak L, Jin G, Gordon P, Ye K, Long Q. PRESM: personalized reference editor for somatic mutation discovery in cancer genomics. Bioinformatics 2020; 35:1445-1452. [PMID: 30247633 DOI: 10.1093/bioinformatics/bty812] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/22/2018] [Revised: 08/27/2018] [Accepted: 09/19/2018] [Indexed: 12/16/2022]
Abstract
MOTIVATION Accurate detection of somatic mutations is a crucial step toward understanding cancer. Various tools have been developed to detect somatic mutations from cancer genome sequencing data by mapping reads to a universal reference genome and inferring likelihoods from complex statistical models. However, read mapping is frequently obstructed by mismatches between germline and somatic mutations on a read and the reference genome. Previous attempts to develop personalized genome tools are not compatible with downstream statistical models for somatic mutation detection. RESULTS We present PRESM, a tool that builds personalized reference genomes by integrating germline mutations into the reference genome. The aforementioned obstacle is circumvented by using a two-step germline substitution procedure, maintaining positional fidelity using an innovative workaround. Reads derived from tumor tissue can be positioned more accurately along a personalized reference than a universal reference due to the reduced genetic distance between the subject (tumor genome) and the target (the personalized genome). Application of PRESM's personalized genome reduced false-positive (FP) somatic mutation calls by as much as 55.5%, and facilitated the discovery of a novel somatic point mutation on a germline insertion in PDE1A, a phosphodiesterase associated with melanoma. Moreover, all improvements in calling accuracy were achieved without parameter optimization, as PRESM itself is parameter-free. Hence, similar increases in read mapping and decreases in the FP rate will persist when PRESM-built genomes are applied to any user-provided dataset. AVAILABILITY AND IMPLEMENTATION The software is available at https://github.com/precisionomics/PRESM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chen Cao
- Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
| | - Lauren Mak
- Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
| | - Guangxu Jin
- Department of Cancer Biology, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Paul Gordon
- Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
| | - Kai Ye
- Department of Bioinformatics, Electronic and Information Engineering School, Xi'an Jiaotong University, Xi'an, China
| | - Quan Long
- Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
|
13
|
Predicting chromosome 1p/19q codeletion by RNA expression profile: a comparison of current prediction models. Aging (Albany NY) 2020; 11:974-985. [PMID: 30710490 PMCID: PMC6382420 DOI: 10.18632/aging.101795] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/20/2018] [Accepted: 01/24/2019] [Indexed: 12/20/2022]
Abstract
BACKGROUND Chromosome 1p/19q codeletion is increasingly recognized as a crucial genetic marker for glioma patients and has been included in the 2016 WHO classification of glioma. Fluorescent in situ hybridization, a widely used method for detecting 1p/19q status, has some methodological limitations that might influence clinical management. Here, we attempted to explore an RNA sequencing computational method to detect 1p/19q status. METHODS We included 692 samples with 1p/19q status information from the TCGA cohort as the training set and 222 samples with 1p/19q status information from the REMBRANDT cohort as the validation set. We reviewed and compared five tools (TSPairs, GSVA, PAM, Caret, and smoother) with respect to their accuracy, sensitivity and specificity. RESULTS In the TCGA cohort, the GSVA method showed the highest accuracy (98.4%) in predicting 1p/19q status (sensitivity=95.5%, specificity=99.6%), and the smoother method showed the second-highest accuracy (accuracy=97.8%, sensitivity=96.4%, specificity=98.3%). In the REMBRANDT cohort, the smoother method exhibited the highest accuracy (98.6%) (sensitivity=96.7%, specificity=98.9%) in 1p/19q status prediction. CONCLUSIONS Our independent assessment of five tools identified the smoother method as the most stable and accurate method for predicting 1p/19q status. This method could be regarded as a potential alternative method for clinical practice in the future.
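The three figures of merit reported in this abstract follow the usual confusion-matrix definitions. The sketch below is only an illustration of those definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn: codeleted samples predicted as codeleted / non-codeleted.
    tn/fp: non-codeleted samples predicted as non-codeleted / codeleted.
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    accuracy = (tp + tn) / (positives + negatives)
    sensitivity = tp / positives   # true-positive rate
    specificity = tn / negatives   # true-negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts, chosen only to exercise the function.
acc, sens, spec = classification_metrics(tp=90, fp=5, tn=95, fn=10)
print(acc, sens, spec)
```

Because 1p/19q-codeleted cases are usually the minority class, accuracy alone can look high even when sensitivity is poor, which is presumably why the abstract reports all three values.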
|
14
|
Tam JCW, Chan YM, Tsang SY, Yau CI, Yeung SY, Au KK, Chow CK. Noninvasive prenatal paternity testing by means of SNP-based targeted sequencing. Prenat Diagn 2020; 40:497-506. [PMID: 31674029 PMCID: PMC7154534 DOI: 10.1002/pd.5595] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/22/2019] [Revised: 10/03/2019] [Accepted: 10/25/2019] [Indexed: 12/19/2022]
Abstract
Objective To develop a method for noninvasive prenatal paternity testing based on targeted sequencing of single nucleotide polymorphisms (SNPs). Method SNPs were selected based on population genetics data. Target‐SNPs in cell‐free DNA extracted from maternal blood (maternal cfDNA) were analyzed by targeted sequencing wherein target enrichment was based on multiplex amplification using QIAseq Targeted DNA Panels with Unique Molecular Identifiers. Fetal SNP genotypes were called using a novel bioinformatics algorithm, and the combined paternity indices (CPIs) and resultant paternity probabilities were calculated. Results Fetal SNP genotypes obtained from targeted sequencing of maternal cfDNA were 100% concordant with those from amniotic fluid‐derived fetal genomic DNA. From an initial panel of 356 target‐SNPs, an average of 148 were included in paternity calculations in 15 family trio cases, generating paternity probabilities of greater than 99.9999%. All paternity results were confirmed by short‐tandem‐repeat analysis. The high specificity of the methodology was validated by successful paternity discrimination between biological fathers and their siblings and by large separations between the CPIs calculated for the biological fathers and those for 60 unrelated men. Conclusion The novel method is highly effective, with substantial improvements over similar approaches in terms of reduced number of target‐SNPs, increased accuracy, and reduced costs.
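The abstract does not describe how its combined paternity indices (CPIs) and paternity probabilities were computed. As a rough illustration only, the sketch below uses the textbook convention: the CPI is the product of per-locus paternity indices (assuming independent loci), and the probability of paternity follows from Bayes' rule with an assumed prior of 0.5. The function names, the prior, and the example values are assumptions, not the authors' algorithm.

```python
import math


def combined_paternity_index(per_locus_pi):
    """Multiply per-locus paternity indices, assuming independent loci."""
    return math.prod(per_locus_pi)


def paternity_probability(cpi, prior=0.5):
    """Posterior probability of paternity from a CPI and a prior (Bayes' rule)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    return (cpi * prior) / (cpi * prior + (1.0 - prior))


# Hypothetical per-locus paternity indices for two SNPs.
cpi = combined_paternity_index([2.0, 5.0])
print(cpi, paternity_probability(cpi))
```

With a prior of 0.5 the posterior reduces to CPI / (CPI + 1), so a paternity probability above 99.9999% corresponds to a CPI above roughly one million.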
Affiliation(s)
| | - Yee Man Chan
- Department of R&D, Medtimes Medical Group Limited, Kwai Chung, Hong Kong
| | - Shui Ying Tsang
- Department of R&D, Medtimes Medical Group Limited, Kwai Chung, Hong Kong
| | - Chung In Yau
- Department of R&D, Medtimes Medical Group Limited, Kwai Chung, Hong Kong
| | - Shuk Ying Yeung
- Department of R&D, Medtimes Medical Group Limited, Kwai Chung, Hong Kong
| | - Ka Ki Au
- Department of R&D, Medtimes Medical Group Limited, Kwai Chung, Hong Kong
| | - Chun Kin Chow
- Department of R&D, Medtimes Medical Group Limited, Kwai Chung, Hong Kong
|
15
|
Mohanty AK, Vuzman D, Francioli L, Cassa C, Toth-Petroczy A, Sunyaev S. novoCaller: a Bayesian network approach for de novo variant calling from pedigree and population sequence data. Bioinformatics 2020; 35:1174-1180. [PMID: 30169785 PMCID: PMC6449753 DOI: 10.1093/bioinformatics/bty749] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/22/2018] [Revised: 06/19/2018] [Accepted: 08/29/2018] [Indexed: 12/12/2022] Open
Abstract
MOTIVATION De novo mutations (i.e. newly occurring mutations) are a predominant cause of sporadic dominant monogenic diseases and play a significant role in the genetics of complex disorders. De novo mutation studies also inform population genetics models and shed light on the biology of DNA replication and repair. Despite the broad interest, there is room for improvement with regard to the accuracy of de novo mutation calling. RESULTS We designed novoCaller, a Bayesian variant calling algorithm that uses information from read-level data both in the pedigree and in unrelated samples. The method was extensively tested using large trio-sequencing studies, and it consistently achieved over 97% sensitivity. We applied the algorithm to 48 trio cases of suspected rare Mendelian disorders as part of the Brigham Genomic Medicine gene discovery initiative. Its application resulted in a significant reduction in the resources required for manual inspection and experimental validation of the calls. Three de novo variants were found in known genes associated with rare disorders, leading to rapid genetic diagnosis of the probands. Another 14 variants were found in genes that are likely to explain the phenotype, and could lead to novel disease-gene discovery. AVAILABILITY AND IMPLEMENTATION Source code implemented in C++ and Python can be downloaded from https://github.com/bgm-cwg/novoCaller. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anwoy Kumar Mohanty
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Dana Vuzman
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Laurent Francioli
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.,Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Christopher Cassa
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Agnes Toth-Petroczy
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Shamil Sunyaev
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.,Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
|
16
|
Yin R, Luusua E, Dabrowski J, Zhang Y, Kwoh CK. Tempel: time-series mutation prediction of influenza A viruses via attention-based recurrent neural networks. Bioinformatics 2020; 36:2697-2704. [DOI: 10.1093/bioinformatics/btaa050] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 08/15/2019] [Revised: 01/01/2020] [Accepted: 01/22/2020] [Indexed: 02/06/2023] Open
Abstract
Motivation
Influenza viruses persistently threaten public health, causing annual epidemics and sporadic pandemics. Because of rapid mutations, the evolution of influenza viruses remains the main obstacle to the effectiveness of antiviral treatments. The goal of this work is to predict whether mutations are likely to occur in the next flu season using historical glycoprotein hemagglutinin sequence data. One of the major challenges is to model the temporality and dimensionality of sequential influenza strains and to interpret the prediction results.
Results
In this article, we propose an efficient and robust time-series mutation prediction model (Tempel) for the mutation prediction of influenza A viruses. We first construct the sequential training samples with splittings and embeddings. By employing recurrent neural networks with attention mechanisms, Tempel is capable of considering the historical residue information. Attention mechanisms are being increasingly used to improve the performance of mutation prediction by selectively focusing on the parts of the residues. A framework is established based on Tempel that enables us to predict the mutations at any specific residue site. Experimental results on three influenza datasets show that Tempel can significantly enhance the predictive performance compared with widely used approaches and provide novel insights into the dynamics of viral mutation and evolution.
Availability and implementation
The datasets, source code and supplementary documents are available at: https://drive.google.com/drive/folders/15WULR5__6k47iRotRPl3H7ghi3RpeNXH.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rui Yin
- School of Computer Science and Engineering, Nanyang Technological University, Singapore
| | - Emil Luusua
- Faculty of Science and Engineering, Linköping University, Linköping, Sweden
| | - Jan Dabrowski
- School of Computer Science, Swansea University, Swansea, UK
| | - Yu Zhang
- School of Computer Science and Engineering, Nanyang Technological University, Singapore
| | - Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore
|
17
|
Adamopoulos PG, Kontos CK, Scorilas A, Sideris DC. Identification of novel alternative transcripts of the human Ribonuclease κ (RNASEK) gene using 3′ RACE and high-throughput sequencing approaches. Genomics 2020; 112:943-951. [DOI: 10.1016/j.ygeno.2019.06.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/12/2019] [Revised: 05/13/2019] [Accepted: 06/10/2019] [Indexed: 01/25/2023]
|
18
|
Calling Variants in the Clinic: Informed Variant Calling Decisions Based on Biological, Clinical, and Laboratory Variables. Comput Struct Biotechnol J 2019; 17:561-569. [PMID: 31049166 PMCID: PMC6482431 DOI: 10.1016/j.csbj.2019.04.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 11/21/2018] [Revised: 03/12/2019] [Accepted: 04/03/2019] [Indexed: 01/10/2023] Open
Abstract
Deep sequencing genomic analysis is becoming increasingly common in clinical research and practice, enabling accurate identification of diagnostic, prognostic, and predictive determinants. Variant calling, distinguishing between true mutations and experimental errors, is a central task of genomic analysis and often requires sophisticated statistical, computational, and/or heuristic techniques. Although variant callers seek to overcome noise inherent in biological experiments, variant calling can be significantly affected by outside factors including those used to prepare, store, and analyze samples. The goal of this review is to discuss known experimental features, such as sample preparation, library preparation, and sequencing, alongside diverse biological and clinical variables, and evaluate their effect on variant caller selection and optimization.
|
19
|
Hwang B, Lee W, Yum SY, Jeon Y, Cho N, Jang G, Bang D. Lineage tracing using a Cas9-deaminase barcoding system targeting endogenous L1 elements. Nat Commun 2019; 10:1234. [PMID: 30874552 PMCID: PMC6420643 DOI: 10.1038/s41467-019-09203-z] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 07/28/2018] [Accepted: 02/26/2019] [Indexed: 11/30/2022] Open
Abstract
Determining cell lineage and function is critical to understanding human physiology and pathology. Although advances in lineage tracing methods provide new insight into cell fate, defining cellular diversity at the mammalian level remains a challenge. Here, we develop a genome editing strategy using a cytidine deaminase fused with nickase Cas9 (nCas9) to specifically target endogenous interspersed repeat regions in mammalian cells. The resulting mutation patterns serve as a genetic barcode, which is induced by targeted mutagenesis with single-guide RNA (sgRNA), leveraging substitution events, and subsequent read out by a single primer pair. By analyzing interspersed mutation signatures, we show the accurate reconstruction of cell lineage using both bulk cell and single-cell data. We envision that our genetic barcode system will enable fine-resolution mapping of organismal development in healthy and diseased mammalian states.
Affiliation(s)
- Byungjin Hwang
- Department of Chemistry, Yonsei University, Seoul, 03722, Korea
| | - Wookjae Lee
- Department of Chemistry, Yonsei University, Seoul, 03722, Korea
| | - Soo-Young Yum
- Laboratory of Theriogenology and Biotechnology, Department of Veterinary Clinical Sciences, College of Veterinary Medicine, the Research Institute of Veterinary Science, and BK21 PLUS Program for Creative Veterinary Science Research, Seoul National University, Seoul, 08826, Korea
| | - Yujin Jeon
- Department of Chemistry, Yonsei University, Seoul, 03722, Korea
| | - Namjin Cho
- Department of Chemistry, Yonsei University, Seoul, 03722, Korea
| | - Goo Jang
- Laboratory of Theriogenology and Biotechnology, Department of Veterinary Clinical Sciences, College of Veterinary Medicine, the Research Institute of Veterinary Science, and BK21 PLUS Program for Creative Veterinary Science Research, Seoul National University, Seoul, 08826, Korea.
| | - Duhee Bang
- Department of Chemistry, Yonsei University, Seoul, 03722, Korea.
|
20
|
Stubbs FE, Conway-Campbell BL, Lightman SL. Thirty years of neuroendocrinology: Technological advances pave the way for molecular discovery. J Neuroendocrinol 2019; 31:e12653. [PMID: 30362285 DOI: 10.1111/jne.12653] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/03/2018] [Revised: 10/16/2018] [Accepted: 10/21/2018] [Indexed: 12/12/2022]
Abstract
Since the 1950s, the systems level interactions between the hypothalamus, pituitary and end organs such as the adrenal, thyroid and gonads have been well known; however, it is only over the last three decades that advances in molecular biology and information technology have provided a tremendous expansion of knowledge at the molecular level. Neuroendocrinology has benefitted from developments in molecular genetics, epigenetics and epigenomics, and most recently optogenetics and pharmacogenetics. This has enabled a new understanding of gene regulation, transcription, translation and post-translational regulation, which should help direct the development of drugs to treat neuroendocrine-related diseases.
Affiliation(s)
- Felicity E Stubbs
- Henry Wellcome Laboratories for Integrative Neuroscience and Endocrinology, University of Bristol, Bristol, UK
| | - Becky L Conway-Campbell
- Henry Wellcome Laboratories for Integrative Neuroscience and Endocrinology, University of Bristol, Bristol, UK
| | - Stafford L Lightman
- Henry Wellcome Laboratories for Integrative Neuroscience and Endocrinology, University of Bristol, Bristol, UK
|
21
|
den Hollander W, Pulyakhina I, Boer C, Bomer N, van der Breggen R, Arindrarto W, Couthino de Almeida R, Lakenberg N, Sentner T, Laros JFJ, ‘t Hoen PAC, Slagboom EPE, Nelissen RGHH, van Meurs J, Ramos YFM, Meulenbelt I. Annotating Transcriptional Effects of Genetic Variants in Disease-Relevant Tissue: Transcriptome-Wide Allelic Imbalance in Osteoarthritic Cartilage. Arthritis Rheumatol 2019; 71:561-570. [PMID: 30298554 PMCID: PMC6593438 DOI: 10.1002/art.40748] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 05/16/2018] [Accepted: 10/02/2018] [Indexed: 01/10/2023]
Abstract
OBJECTIVE Multiple single-nucleotide polymorphisms (SNPs) conferring susceptibility to osteoarthritis (OA) mark imbalanced expression of positional genes in articular cartilage, reflected by unequally expressed alleles among heterozygotes (allelic imbalance [AI]). We undertook this study to explore the articular cartilage transcriptome from OA patients for AI events to identify putative disease-driving genetic variation. METHODS AI was assessed in 42 preserved and 5 lesioned OA cartilage samples (from the Research Arthritis and Articular Cartilage study) for which RNA sequencing data were available. The count fraction of the alternative alleles among the alternative and reference alleles together (φ) was determined for heterozygous individuals. A meta-analysis was performed to generate a meta-φ and P value for each SNP with a false discovery rate (FDR) correction for multiple comparisons. To further validate AI events, we explored them as a function of multiple additional OA features. RESULTS We observed a total of 2,070 SNPs that consistently marked AI of 1,031 unique genes in articular cartilage. Of these genes, 49 were found to be significantly differentially expressed (fold change <0.5 or >2, FDR <0.05) between preserved and paired lesioned cartilage, and 18 had previously been reported to confer susceptibility to OA and/or related phenotypes. Moreover, we identified notable highly significant AI SNPs in the CRLF1, WWP2, and RPS3 genes that were related to multiple OA features. CONCLUSION We present a framework and resulting data set for researchers in the OA research field to probe for disease-relevant genetic variation that affects gene expression in pivotal disease-affected tissue. This likely includes putative novel compelling OA risk genes such as CRLF1, WWP2, and RPS3.
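The per-individual statistic defined in this abstract, the count fraction of alternative alleles among alternative and reference alleles together (φ), can be written as φ = alt / (alt + ref). The sketch below illustrates only that definition; the function name and read counts are hypothetical, and the study's meta-analysis and FDR correction are not reproduced here.

```python
def allelic_fraction(alt_count, ref_count):
    """Return phi: the fraction of reads carrying the alternative allele
    at a heterozygous site (alt / (alt + ref))."""
    total = alt_count + ref_count
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_count / total


# Hypothetical RNA-seq read counts at one heterozygous SNP.
phi = allelic_fraction(alt_count=30, ref_count=10)
print(phi)                 # fraction of alternative-allele reads
print(abs(phi - 0.5))      # distance from balanced expression
```

Under balanced expression of both alleles in a heterozygote, φ is expected to be close to 0.5, so values far from 0.5 are the candidates for allelic imbalance.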
Affiliation(s)
| | - Irina Pulyakhina
- Radboud University Medical Center Nijmegen, The Netherlands, and Wellcome Trust Centre for Human Genetics, Oxford, UK
| | - Cindy Boer
- Erasmus Medical Center, Rotterdam, The Netherlands
| | - Nils Bomer
- Leiden University Medical Center, Leiden, The Netherlands
| | - Thom Sentner
- Leiden University Medical Center, Leiden, The Netherlands
|
22
|
Dorri F, Jewell S, Bouchard-Côté A, Shah SP. Somatic mutation detection and classification through probabilistic integration of clonal population information. Commun Biol 2019; 2:44. [PMID: 30729182 PMCID: PMC6355807 DOI: 10.1038/s42003-019-0291-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 05/17/2018] [Accepted: 12/20/2018] [Indexed: 01/06/2023] Open
Abstract
Somatic mutations are a primary contributor to malignancy in human cells. Accurate detection of mutations is needed to define the clonal composition of tumours whereby clones may have distinct phenotypic properties. Although analysis of mutations over multiple tumour samples from the same patient has the potential to enhance identification of clones, few analytic methods exploit the correlation structure across samples. We posited that incorporating clonal information into joint analysis over multiple samples would improve mutation detection, particularly those with low prevalence. In this paper, we develop a new procedure called MuClone, for detection of mutations across multiple tumour samples of a patient from whole genome or exome sequencing data. In addition to mutation detection, MuClone classifies mutations into biologically meaningful groups and allows us to study clonal dynamics. We show that, on lung and ovarian cancer datasets, MuClone improves somatic mutation detection sensitivity over competing approaches without compromising specificity.
MESH Headings
- Female
- Humans
- Carcinoma, Non-Small-Cell Lung/diagnosis
- Carcinoma, Non-Small-Cell Lung/genetics
- Carcinoma, Non-Small-Cell Lung/metabolism
- Carcinoma, Non-Small-Cell Lung/pathology
- Clone Cells
- Cystadenocarcinoma, Serous/diagnosis
- Cystadenocarcinoma, Serous/genetics
- Cystadenocarcinoma, Serous/metabolism
- Cystadenocarcinoma, Serous/pathology
- Datasets as Topic
- Exome
- Gene Expression
- Genetic Loci
- Genome, Human
- Lung Neoplasms/diagnosis
- Lung Neoplasms/genetics
- Lung Neoplasms/metabolism
- Lung Neoplasms/pathology
- Models, Statistical
- Multigene Family
- Mutation
- Neoplasm Proteins/genetics
- Neoplasm Proteins/metabolism
- Ovarian Neoplasms/diagnosis
- Ovarian Neoplasms/genetics
- Ovarian Neoplasms/metabolism
- Ovarian Neoplasms/pathology
- Software
- Whole Genome Sequencing
Affiliation(s)
- Fatemeh Dorri
- Department of Computer Science, University of British Columbia, 201- 2366 Main Mall, V6T 1Z4 Vancouver, Canada
| | - Sean Jewell
- Department of Statistics, University of Washington, B313 Padelford Hall, Northeast Stevens Way, Seattle, WA 24105 USA
| | - Alexandre Bouchard-Côté
- Department of Statistics, University of British Columbia, 3182 Earth Sciences Building, 2207 Main Mall, V6T 1Z4 Vancouver, Canada
| | - Sohrab P. Shah
- Department of Molecular Oncology, University of British Columbia, 675 West 10th Avenue, V5Z 1L3 Vancouver, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Rm. G227 - 2211 Wesbrook Mall, 24105 Vancouver, Canada
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, 417 E 68th Street, New York, NY 10065 USA
|
23
|
Sohn JI, Nam JW. The present and future of de novo whole-genome assembly. Brief Bioinform 2018; 19:23-40. [PMID: 27742661 DOI: 10.1093/bib/bbw096] [Citation(s) in RCA: 80] [Impact Index Per Article: 11.4] [Received: 06/03/2016] [Indexed: 12/15/2022] Open
Abstract
With the advent of next-generation sequencing (NGS) technology, various de novo assembly algorithms based on the de Bruijn graph have been developed to construct chromosome-level sequences. However, numerous technical and computational challenges in de novo assembly remain, although many ideas and heuristics have been suggested to tackle them in both experimental and computational settings. In this review, we categorize de novo assemblers on the basis of the type of de Bruijn graphs (Hamiltonian and Eulerian) and discuss the challenges of de novo assembly for short NGS reads regarding computational complexity and assembly ambiguity. Then, we discuss how the limitations of the short reads can be overcome by using a single-molecule sequencing platform that generates long reads of up to several kilobases. In fact, long-read assembly has caused a paradigm shift in whole-genome assembly in terms of algorithms and supporting steps. We also summarize (i) hybrid assemblies using both short and long reads and (ii) overlap-based assemblies for long reads and discuss their challenges and future prospects. This review provides guidelines to determine the optimal approach for a given input data type, computational budget or genome.
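For readers unfamiliar with the data structure, the Eulerian form of the de Bruijn graph mentioned in this abstract can be sketched in a few lines: every k-mer in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. This is a minimal illustration with a hypothetical function name and toy input; it ignores sequencing errors, reverse complements, and coverage, all of which real assemblers must handle.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build an Eulerian-style de Bruijn graph as an adjacency list.

    Nodes are (k-1)-mers; each k-mer found in a read adds one edge
    from its prefix (first k-1 bases) to its suffix (last k-1 bases).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy read; a walk that uses every edge once spells the sequence back out.
print(de_bruijn_edges(["ACGT"], k=3))
```

In this formulation, assembly corresponds to finding a path that traverses every edge, whereas the Hamiltonian formulation treats k-mers as nodes and asks for a path that visits every node.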
|
24
|
Lee WK, Lee SG, Yim SH, Kim D, Kim H, Jeong S, Jung SG, Jo YS, Lee J. Whole Exome Sequencing Identifies a Novel Hedgehog-Interacting Protein G516R Mutation in Locally Advanced Papillary Thyroid Cancer. Int J Mol Sci 2018; 19:ijms19102867. [PMID: 30241415 PMCID: PMC6213497 DOI: 10.3390/ijms19102867] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 08/08/2018] [Revised: 09/13/2018] [Accepted: 09/18/2018] [Indexed: 12/21/2022] Open
Abstract
Locally advanced thyroid cancer exhibits aggressive clinical features requiring extensive neck dissection. Therefore, it is important to identify changes in the tumor biology before local progression. Here, whole exome sequencing (WES) using tissues from locally advanced papillary thyroid cancer (PTC) presented a large number of single nucleotide variants (SNVs) in the metastatic lymph node (MLN), but not in normal tissues and primary tumors. Among those MLN-specific SNVs, a novel HHIP G516R (G1546A) mutation was also observed. Interestingly, in-depth analysis for exome sequencing data from the primary tumor presented altered nucleotide 'A' at a very low frequency indicating intra-tumor heterogeneity between the primary tumor and MLN. Computational prediction models such as PROVEAN and Polyphen suggested that HHIP G516R might affect protein function and stability. In vitro, HHIP G516R increased cell proliferation and promoted cell migration in thyroid cancer cells. HHIP G516R, a missense mutation, could be a representative example for the intra-tumor heterogeneity of locally advanced thyroid cancer, which can be a potential future therapeutic target for this disease.
Affiliation(s)
- Woo Kyung Lee
- Department of Internal Medicine, Yonsei Cancer Center, Severance Hospital, Yonsei University College of Medicine, Seoul 120-752, Korea.
- Brain Korea 21 PLUS Project for Medical Science, Yonsei University, Seoul 120-752, Korea.
| | - Seul Gi Lee
- Brain Korea 21 PLUS Project for Medical Science, Yonsei University, Seoul 120-752, Korea.
- Department of Surgery, Yonsei Cancer Center, Severance Hospital, Yonsei University College of Medicine, Seoul 120-752, Korea.
| | - Seung Hyuk Yim
- Department of Surgery, Yonsei Cancer Center, Severance Hospital, Yonsei University College of Medicine, Seoul 120-752, Korea.
| | - Daham Kim
- Department of Internal Medicine, Yonsei Cancer Center, Severance Hospital, Yonsei University College of Medicine, Seoul 120-752, Korea.
| | - Hyunji Kim
- Department of Surgery, Yonsei Cancer Center, Severance Hospital, Yonsei University College of Medicine, Seoul 120-752, Korea.
| | - Seonhyang Jeong
- Department of Internal Medicine, Yonsei Cancer Center, Severance Hospital, Yonsei University College of Medicine, Seoul 120-752, Korea.
| | - Sang Geun Jung
- Department of Gynecological Oncology, Bundang CHA Medical Center, CHA University, Seongnam, Gyeonggi-do 13496, Korea.
| | - Young Suk Jo
- Department of Internal Medicine, Yonsei Cancer Center, Severance Hospital, Yonsei University College of Medicine, Seoul 120-752, Korea.
- Brain Korea 21 PLUS Project for Medical Science, Yonsei University, Seoul 120-752, Korea.
| | - Jandee Lee
- Department of Surgery, Yonsei Cancer Center, Severance Hospital, Yonsei University College of Medicine, Seoul 120-752, Korea.
|
25
|
Li M, Tang L, Liao Z, Luo J, Wu F, Pan Y, Wang J. A novel scaffolding algorithm based on contig error correction and path extension. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 16:764-773. [PMID: 30040649 DOI: 10.1109/tcbb.2018.2858267] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 06/08/2023]
Abstract
The sequence assembly process can be divided into three stages: contig extension, scaffolding, and gap filling. Scaffolding is an essential step in this process, inferring the direction and sequence relationships between the contigs. However, scaffolding still faces the challenges of uneven sequencing depth, repetitive genome regions, and sequencing errors, which often lead to many false relationships between contigs. The performance of scaffolding can be improved by removing potential false conjunctions between contigs. In this study, we present a novel scaffolding algorithm, called iLSLS, based on the path extension Loose-Strict-Loose strategy and contig error correction. iLSLS helps reduce the false relationships between contigs and improves the accuracy of subsequent steps. iLSLS utilizes a scoring function, which estimates the correctness of candidate paths from the distribution of paired reads, and tries to conduct the extension along the highest-scoring path. In addition, iLSLS can precisely estimate the gap size. We conduct experiments on two real datasets, and the results show that the LSLS strategy effectively increases the correctness of scaffolds and that iLSLS performs better than other scaffolding methods.
|
26
|
Wolff A, Bayerlová M, Gaedcke J, Kube D, Beißbarth T. A comparative study of RNA-Seq and microarray data analysis on the two examples of rectal-cancer patients and Burkitt Lymphoma cells. PLoS One 2018; 13:e0197162. [PMID: 29768462 PMCID: PMC5955523 DOI: 10.1371/journal.pone.0197162] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2017] [Accepted: 04/27/2018] [Indexed: 12/17/2022] Open
Abstract
Background Pipeline comparisons for gene expression data are highly valuable for applied real data analyses, as they enable the selection of suitable analysis strategies for the dataset at hand. Such pipelines should include read mapping, counting, and differential gene expression analysis for RNA-Seq data, or preprocessing, normalization, and differential gene expression analysis for microarray data, in order to give a global insight into pipeline performance. Methods Four commonly used RNA-Seq pipelines (STAR/HTSeq-Count/edgeR, STAR/RSEM/edgeR, Sailfish/edgeR, TopHat2/Cufflinks/CuffDiff) were investigated on multiple levels (alignment and counting) and cross-compared with the microarray counterpart on the level of gene expression and gene ontology enrichment. For these comparisons we generated two matched microarray and RNA-Seq datasets: Burkitt Lymphoma cell line data and rectal cancer patient data. Results The overall mapping rate of STAR was 98.98% for the cell line dataset and 98.49% for the patient dataset. TopHat’s overall mapping rate was 97.02% and 96.73%, respectively, while Sailfish had an overall mapping rate of only 84.81% and 54.44%. The correlation of gene expression in microarray and RNA-Seq data was moderately worse for the patient dataset (ρ = 0.67–0.69) than for the cell line dataset (ρ = 0.87–0.88). The exception was Cufflinks, whose correlation results were substantially lower (ρ = 0.21–0.29 and 0.34–0.53). For both datasets we identified very low numbers of differentially expressed genes using the microarray platform. For RNA-Seq we checked the agreement of differentially expressed genes identified in the different pipelines and of GO-term enrichment results. Conclusion In conclusion, the combination of the STAR aligner with HTSeq-Count, followed by the STAR aligner with RSEM and by Sailfish, generated differentially expressed genes best suited for the dataset at hand and in agreement with most of the other transcriptomics pipelines.
Affiliation(s)
- Alexander Wolff
- Dept. of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
| | - Michaela Bayerlová
- Dept. of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
| | - Jochen Gaedcke
- Dept. of General-, Visceral- and Pediatric Surgery, University Medical Center Göttingen, Göttingen, Germany
| | - Dieter Kube
- Dept. of Hematology and Oncology, University Medical Center Göttingen, Göttingen, Germany
| | - Tim Beißbarth
- Dept. of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
| |
|
27
|
Ricke DO, Shcherbina A, Michaleas A, Fremont‐Smith P. GrigoraSNPs: Optimized Analysis of SNPs for DNA Forensics. J Forensic Sci 2018; 63:1841-1845. [DOI: 10.1111/1556-4029.13794] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2017] [Revised: 02/22/2018] [Accepted: 02/28/2018] [Indexed: 12/30/2022]
Affiliation(s)
- Darrell O. Ricke
- Bioengineering Systems & Technologies Massachusetts Institute of Technology Lincoln Laboratory 244 Wood Street Lexington MA 02421‐6426
| | - Anna Shcherbina
- Bioengineering Systems & Technologies Massachusetts Institute of Technology Lincoln Laboratory 244 Wood Street Lexington MA 02421‐6426
| | - Adam Michaleas
- Bioengineering Systems & Technologies Massachusetts Institute of Technology Lincoln Laboratory 244 Wood Street Lexington MA 02421‐6426
| | - Philip Fremont‐Smith
- Bioengineering Systems & Technologies Massachusetts Institute of Technology Lincoln Laboratory 244 Wood Street Lexington MA 02421‐6426
| |
|
28
|
Grassi E, Durante S, Astolfi A, Tarantino G, Indio V, Freier E, Vecchiarelli S, Ricci C, Casadei R, Formica F, Filippini D, Comito F, Serra C, Santini D, D' Errico A, Minni F, Biasco G, Di Marco M. Mutational burden of resectable pancreatic cancer, as determined by whole transcriptome and whole exome sequencing, predicts a poor prognosis. Int J Oncol 2018; 52:1972-1980. [PMID: 29620163 DOI: 10.3892/ijo.2018.4344] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2017] [Accepted: 02/28/2018] [Indexed: 11/05/2022] Open
Abstract
Despite the genomic characterization of pancreatic cancer (PC), marked advances in the development of prognosis classification and novel therapeutic strategies have yet to come. The present study aimed to better understand the genomic alterations associated with the invasive phenotype of PC, in order to improve patient selection for treatment options. A total of 30 PC samples were analysed by either whole transcriptome (9 samples) or exome sequencing (21 samples) on an Illumina platform (75×2 or 100×2 bp), and the results were matched with normal DNA to identify somatic events. Single nucleotide variants and insertions and deletions were annotated using public databases, and the pathogenicity of the identified variants was defined according to prior knowledge and mutation-prediction tools. A total of 43 recurrently altered genes were identified, which were involved in numerous pathways, including chromatin remodelling and DNA damage repair. In addition, an analysis limited to a subgroup of early stage patients (50% of samples) demonstrated that poor prognosis was significantly associated with a higher number of known PC mutations (P=0.047). Samples from patients with a better overall survival (>25 months) harboured an average of 24 events, whereas samples from patients with an overall survival of <25 months presented an average of 40 mutations. These findings indicated that a complex genetic profile in the early stage of disease may be associated with increased aggressiveness, thus suggesting an urgent requirement for an innovative approach to classify this disease.
Affiliation(s)
- Elisa Grassi
- Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| | - Sandra Durante
- Interdepartmental Center of Cancer Research University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| | - Annalisa Astolfi
- Interdepartmental Center of Cancer Research University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| | - Giuseppe Tarantino
- Interdepartmental Center of Cancer Research University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| | - Valentina Indio
- Interdepartmental Center of Cancer Research University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| | - Eva Freier
- Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| | - Silvia Vecchiarelli
- Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| | - Claudio Ricci
- Department of Medical and Surgical Sciences, University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| | - Riccardo Casadei
- Department of Medical and Surgical Sciences, University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| | - Francesca Formica
- Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| | - Daria Filippini
- Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| | - Francesca Comito
- Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| | - Carla Serra
- Department of Internal Medicine, University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| | - Donatella Santini
- Department of Pathology, University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| | - Antonietta D' Errico
- Department of Pathology, University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| | - Francesco Minni
- Department of Medical and Surgical Sciences, University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| | - Guido Biasco
- Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| | - Mariacristina Di Marco
- Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, Sant' Orsola-Malpighi Hospital, I-40138 Bologna, Italy
| |
|
29
|
Xu C. A review of somatic single nucleotide variant calling algorithms for next-generation sequencing data. Comput Struct Biotechnol J 2018; 16:15-24. [PMID: 29552334 PMCID: PMC5852328 DOI: 10.1016/j.csbj.2018.01.003] [Citation(s) in RCA: 153] [Impact Index Per Article: 21.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2017] [Revised: 01/20/2018] [Accepted: 01/28/2018] [Indexed: 02/06/2023] Open
Abstract
Detection of somatic mutations holds great potential in cancer treatment and has been a very active research field in the past few years, especially since the breakthrough of the next-generation sequencing technology. A collection of variant calling pipelines have been developed with different underlying models, filters, input data requirements, and targeted applications. This review aims to enumerate these unique features of the state-of-the-art variant callers, in the hope to provide a practical guide for selecting the appropriate pipeline for specific applications. We will focus on the detection of somatic single nucleotide variants, ranging from traditional variant callers based on whole genome or exome sequencing of paired tumor-normal samples to recent low-frequency variant callers designed for targeted sequencing protocols with unique molecular identifiers. The variant callers have been extensively benchmarked with inconsistent performances across these studies. We will review the reference materials, datasets, and performance metrics that have been used in the benchmarking studies. In the end, we will discuss emerging trends and future directions of the variant calling algorithms.
Affiliation(s)
- Chang Xu
- Life Science Research and Foundation, Qiagen Sciences, Inc., 6951 Executive Way, Frederick, Maryland 21703, USA
| |
|
30
|
Oh S, Flynn RA, Floor SN, Purzner J, Martin L, Do BT, Schubert S, Vaka D, Morrissy S, Li Y, Kool M, Hovestadt V, Jones DTW, Northcott PA, Risch T, Warnatz HJ, Yaspo ML, Adams CM, Leib RD, Breese M, Marra MA, Malkin D, Lichter P, Doudna JA, Pfister SM, Taylor MD, Chang HY, Cho YJ. Medulloblastoma-associated DDX3 variant selectively alters the translational response to stress. Oncotarget 2018; 7:28169-82. [PMID: 27058758 PMCID: PMC5053718 DOI: 10.18632/oncotarget.8612] [Citation(s) in RCA: 61] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2016] [Accepted: 03/26/2016] [Indexed: 12/14/2022] Open
Abstract
DDX3X encodes a DEAD-box family RNA helicase (DDX3) commonly mutated in medulloblastoma, a highly aggressive cerebellar tumor affecting both children and adults. Despite being implicated in several facets of RNA metabolism, the nature and scope of DDX3's interactions with RNA remain unclear. Here, we show DDX3 collaborates extensively with the translation initiation machinery through direct binding to 5′UTRs of nearly all coding RNAs, specific sites on the 18S rRNA, and multiple components of the translation initiation complex. Impairment of translation initiation is also evident in primary medulloblastomas harboring mutations in DDX3X, further highlighting DDX3's role in this process. Arsenite-induced stress shifts DDX3 binding from the 5′UTR into the coding region of mRNAs concomitant with a general reduction of translation, and both the shift of DDX3 on mRNA and decreased translation are blunted by expression of a catalytically impaired, medulloblastoma-associated DDX3 R534H variant. Furthermore, despite the global repression of translation induced by arsenite, translation is preserved on select genes involved in chromatin organization in DDX3 R534H-expressing cells. Thus, DDX3 interacts extensively with RNA and ribosomal machinery to help remodel the translation landscape in response to stress, while cancer-related DDX3 variants adapt this response to selectively preserve translation.
Affiliation(s)
- Sekyung Oh
- Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, CA, USA.,Department of Neurosurgery, Stanford University School of Medicine, Stanford, CA, USA
| | - Ryan A Flynn
- Program in Epithelial Biology, Stanford University School of Medicine, Stanford, CA, USA
| | - Stephen N Floor
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - James Purzner
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA.,Department of Surgery, Division of Neurosurgery, University of Toronto, ON, Canada
| | - Lance Martin
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Brian T Do
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Simone Schubert
- Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, CA, USA
| | - Dedeepya Vaka
- Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, CA, USA
| | - Sorana Morrissy
- Developmental and Stem Cell Biology Program, The Hospital for Sick Children, Toronto, ON, Canada.,Department of Surgery, Division of Neurosurgery and Labatt Brain Tumour Research Centre, The Hospital for Sick Children, Toronto, ON, Canada
| | - Yisu Li
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC Canada
| | - Marcel Kool
- Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Volker Hovestadt
- Division of Molecular Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - David T W Jones
- Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Paul A Northcott
- Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Thomas Risch
- Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany
| | - Hans-Jörg Warnatz
- Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany
| | - Marie-Laure Yaspo
- Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany
| | - Christopher M Adams
- The Vincent Coates Foundation Mass Spectrometry Laboratory, Stanford University, Stanford, CA, USA
| | - Ryan D Leib
- The Vincent Coates Foundation Mass Spectrometry Laboratory, Stanford University, Stanford, CA, USA
| | - Marcus Breese
- Cancer Biology Program, Stanford University School of Medicine, Stanford, CA, USA
| | - Marco A Marra
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC Canada
| | - David Malkin
- Cancer Genetic Program, The Hospital for Sick Children, Toronto, ON, Canada
| | - Peter Lichter
- Division of Molecular Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Jennifer A Doudna
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA.,Department of Chemistry, University of California, Berkeley, CA, USA.,Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.,Howard Hughes Medical Institute, University of California, Berkeley, CA, USA
| | - Stefan M Pfister
- Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Michael D Taylor
- Developmental and Stem Cell Biology Program, The Hospital for Sick Children, Toronto, ON, Canada.,Department of Surgery, Division of Neurosurgery and Labatt Brain Tumour Research Centre, The Hospital for Sick Children, Toronto, ON, Canada.,Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC Canada.,Department of Laboratory Medicine and Pathobiology, University of Toronto, ON, Canada
| | - Howard Y Chang
- Program in Epithelial Biology, Stanford University School of Medicine, Stanford, CA, USA.,Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA, USA
| | - Yoon-Jae Cho
- Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, CA, USA.,Department of Neurosurgery, Stanford University School of Medicine, Stanford, CA, USA.,Papé Family Pediatric Research Institute, Department of Pediatrics, Oregon Health and Science University, Portland, OR, USA.,Knight Cancer Institute, Oregon Health and Science University, Portland, OR, USA
| |
|
31
|
Cieślik M, Chinnaiyan AM. Cancer transcriptome profiling at the juncture of clinical translation. Nat Rev Genet 2017; 19:93-109. [PMID: 29279605 DOI: 10.1038/nrg.2017.96] [Citation(s) in RCA: 173] [Impact Index Per Article: 21.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
Methodological breakthroughs over the past four decades have repeatedly revolutionized transcriptome profiling. Using RNA sequencing (RNA-seq), it has now become possible to sequence and quantify the transcriptional outputs of individual cells or thousands of samples. These transcriptomes provide a link between cellular phenotypes and their molecular underpinnings, such as mutations. In the context of cancer, this link represents an opportunity to dissect the complexity and heterogeneity of tumours and to discover new biomarkers or therapeutic strategies. Here, we review the rationale, methodology and translational impact of transcriptome profiling in cancer.
Affiliation(s)
- Marcin Cieślik
- Michigan Center for Translational Pathology, University of Michigan.,Department of Pathology, University of Michigan
| | - Arul M Chinnaiyan
- Michigan Center for Translational Pathology, University of Michigan.,Department of Pathology, University of Michigan.,Comprehensive Cancer Center, University of Michigan.,Department of Urology, University of Michigan.,Howard Hughes Medical Institute, University of Michigan, Ann Arbor, Michigan 48109, USA
| |
|
32
|
den Hollander W, Boer CG, Hart DJ, Yau MS, Ramos YFM, Metrustry S, Broer L, Deelen J, Cupples LA, Rivadeneira F, Kloppenburg M, Peters M, Spector TD, Hofman A, Slagboom PE, Nelissen RGHH, Uitterlinden AG, Felson DT, Valdes AM, Meulenbelt I, van Meurs JJB. Genome-wide association and functional studies identify a role for matrix Gla protein in osteoarthritis of the hand. Ann Rheum Dis 2017; 76:2046-2053. [PMID: 28855172 PMCID: PMC5788019 DOI: 10.1136/annrheumdis-2017-211214] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2017] [Revised: 07/20/2017] [Accepted: 07/31/2017] [Indexed: 12/26/2022]
Abstract
OBJECTIVE Osteoarthritis (OA) is the most common form of arthritis and the leading cause of disability in the elderly. Of all the joints, genetic predisposition is strongest for OA of the hand; however, only a few genetic risk loci for hand OA have been identified. Our aim was to identify novel genes associated with hand OA and examine the underlying mechanism. METHODS We performed a genome-wide association study of a quantitative measure of hand OA in 12 784 individuals (discovery: 8743, replication: 4011). Genome-wide significant signals were followed up by analysing gene and allele-specific expression in an RNA sequencing dataset (n=96) of human articular cartilage. RESULTS We found two significantly associated loci in the discovery set: at chr12 (p=3.5×10⁻¹⁰) near the matrix Gla protein (MGP) gene and at chr12 (p=6.1×10⁻⁹) near the CCDC91 gene. The DNA variant near the MGP gene was validated in three additional studies, which resulted in a highly significant association between the MGP variant and hand OA (rs4764133, Beta_meta=0.83, P_meta=1.8×10⁻¹⁵). This variant is in high linkage disequilibrium with a coding variant in MGP, a vitamin K-dependent inhibitor of cartilage calcification. Using RNA sequencing data from human primary cartilage tissue (n=96), we observed that the MGP RNA expression of the hand OA risk allele was significantly lower compared with the MGP RNA expression of the reference allele (40.7%, p<5×10⁻¹⁶). CONCLUSIONS Our results indicate that the association between the MGP variant and increased risk for hand OA is caused by a lower expression of MGP, which may increase the burden of hand OA by decreased inhibition of cartilage calcification.
Affiliation(s)
- Wouter den Hollander
- Department of Medical Statistics and Bioinformatics, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Cindy G Boer
- Department of Internal Medicine, Genetic Laboratory, Erasmus Medical Center, Rotterdam, The Netherlands
| | - Deborah J Hart
- Department of Twin Research and Genetic Epidemiology, King’s College London, London, UK
| | - Michelle S Yau
- Institute for Aging Research, Hebrew SeniorLife, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Clinical Epidemiology Research and Training Unit, Boston University School of Medicine, Boston, Massachusetts, USA
| | - Yolande F M Ramos
- Department of Medical Statistics and Bioinformatics, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Sarah Metrustry
- Department of Twin Research and Genetic Epidemiology, King’s College London, London, UK
| | - Linda Broer
- Department of Internal Medicine, Genetic Laboratory, Erasmus Medical Center, Rotterdam, The Netherlands
| | - Joris Deelen
- Department of Medical Statistics and Bioinformatics, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Max Planck Institute for Biology of Ageing, Cologne, Germany
| | - L Adrienne Cupples
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
| | - Fernando Rivadeneira
- Department of Internal Medicine, Genetic Laboratory, Erasmus Medical Center, Rotterdam, The Netherlands
| | - Margreet Kloppenburg
- Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands
| | - Marjolein Peters
- Department of Internal Medicine, Genetic Laboratory, Erasmus Medical Center, Rotterdam, The Netherlands
| | - Tim D Spector
- Department of Twin Research and Genetic Epidemiology, King’s College London, London, UK
| | - Albert Hofman
- Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
| | - P Eline Slagboom
- Department of Medical Statistics and Bioinformatics, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Rob G H H Nelissen
- Department of Orthopedics, Leiden University Medical Center, Leiden, The Netherlands
| | - André G Uitterlinden
- Department of Internal Medicine, Genetic Laboratory, Erasmus Medical Center, Rotterdam, The Netherlands
- Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands
| | - David T Felson
- Arthritis Research UK Epidemiology Unit, University of Manchester, Manchester, UK
| | - Ana M Valdes
- School of Medicine, University of Nottingham, Nottingham, UK
| | - Ingrid Meulenbelt
- Department of Medical Statistics and Bioinformatics, Section Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Joyce J B van Meurs
- Department of Internal Medicine, Genetic Laboratory, Erasmus Medical Center, Rotterdam, The Netherlands
| |
|
33
|
Bohnert R, Vivas S, Jansen G. Comprehensive benchmarking of SNV callers for highly admixed tumor data. PLoS One 2017; 12:e0186175. [PMID: 29020110 PMCID: PMC5636151 DOI: 10.1371/journal.pone.0186175] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2017] [Accepted: 09/26/2017] [Indexed: 12/30/2022] Open
Abstract
Precision medicine attempts to individualize cancer therapy by matching tumor-specific genetic changes with effective targeted therapies. A crucial first step in this process is the reliable identification of cancer-relevant variants, which is considerably complicated by the impurity and heterogeneity of clinical tumor samples. We compared the impact of admixture of non-cancerous cells and low somatic allele frequencies on the sensitivity and precision of 19 state-of-the-art SNV callers. We studied both whole exome and targeted gene panel data and up to 13 distinct parameter configurations for each tool. We found vast differences among callers. Based on our comprehensive analyses we recommend joint tumor-normal calling with MuTect, EBCall or Strelka for whole exome somatic variant calling, and HaplotypeCaller or FreeBayes for whole exome germline calling. For targeted gene panel data on a single tumor sample, LoFreqStar performed best. We further found that tumor impurity and admixture had a negative impact on precision, and in particular, sensitivity in whole exome experiments. At admixture levels of 60% to 90% sometimes seen in pathological biopsies, sensitivity dropped significantly, even when variants were originally present in the tumor at 100% allele frequency. Sensitivity to low-frequency SNVs improved with targeted panel data, but whole exome data allowed more efficient identification of germline variants. Effective somatic variant calling requires high-quality pathological samples with minimal admixture, a consciously selected sequencing strategy, and the appropriate variant calling tool with settings optimized for the chosen type of data.
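The admixture effect described in this abstract can be illustrated with a small worked calculation. The sketch below is not taken from the paper: it assumes that tumor and normal cells carry the same copy number at the site, so the expected variant allele fraction (VAF) in the mixed sample is simply the tumor VAF scaled by the tumor fraction. The function names `expected_vaf` and `expected_alt_reads` are hypothetical.

```python
def expected_vaf(tumor_vaf: float, admixture: float) -> float:
    """Expected VAF in a mixed sample.

    Assumption (not from the abstract): equal copy number in tumor and
    normal cells, and `admixture` is the fraction of non-cancerous cells.
    """
    if not (0.0 <= tumor_vaf <= 1.0 and 0.0 <= admixture <= 1.0):
        raise ValueError("tumor_vaf and admixture must lie in [0, 1]")
    return tumor_vaf * (1.0 - admixture)


def expected_alt_reads(tumor_vaf: float, admixture: float, depth: int) -> float:
    """Expected number of variant-supporting reads at a given read depth."""
    return expected_vaf(tumor_vaf, admixture) * depth


if __name__ == "__main__":
    # A variant at 100% allele frequency in the tumor, across the
    # 60% to 90% admixture range mentioned in the abstract.
    for admixture in (0.6, 0.75, 0.9):
        print(admixture, round(expected_vaf(1.0, admixture), 3))
```

Under these assumptions, a variant present in every tumor allele is diluted to roughly a 10% allele fraction at 90% admixture, which is one plausible reason callers lose sensitivity in that range.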
|
34
|
Sloma I, Mitjavila-Garcia MT, Feraud O, Griscelli F, Oudrhiri N, El Marsafy S, Gobbo E, Divers D, Proust A, Smadja DM, Desterke C, Carles A, Ma Y, Hirst M, Marra MA, Eaves CJ, Bennaceur-Griscelli A, Turhan AG. Whole-genome analysis reveals unexpected dynamics of mutant subclone development in a patient with JAK2-V617F-positive chronic myeloid leukemia. Exp Hematol 2017; 53:48-58. [DOI: 10.1016/j.exphem.2017.05.007] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2017] [Revised: 05/20/2017] [Accepted: 05/22/2017] [Indexed: 01/17/2023]
|
35
|
Wu SH, Schwartz RS, Winter DJ, Conrad DF, Cartwright RA. Estimating error models for whole genome sequencing using mixtures of Dirichlet-multinomial distributions. Bioinformatics 2017; 33:2322-2329. [PMID: 28334373 PMCID: PMC5860108 DOI: 10.1093/bioinformatics/btx133] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2016] [Revised: 01/22/2017] [Accepted: 03/07/2017] [Indexed: 12/30/2022] Open
Abstract
MOTIVATION Accurate identification of genotypes is an essential part of the analysis of genomic data, including in identification of sequence polymorphisms, linking mutations with disease and determining mutation rates. Biological and technical processes that adversely affect genotyping include copy-number-variation, paralogous sequences, library preparation, sequencing error and reference-mapping biases, among others. RESULTS We modeled the read depth for all data as a mixture of Dirichlet-multinomial distributions, resulting in significant improvements over previously used models. In most cases the best model was comprised of two distributions. The major-component distribution is similar to a binomial distribution with low error and low reference bias. The minor-component distribution is overdispersed with higher error and reference bias. We also found that sites fitting the minor component are enriched for copy number variants and low complexity regions, which can produce erroneous genotype calls. By removing sites that do not fit the major component, we can improve the accuracy of genotype calls. AVAILABILITY AND IMPLEMENTATION Methods and data files are available at https://github.com/CartwrightLab/WuEtAl2017/ (doi:10.5281/zenodo.256858). CONTACT cartwright@asu.edu. SUPPLEMENTARY INFORMATION Supplementary data is available at Bioinformatics online.
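As a rough illustration of the model family named in this abstract, the sketch below evaluates the log-probability of per-site allele read counts under a Dirichlet-multinomial distribution and under a weighted mixture of such distributions. It is a minimal sketch using only the Python standard library, not the authors' implementation (their code is at the GitHub address given in the abstract); the function names and the example parameter values are assumptions chosen for illustration.

```python
from math import exp, lgamma, log


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of `counts` under a Dirichlet-multinomial with
    concentration parameters `alpha` (all alpha values must be > 0)."""
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient: n! / prod(c_i!)
    log_coef = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
    # Gamma(a0) / Gamma(n + a0)
    log_norm = lgamma(a0) - lgamma(n + a0)
    # prod Gamma(c_i + alpha_i) / Gamma(alpha_i)
    log_terms = sum(lgamma(c + a) - lgamma(a) for c, a in zip(counts, alpha))
    return log_coef + log_norm + log_terms


def mixture_logpmf(counts, weights, alphas):
    """Log-probability under a mixture; `weights` are assumed to sum to 1."""
    logs = [log(w) + dirichlet_multinomial_logpmf(counts, a)
            for w, a in zip(weights, alphas)]
    m = max(logs)  # log-sum-exp for numerical stability
    return m + log(sum(exp(x - m) for x in logs))


if __name__ == "__main__":
    # Hypothetical two-component model for (ref, alt) read counts at a
    # heterozygous site: a tight major component and an overdispersed,
    # reference-biased minor component.
    weights = [0.9, 0.1]
    alphas = [[50.0, 50.0], [2.0, 1.0]]
    print(mixture_logpmf([18, 12], weights, alphas))
```

Large concentration parameters make a component behave almost like a binomial, while small ones give the overdispersion that the abstract attributes to the minor component.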
Affiliation(s)
- Steven H Wu
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
| | - Rachel S Schwartz
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Department of Biological Sciences, The University of Rhode Island, Kingston, RI, USA
| | - David J Winter
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
| | - Donald F Conrad
- Department of Genetics, Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, USA
| | - Reed A Cartwright
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| |
|
36
|
Chen W, Robertson AJ, Ganesamoorthy D, Coin LJM. sCNAphase: using haplotype resolved read depth to genotype somatic copy number alterations from low cellularity aneuploid tumors. Nucleic Acids Res 2017; 45:e34. [PMID: 27903916 PMCID: PMC5389684 DOI: 10.1093/nar/gkw1086] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2016] [Accepted: 10/26/2016] [Indexed: 02/03/2023] Open
Abstract
Accurate identification of copy number alterations is an essential step in understanding the events driving tumor progression. While a variety of algorithms have been developed to use high-throughput sequencing data to profile copy number changes, no tool is able to reliably characterize ploidy and genotype absolute copy number from tumor samples that contain less than 40% tumor cells. To increase our power to resolve the copy number profile from low-cellularity tumor samples, we developed a novel approach that pre-phases heterozygote germline single nucleotide polymorphisms (SNPs) in order to replace the commonly used ‘B-allele frequency’ with a more powerful ‘parental-haplotype frequency’. We apply our tool—sCNAphase—to characterize the copy number and loss-of-heterozygosity profiles of four publicly available breast cancer cell-lines. Comparisons to previous spectral karyotyping and microarray studies revealed that sCNAphase reliably identified overall ploidy as well as the individual copy number mutations from each cell-line. Analysis of artificial cell-line mixtures demonstrated the capacity of this method to determine the level of tumor cellularity, consistently identify sCNAs and characterize ploidy in samples with as little as 10% tumor cells. This novel methodology has the potential to bring sCNA profiling to low-cellularity tumors, a form of cancer unable to be accurately studied by current methods.
Affiliation(s)
- Wenhan Chen
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, Queensland, 4072, Australia
| | - Alan J Robertson
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, Queensland, 4072, Australia
| | - Devika Ganesamoorthy
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, Queensland, 4072, Australia
| | - Lachlan J M Coin
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, Queensland, 4072, Australia
| |
|
37
|
Are Next-Generation Sequencing Tools Ready for the Cloud? Trends Biotechnol 2017; 35:486-489. [DOI: 10.1016/j.tibtech.2017.03.005] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2017] [Revised: 02/23/2017] [Accepted: 03/03/2017] [Indexed: 11/22/2022]
|
38
|
Sheffield BS, Tessier-Cloutier B, Li-Chang H, Shen Y, Pleasance E, Kasaian K, Li Y, Jones SJM, Lim HJ, Renouf DJ, Huntsman DG, Yip S, Laskin J, Marra M, Schaeffer DF. Personalized oncogenomics in the management of gastrointestinal carcinomas-early experiences from a pilot study. Curr Oncol 2016; 23:e571-e575. [PMID: 28050146 DOI: 10.3747/co.23.3165] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023]
Abstract
BACKGROUND Gastrointestinal carcinomas are genomically complex cancers that are lethal in the metastatic setting. Whole-genome and transcriptome sequencing allow for the simultaneous characterization of multiple oncogenic pathways. METHODS We report 3 cases of metastatic gastrointestinal carcinoma in patients enrolled in the Personalized Onco-Genomics program at the BC Cancer Agency. Real-time genomic profiling was combined with clinical expertise to diagnose a carcinoma of unknown primary, to explore treatment response to bevacizumab in a colorectal cancer, and to characterize an appendiceal adenocarcinoma. RESULTS In the first case, genomic profiling revealed an IDH1 somatic mutation, supporting the diagnosis of cholangiocarcinoma in a malignancy of unknown origin, and further guided therapy by identifying epidermal growth factor receptor amplification. In the second case, a BRAF V600E mutation and wild-type KRAS profile justified the use of targeted therapies to treat a colonic adenocarcinoma. The third case was an appendiceal adenocarcinoma defined by a p53 inactivation; Ras/raf/mek, Akt/mtor, Wnt, and notch pathway activation; and overexpression of ret, erbb2 (her2), erbb3, met, and cell cycle regulators. SUMMARY We show that whole-genome and transcriptome sequencing can be achieved within clinically effective timelines, yielding clinically useful and actionable information.
Affiliation(s)
- B S Sheffield
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC
| | - B Tessier-Cloutier
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC
| | - H Li-Chang
- Royal Victoria Regional Health Centre, Department of Pathology and Laboratory Medicine, Barrie, ON
| | - Y Shen
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC
| | - E Pleasance
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC
| | - K Kasaian
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC
| | - Y Li
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC
| | - S J M Jones
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC
| | - H J Lim
- Division of Medical Oncology, BC Cancer Agency, Vancouver, BC
| | - D J Renouf
- Division of Medical Oncology, BC Cancer Agency, Vancouver, BC
| | - D G Huntsman
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC
| | - S Yip
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC
| | - J Laskin
- Division of Medical Oncology, BC Cancer Agency, Vancouver, BC
| | - M Marra
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC; Department of Medical Genetics, University of British Columbia, Vancouver, BC
| | - D F Schaeffer
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC
| |
|
39
|
Zhernakova DV, Deelen P, Vermaat M, van Iterson M, van Galen M, Arindrarto W, van 't Hof P, Mei H, van Dijk F, Westra HJ, Bonder MJ, van Rooij J, Verkerk M, Jhamai PM, Moed M, Kielbasa SM, Bot J, Nooren I, Pool R, van Dongen J, Hottenga JJ, Stehouwer CDA, van der Kallen CJH, Schalkwijk CG, Zhernakova A, Li Y, Tigchelaar EF, de Klein N, Beekman M, Deelen J, van Heemst D, van den Berg LH, Hofman A, Uitterlinden AG, van Greevenbroek MMJ, Veldink JH, Boomsma DI, van Duijn CM, Wijmenga C, Slagboom PE, Swertz MA, Isaacs A, van Meurs JBJ, Jansen R, Heijmans BT, 't Hoen PAC, Franke L. Identification of context-dependent expression quantitative trait loci in whole blood. Nat Genet 2016; 49:139-145. [PMID: 27918533 DOI: 10.1038/ng.3737] [Citation(s) in RCA: 288] [Impact Index Per Article: 32.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2015] [Accepted: 11/02/2016] [Indexed: 02/07/2023]
Abstract
Genetic risk factors often localize to noncoding regions of the genome with unknown effects on disease etiology. Expression quantitative trait loci (eQTLs) help to explain the regulatory mechanisms underlying these genetic associations. Knowledge of the context that determines the nature and strength of eQTLs may help identify cell types relevant to pathophysiology and the regulatory networks underlying disease. Here we generated peripheral blood RNA-seq data from 2,116 unrelated individuals and systematically identified context-dependent eQTLs using a hypothesis-free strategy that does not require previous knowledge of the identity of the modifiers. Of the 23,060 significant cis-regulated genes (false discovery rate (FDR) ≤ 0.05), 2,743 (12%) showed context-dependent eQTL effects. The majority of these effects were influenced by cell type composition. A set of 145 cis-eQTLs depended on type I interferon signaling. Others were modulated by specific transcription factors binding to the eQTL SNPs.
Affiliation(s)
- Daria V Zhernakova
- University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, the Netherlands
| | - Patrick Deelen
- University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, the Netherlands.,University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, the Netherlands
| | - Martijn Vermaat
- Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
| | - Maarten van Iterson
- Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, the Netherlands
| | - Michiel van Galen
- Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
| | - Wibowo Arindrarto
- Sequence Analysis Support Core, Leiden University Medical Center, Leiden, the Netherlands
| | - Peter van 't Hof
- Sequence Analysis Support Core, Leiden University Medical Center, Leiden, the Netherlands
| | - Hailiang Mei
- Sequence Analysis Support Core, Leiden University Medical Center, Leiden, the Netherlands
| | - Freerk van Dijk
- University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, the Netherlands.,University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, the Netherlands
| | - Harm-Jan Westra
- Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA.,Partners Center for Personalized Genetic Medicine, Boston, Massachusetts, USA.,Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Marc Jan Bonder
- University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, the Netherlands
| | - Jeroen van Rooij
- Department of Internal Medicine, ErasmusMC, Rotterdam, the Netherlands
| | - Marijn Verkerk
- Department of Internal Medicine, ErasmusMC, Rotterdam, the Netherlands
| | - P Mila Jhamai
- Department of Internal Medicine, ErasmusMC, Rotterdam, the Netherlands
| | - Matthijs Moed
- Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, the Netherlands
| | - Szymon M Kielbasa
- Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, the Netherlands
| | - Jan Bot
- SURFsara, Amsterdam, the Netherlands
| | | | - René Pool
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Neuroscience Campus Amsterdam, Amsterdam, the Netherlands
| | - Jenny van Dongen
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Neuroscience Campus Amsterdam, Amsterdam, the Netherlands
| | - Jouke J Hottenga
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Neuroscience Campus Amsterdam, Amsterdam, the Netherlands
| | - Coen D A Stehouwer
- Department of Internal Medicine, Maastricht University Medical Center, Maastricht, the Netherlands.,School for Cardiovascular Diseases (CARIM), Maastricht University Medical Center, Maastricht, the Netherlands
| | - Carla J H van der Kallen
- Department of Internal Medicine, Maastricht University Medical Center, Maastricht, the Netherlands.,School for Cardiovascular Diseases (CARIM), Maastricht University Medical Center, Maastricht, the Netherlands
| | - Casper G Schalkwijk
- Department of Internal Medicine, Maastricht University Medical Center, Maastricht, the Netherlands.,School for Cardiovascular Diseases (CARIM), Maastricht University Medical Center, Maastricht, the Netherlands
| | - Alexandra Zhernakova
- University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, the Netherlands
| | - Yang Li
- University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, the Netherlands
| | - Ettje F Tigchelaar
- University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, the Netherlands
| | - Niek de Klein
- University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, the Netherlands
| | - Marian Beekman
- Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, the Netherlands
| | - Joris Deelen
- Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, the Netherlands
| | - Diana van Heemst
- Department of Gerontology and Geriatrics, Leiden University Medical Center, Leiden, the Netherlands
| | - Leonard H van den Berg
- Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Albert Hofman
- Department of Epidemiology, ErasmusMC, Rotterdam, the Netherlands
| | | | - Marleen M J van Greevenbroek
- Department of Internal Medicine, Maastricht University Medical Center, Maastricht, the Netherlands.,School for Cardiovascular Diseases (CARIM), Maastricht University Medical Center, Maastricht, the Netherlands
| | - Jan H Veldink
- Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Dorret I Boomsma
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Neuroscience Campus Amsterdam, Amsterdam, the Netherlands
| | - Cornelia M van Duijn
- Genetic Epidemiology Unit, Department of Epidemiology, ErasmusMC, Rotterdam, the Netherlands
| | - Cisca Wijmenga
- University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, the Netherlands
| | - P Eline Slagboom
- Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, the Netherlands
| | - Morris A Swertz
- University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, the Netherlands.,University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, the Netherlands
| | - Aaron Isaacs
- School for Cardiovascular Diseases (CARIM), Maastricht University Medical Center, Maastricht, the Netherlands.,Genetic Epidemiology Unit, Department of Epidemiology, ErasmusMC, Rotterdam, the Netherlands.,Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
| | | | - Rick Jansen
- Department of Psychiatry, VU University Medical Center, Neuroscience Campus Amsterdam, Amsterdam, the Netherlands
| | - Bastiaan T Heijmans
- Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, the Netherlands
| | - Peter A C 't Hoen
- Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
| | - Lude Franke
- University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, the Netherlands
| |
|
40
|
Mazloomian A, Meyer IM. Genome-wide identification and characterization of tissue-specific RNA editing events in D. melanogaster and their potential role in regulating alternative splicing. RNA Biol 2016; 12:1391-401. [PMID: 26512413 PMCID: PMC4829317 DOI: 10.1080/15476286.2015.1107703] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023] Open
Abstract
RNA editing is a widespread mechanism that plays a crucial role in diversifying gene products. Its abundance and importance in regulating cellular processes were revealed using new sequencing technologies. The majority of these editing events, however, cannot be associated with regulatory mechanisms. We use tissue-specific high-throughput libraries of D. melanogaster to study RNA editing. We introduce an analysis pipeline that utilises large input data and explicitly captures ADAR's requirement for double-stranded regions. It combines probabilistic and deterministic filters and can identify RNA editing events with a low estimated false positive rate. Analyzing ten different tissue types, we predict 2879 editing sites and provide their detailed characterization. Our analysis pipeline accurately distinguishes genuine editing sites from SNPs and sequencing and mapping artifacts. Our editing sites are 3 times more likely to occur in exons with multiple splicing acceptor/donor sites than in exons with unique splice sites (p-value < 2 × 10⁻¹⁵). Furthermore, we identify 244 edited regions where RNA editing and alternative splicing are likely to influence each other. For 96 out of these 244 regions, we find evolutionary evidence for conserved RNA secondary structures near splice sites, suggesting a potential regulatory mechanism where RNA editing may alter splicing patterns via changes in local RNA structure.
Affiliation(s)
- Alborz Mazloomian
- Centre for High-Throughput Biology; Department of Computer Science and Department of Medical Genetics; University of British Columbia; Vancouver; BC, Canada
| | - Irmtraud M Meyer
- Centre for High-Throughput Biology; Department of Computer Science and Department of Medical Genetics; University of British Columbia; Vancouver; BC, Canada
| |
|
41
|
An Advanced Model to Precisely Estimate the Cell-Free Fetal DNA Concentration in Maternal Plasma. PLoS One 2016; 11:e0161928. [PMID: 27662469 PMCID: PMC5035032 DOI: 10.1371/journal.pone.0161928] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2016] [Accepted: 08/15/2016] [Indexed: 11/19/2022] Open
Abstract
Background With the rapid development of sequencing technologies, noninvasive prenatal testing (NIPT) has been widely applied in clinical practice for testing for fetal aneuploidy. The cell-free fetal DNA (cffDNA) concentration in maternal plasma is the most critical parameter for this technology because it affects the accuracy of NIPT-based sequencing for fetal trisomies 21, 18 and 13. Several approaches have been developed to calculate the cffDNA fraction of the total cell-free DNA in the maternal plasma. However, most approaches depend on specific single nucleotide polymorphism (SNP) allele information or are restricted to male fetuses. Methods In this study, we present an innovative method to accurately deduce the concentration of the cffDNA fraction using only maternal plasma DNA. SNPs were classified into four maternal-fetal genotype combinations and three boundaries were added to capture effective SNP loci in which the mother was homozygous and the fetus was heterozygous. The median value of the concentration of the fetal DNA fraction was estimated using the effective SNPs. A depth-bias correction was performed using simulated data and corresponding regression equations for adjustments when the depth of the sequencing data was below 100-fold or the cffDNA fraction was less than 10%. Results Using our approach, the median of the relative bias was 0.4% in 18 maternal plasma samples with a median sequencing depth of 125-fold. There was a significant association (r = 0.935) between our estimations and the estimations inferred from the Y chromosome. Furthermore, this approach could precisely estimate a cffDNA fraction as low as 3%, using only maternal plasma DNA at the targeted region with a sequencing depth of 65-fold. We also used PCR instead of parallel sequencing to calculate the cffDNA fraction. There was a significant association (r = 98.2%) between our estimations and those inferred from the Y chromosome.
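The SNP-based estimate described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' model: it assumes that at a locus where the mother is homozygous and the fetus heterozygous, the minor-allele read fraction is roughly half the fetal fraction, it uses a simple hypothetical window (`lo`, `hi`) in place of the paper's three boundaries, and it omits the depth-bias correction.

```python
from statistics import median

def fetal_fraction(sites, lo=0.01, hi=0.20):
    """Estimate the fetal fraction from (ref_reads, alt_reads) pairs.

    Assumption: at an "effective" SNP the mother is homozygous and the
    fetus heterozygous, so minor-allele fraction ~= fetal_fraction / 2.
    lo and hi are illustrative cut-offs, not values from the paper.
    """
    estimates = []
    for ref, alt in sites:
        depth = ref + alt
        if depth == 0:
            continue
        maf = min(ref, alt) / depth
        if lo <= maf <= hi:            # keep only plausibly effective SNPs
            estimates.append(2 * maf)
    if not estimates:
        raise ValueError("no effective SNPs in the given window")
    return median(estimates)

# Hypothetical read counts at five SNPs in maternal plasma
sites = [(95, 5), (188, 12), (50, 50), (100, 0), (97, 3)]
print(round(fetal_fraction(sites), 3))
```

Taking the median rather than the mean keeps a few misclassified loci from dominating the estimate.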
|
42
|
Hao Y, Zhang P, Xuei X, Nakshatri H, Edenberg HJ, Li L, Liu Y. Statistical modeling for sensitive detection of low-frequency single nucleotide variants. BMC Genomics 2016; 17 Suppl 7:514. [PMID: 27556804 PMCID: PMC5001245 DOI: 10.1186/s12864-016-2905-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open
Abstract
Background Sensitive detection of low-frequency single nucleotide variants carries great significance in many applications. In cancer genetics research, tumor biopsies are a mixture of normal and tumor cells from various subpopulations due to tumor heterogeneity. Thus the frequencies of somatic variants from a subpopulation tend to be low. Liquid biopsies, which monitor circulating tumor DNA in blood to detect metastatic potential, also face the challenge of detecting low-frequency variants due to the small percentage of the circulating tumor DNA in blood. Moreover, in population genetics research, although pooled sequencing of a large number of individuals is cost-effective, pooling dilutes the signals of variants from any individual. Detection of low-frequency variants is difficult and can be confounded by sequencing artifacts. Existing methods are limited in sensitivity and mainly focus on frequencies around 2 % to 5 %; most fail to consider differential sequencing artifacts. Results We aimed to push down the frequency detection limit close to the position-specific sequencing error rates by modeling the observed erroneous read counts with respect to genomic sequence contexts. Four distributions suitable for count data modeling (using generalized linear models) were extensively characterized in terms of their goodness-of-fit as well as their performance on real sequencing data benchmarks, which were specifically designed for testing detection of low-frequency variants; two sequencing technologies with significantly different chemistry mechanisms were used to explore systematic errors. We found that the zero-inflated negative binomial generalized linear model is superior to the other models tested, and the advantage is most evident in the 0.5 % to 1 % range. This method is also generalizable to different sequencing technologies.
Under standard sequencing protocols and depth given in the testing benchmarks, 95.3 % recall and 79.9 % precision for Ion Proton data and 95.6 % recall and 97.0 % precision for Illumina MiSeq data were achieved for SNVs with frequency ≥ 1 %, while the detection limit is around 0.5 %. Conclusions Our method enables sensitive detection of low-frequency single nucleotide variants across different sequencing platforms and will facilitate research and clinical applications such as pooled sequencing, cancer early detection, prognostic assessment, metastatic monitoring, and relapses or acquired resistance identification. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2905-x) contains supplementary material, which is available to authorized users.
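To make the zero-inflated negative binomial idea concrete, the sketch below evaluates its probability mass function and the tail probability of observing at least `k` erroneous reads at a position. It is an illustration under stated assumptions rather than the published model: the mean `mu`, dispersion `size`, and zero-inflation weight `pi` are fixed by hand here, whereas the abstract describes fitting them with a generalized linear model on sequence context, and all function names are hypothetical.

```python
from math import exp, lgamma, log

def nb_pmf(k, mu, size):
    """Negative binomial pmf with mean mu (> 0) and dispersion parameter size."""
    log_p = (lgamma(k + size) - lgamma(size) - lgamma(k + 1)
             + size * log(size / (size + mu))
             + k * log(mu / (size + mu)))
    return exp(log_p)

def zinb_pmf(k, mu, size, pi):
    """Zero-inflated NB: with probability pi the count is a structural zero."""
    base = (1 - pi) * nb_pmf(k, mu, size)
    return pi + base if k == 0 else base

def zinb_tail(k, mu, size, pi):
    """P(X >= k): chance that sequencing error alone yields k or more reads."""
    return 1.0 - sum(zinb_pmf(i, mu, size, pi) for i in range(k))

# Illustrative parameters; a small tail probability suggests a real variant
print(zinb_tail(5, mu=0.5, size=2.0, pi=0.3))
```

A variant call would then compare this tail probability with a chosen significance threshold for each candidate position.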
Affiliation(s)
- Yangyang Hao
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA.,Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| | - Pengyue Zhang
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA.,Department of Biostatistics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| | - Xiaoling Xuei
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN, 46202, USA.,Center for Medical Genomics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| | - Harikrishna Nakshatri
- Department of Surgery, Indiana University School of Medicine, Indianapolis, IN, 46202, USA.,IU Simon Cancer Center, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| | - Howard J Edenberg
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA.,Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN, 46202, USA.,Center for Medical Genomics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| | - Lang Li
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA.,Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| | - Yunlong Liu
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA.,Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA.,Center for Medical Genomics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA.,IU Simon Cancer Center, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| |
|
43
|
Huang G, Wang S, Wang X, You N. An empirical Bayes method for genotyping and SNP detection using multi-sample next-generation sequencing data. Bioinformatics 2016; 32:3240-3245. [DOI: 10.1093/bioinformatics/btw409] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2016] [Accepted: 06/20/2016] [Indexed: 12/30/2022] Open
|
44
|
Salama MA, Hassanien AE, Mostafa A. The prediction of virus mutation using neural networks and rough set techniques. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2016; 2016:10. [PMID: 27257410 PMCID: PMC4867776 DOI: 10.1186/s13637-016-0042-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/13/2015] [Accepted: 05/03/2016] [Indexed: 11/10/2022]
Abstract
Viral evolution remains a main obstacle to the effectiveness of antiviral treatments. The ability to predict this evolution will help in the early detection of drug-resistant strains and will potentially facilitate the design of more efficient antiviral treatments. Various tools have been utilized in genome studies to achieve this goal. One of these tools is machine learning, which facilitates the study of structure-activity relationships, secondary and tertiary structure evolution prediction, and sequence error correction. This work proposes a novel machine learning technique for the prediction of the possible point mutations that appear on alignments of primary RNA sequence structure. It predicts the genotype of each nucleotide in the RNA sequence, and proves that a nucleotide in an RNA sequence changes based on the other nucleotides in the sequence. A neural network technique is utilized to predict new strains, then a rough set theory-based algorithm is introduced to extract these point mutation patterns. This algorithm is applied to a number of aligned time-series RNA isolates of the Newcastle virus. Two different data sets from two sources are used in the validation of these techniques. The results show that the accuracy of this technique in predicting the nucleotides in the new generation is as high as 75 %. The mutation rules are visualized for the analysis of the correlation between different nucleotides in the same RNA sequence.
Affiliation(s)
- Mostafa A Salama
- British University in Egypt (BUE), Cairo, Egypt; Scientific Research Group in Egypt (SRGE), Cairo, Egypt
| | - Aboul Ella Hassanien
- Cairo University, Cairo, Egypt; Scientific Research Group in Egypt (SRGE), Cairo, Egypt
| | | |
|
45
|
Parker JDK, Shen Y, Pleasance E, Li Y, Schein JE, Zhao Y, Moore R, Wegrzyn-Woltosz J, Savage KJ, Weng AP, Gascoyne RD, Jones S, Marra M, Laskin J, Karsan A. Molecular etiology of an indolent lymphoproliferative disorder determined by whole-genome sequencing. Cold Spring Harb Mol Case Stud 2016; 2:a000679. [PMID: 27148583 PMCID: PMC4849852 DOI: 10.1101/mcs.a000679] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
In an attempt to assess potential treatment options, whole-genome and transcriptome sequencing were performed on a patient with an unclassifiable small lymphoproliferative disorder. Variants from genome sequencing were prioritized using a combination of comparative variant distributions in a spectrum of lymphomas, and meta-analyses of gene expression profiling. In this patient, the molecular variants that we believe to be most relevant to the disease presentation most strongly resemble a diffuse large B-cell lymphoma (DLBCL), whereas the gene expression data are most consistent with a low-grade chronic lymphocytic leukemia (CLL). The variant of greatest interest was a predicted NOTCH2-truncating mutation, which has been recently reported in various lymphomas.
Affiliation(s)
- Jeremy D K Parker
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
| | - Yaoqing Shen
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
| | - Erin Pleasance
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
| | - Yvonne Li
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
| | - Jacqueline E Schein
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
| | - Yongjun Zhao
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
| | - Richard Moore
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
| | - Joanna Wegrzyn-Woltosz
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
| | - Kerry J Savage
- Centre for Lymphoid Cancer and Department of Pathology, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
| | - Andrew P Weng
- Terry Fox Laboratory and Department of Pathology, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
| | - Randy D Gascoyne
- Centre for Lymphoid Cancer and Department of Pathology, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
| | - Steven Jones
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
| | - Marco Marra
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
| | - Janessa Laskin
- Department of Medical Oncology, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 4E6, Canada
| | - Aly Karsan
- Genome Sciences Centre and Department of Pathology, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
| |
|
46
|
Monovar: single-nucleotide variant detection in single cells. Nat Methods 2016; 13:505-7. [PMID: 27088313 PMCID: PMC4887298 DOI: 10.1038/nmeth.3835] [Citation(s) in RCA: 105] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2015] [Accepted: 03/18/2016] [Indexed: 12/31/2022]
Abstract
Current variant callers are not suitable for single-cell DNA sequencing, as they do not account for allelic dropout, false-positive errors and coverage nonuniformity. We developed Monovar (https://bitbucket.org/hamimzafar/monovar), a statistical method for detecting and genotyping single-nucleotide variants in single-cell data. Monovar exhibited superior performance over standard algorithms on benchmarks and in identifying driver mutations and delineating clonal substructure in three different human tumor data sets.
|
47
|
Zukurov JP, do Nascimento-Brito S, Volpini AC, Oliveira GC, Janini LMR, Antoneli F. Estimation of genetic diversity in viral populations from next generation sequencing data with extremely deep coverage. Algorithms Mol Biol 2016; 11:2. [PMID: 26973707 PMCID: PMC4788855 DOI: 10.1186/s13015-016-0064-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2015] [Accepted: 02/25/2016] [Indexed: 12/16/2022] Open
Abstract
Background In this paper we propose a method and discuss its computational implementation as an integrated tool for the analysis of viral genetic diversity on data generated by high-throughput sequencing. The main motivation for this work is to better understand the genetic diversity of viruses with high rates of nucleotide substitution, such as HIV-1 and influenza. Most methods for viral diversity estimation proposed so far are intended to benefit from the longer reads produced by some next-generation sequencing platforms in order to estimate a population of haplotypes which represent the diversity of the original population. The method proposed here is custom-made to take advantage of the very low error rate and extremely deep coverage per site, which are the main features of some neglected technologies that have not received much attention due to the short length of their reads, which precludes haplotype estimation. This approach allowed us to avoid some hard problems related to haplotype reconstruction (need for long reads, preliminary error filtering and assembly). Results We propose to measure genetic diversity of a viral population through a family of multinomial probability distributions indexed by the sites of the virus genome, each one representing the distribution of nucleic bases per site. Moreover, the implementation of the method focuses on two main optimization strategies: a read mapping/alignment procedure that aims at the recovery of the maximum possible number of short reads; and the inference of the multinomial parameters in a Bayesian framework with smoothed Dirichlet estimation. The Bayesian approach provides conditional probability distributions for the multinomial parameters, allowing one to take into account the prior information of the control experiment and providing a natural way to separate signal from noise, since it automatically furnishes Bayesian confidence intervals and thus avoids the drawbacks of preliminary error filtering.
Conclusions The methods described in this paper have been implemented as an integrated tool called Tanden (Tool for Analysis of Diversity in Viral Populations) and successfully tested on samples obtained from HIV-1 strain NL4-3 (group M, subtype B) cultivations on primary human cell cultures in many distinct viral propagation conditions. Tanden is written in C# (Microsoft), runs on the Windows operating system, and can be downloaded from: http://tanden.url.ph/.
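The per-site Bayesian idea summarized in this abstract can be illustrated with a minimal sketch. It assumes a plain conjugate Dirichlet-multinomial update, not the smoothed estimator implemented in Tanden; the function names, the flat prior, and the base counts are all hypothetical and chosen only for illustration.

```python
import random

BASES = ("A", "C", "G", "T")


def dirichlet_posterior(counts, prior):
    """Posterior Dirichlet parameters: prior pseudo-counts plus observed counts."""
    return {b: prior[b] + counts[b] for b in BASES}


def posterior_mean(post):
    """Posterior mean frequency of each base at the site."""
    total = sum(post.values())
    return {b: post[b] / total for b in BASES}


def credible_interval(post, base, level=0.95, draws=20000, seed=0):
    """Monte Carlo equal-tailed interval for one base frequency.

    The marginal of a Dirichlet component is Beta(a_base, a_total - a_base),
    so sampling that Beta distribution is sufficient.
    """
    rng = random.Random(seed)
    a = post[base]
    b = sum(post.values()) - a
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo = samples[int((1 - level) / 2 * draws)]
    hi = samples[int((1 + level) / 2 * draws) - 1]
    return lo, hi


# Hypothetical base counts at one deeply covered genome site.
site_counts = {"A": 9950, "C": 30, "G": 15, "T": 5}
# Assumed flat prior; a control experiment could supply informative pseudo-counts.
flat_prior = {b: 1.0 for b in BASES}

post = dirichlet_posterior(site_counts, flat_prior)
mean = posterior_mean(post)
lo, hi = credible_interval(post, "C")
```

In a design like this, a minor variant would be reported only when its interval lies clearly above the error level estimated from the control, which is one way to separate signal from noise without a preliminary filtering step.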
|
48
|
Abstract
Despite the enormous medical impact of cancers and intensive study of their biology, detailed characterization of tumor growth and development remains elusive. This difficulty occurs in large part because of enormous heterogeneity in the molecular mechanisms of cancer progression, both tumor-to-tumor and cell-to-cell in single tumors. Advances in genomic technologies, especially at the single-cell level, are improving the situation, but these approaches are held back by limitations of the biotechnologies for gathering genomic data from heterogeneous cell populations and the computational methods for making sense of those data. One popular way to gain the advantages of whole-genome methods without the cost of single-cell genomics has been the use of computational deconvolution (unmixing) methods to reconstruct clonal heterogeneity from bulk genomic data. These methods, too, are limited by the difficulty of inferring genomic profiles of rare or subtly varying clonal subpopulations from bulk data, a problem that can be computationally reduced to that of reconstructing the geometry of point clouds of tumor samples in a genome space. Here, we present a new method to improve that reconstruction by better identifying subspaces corresponding to tumors produced from mixtures of distinct combinations of clonal subpopulations. We develop a nonparametric clustering method based on medoidshift clustering for identifying subgroups of tumors expected to correspond to distinct trajectories of evolutionary progression. We show on synthetic and real tumor copy-number data that this new method substantially improves our ability to resolve discrete tumor subgroups, a key step in the process of accurately deconvolving tumor genomic data and inferring clonal heterogeneity from bulk data.
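As a rough illustration of the clustering primitive named in this abstract, the sketch below implements a generic single-pass medoidshift with a Gaussian kernel in plain Python. It is not the authors' nonparametric method; the function names, the bandwidth, and the toy two-dimensional points are assumptions made for the example.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def medoidshift_labels(points, bandwidth):
    """Cluster points by linking each one to its kernel-weighted medoid."""
    n = len(points)
    d2 = [[sq_dist(p, q) for q in points] for p in points]

    # One medoid-shift step per point: move to the data point that minimizes
    # the Gaussian-weighted sum of squared distances around the current point.
    shifted = []
    for k in range(n):
        w = [math.exp(-d2[k][i] / bandwidth ** 2) for i in range(n)]
        cost = [sum(w[i] * d2[i][j] for i in range(n)) for j in range(n)]
        shifted.append(min(range(n), key=cost.__getitem__))

    # Points whose shift chains meet belong to the same cluster (union-find).
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for k in range(n):
        parent[find(k)] = find(shifted[k])

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(k), len(roots)) for k in range(n)]


# Toy data: two tight, well-separated groups of hypothetical samples.
group_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
group_b = [(x + 10.0, y + 10.0) for x, y in group_a]
labels = medoidshift_labels(group_a + group_b, bandwidth=1.0)
```

Because the shift target is always an existing data point, the procedure needs only pairwise distances, which is why medoid-based variants suit data where a meaningful average is hard to define.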
Affiliation(s)
- Theodore Roman
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA; Joint Carnegie Mellon/University of Pittsburgh Ph.D. Program in Computational Biology, 5000 Forbes Ave, Pittsburgh, PA 15213, USA.
- Lu Xie
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA; Joint Carnegie Mellon/University of Pittsburgh Ph.D. Program in Computational Biology, 5000 Forbes Ave, Pittsburgh, PA 15213, USA.
- Russell Schwartz
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA; Department of Biological Sciences, Mellon College of Science, Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA.
|
49
|
A Comparison of Variant Calling Pipelines Using Genome in a Bottle as a Reference. BioMed Research International 2015; 2015:456479. [PMID: 26539496 PMCID: PMC4619817 DOI: 10.1155/2015/456479] [Citation(s) in RCA: 83] [Impact Index Per Article: 8.3] [Received: 11/17/2014] [Accepted: 12/17/2014] [Indexed: 12/30/2022]
Abstract
High-throughput sequencing, especially of exomes, is a popular diagnostic tool, but it is difficult to determine which tools are best at analyzing these data. In this study, we use the NIST Genome in a Bottle results as a novel resource for validation of our exome analysis pipeline. We use six different aligners and five different variant callers to determine which pipeline, of the 30 total, performs best on a human exome that was used to help generate the list of variants detected by the Genome in a Bottle Consortium. Of these 30 pipelines, we found that Novoalign in conjunction with GATK UnifiedGenotyper exhibited the highest sensitivity while maintaining a low number of false positives for SNVs. However, it is apparent that indels are still difficult for any pipeline to handle, with none of the tools achieving an average sensitivity higher than 33% or a Positive Predictive Value (PPV) higher than 53%. Lastly, as expected, it was found that aligners can play as vital a role in variant detection as the variant callers themselves.
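The two metrics quoted in this abstract follow the standard definitions, sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). The short sketch below computes them for a call set compared against a truth set; the function name and the variant tuples are hypothetical, and the study's own matching rules for SNVs and indels are not reproduced here.

```python
def sensitivity_and_ppv(called, truth):
    """Compare a set of called variants against a truth set.

    Variants are matched by exact equality, which is a simplification:
    real benchmarks usually normalize indel representations first.
    """
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but missed by the caller
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv


# Hypothetical variants encoded as (chromosome, position, ref, alt).
truth_set = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 1500, "G", "GA"),
    ("chr2", 3000, "TC", "T"),
}
call_set = {
    ("chr1", 1000, "A", "G"),
    ("chr2", 1500, "G", "GA"),
    ("chr3", 500, "T", "C"),
}

sens, ppv = sensitivity_and_ppv(call_set, truth_set)
```

Here two of the four true variants are recovered (sensitivity 0.5) and two of the three calls are correct (PPV about 0.67), showing how a pipeline can score differently on the two measures.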
|
50
|
Thangam M, Gopal RK. CRCDA--Comprehensive resources for cancer NGS data analysis. Database: The Journal of Biological Databases and Curation 2015; 2015:bav092. [PMID: 26450948 PMCID: PMC4597977 DOI: 10.1093/database/bav092] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/27/2015] [Accepted: 08/31/2015] [Indexed: 12/24/2022]
Abstract
Next generation sequencing (NGS) innovations have set a landmark in the life sciences and changed the direction of research in clinical oncology through their capacity to help diagnose and treat cancer. The aim of our portal, comprehensive resources for cancer NGS data analysis (CRCDA), is to provide a collection of NGS tools and pipelines organized under diverse classes, together with cancer pathways, databases and literature information from PubMed. The literature data were restricted to the 18 most common cancer types worldwide, such as breast cancer and colon cancer. For convenience, the NGS cancer tools have been categorized into cancer genomics, cancer transcriptomics, cancer epigenomics, quality control and visualization. Pipelines for variant detection, quality control and data analysis are listed to provide out-of-the-box solutions for NGS data analysis, which may help researchers overcome the challenges of selecting and configuring individual tools for analysing exome, whole genome and transcriptome data. An extensive search page was developed that can be queried by (i) type of data [literature, gene data and sequence read archive (SRA) data] and (ii) type of cancer (selected based on global incidence and accessibility of data). For each category of analysis, a variety of tools is available, and the biggest challenge lies in finding and using the right tool for the right application. The objective of this work is to collect the tools available in each category from various places and to arrange the tools and other data in a simple and user-friendly manner, so that biologists and oncologists can find information more easily. To the best of our knowledge, we have collected and presented a comprehensive package of most of the resources available for NGS data analysis in cancer. Given these factors, we believe that this website will be a useful resource for the NGS research community working on cancer.
Database URL: http://bioinfo.au-kbc.org.in/ngs/ngshome.html.
Affiliation(s)
- Manonanthini Thangam
- AU-KBC Research Centre, MIT Campus of Anna University, Chromepet, Chennai, India
- Ramesh Kumar Gopal
- AU-KBC Research Centre, MIT Campus of Anna University, Chromepet, Chennai, India
|