1
De novo detection of somatic mutations in high-throughput single-cell profiling data sets. Nat Biotechnol 2024; 42:758-767. [PMID: 37414936 PMCID: PMC11098751 DOI: 10.1038/s41587-023-01863-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/23/2022] [Accepted: 06/07/2023] [Indexed: 07/08/2023]
Abstract
Characterization of somatic mutations at single-cell resolution is essential to study cancer evolution, clonal mosaicism and cell plasticity. Here, we describe SComatic, an algorithm designed to detect somatic mutations directly in single-cell transcriptomic and ATAC-seq (assay for transposase-accessible chromatin sequencing) data sets without requiring matched bulk or single-cell DNA sequencing data. SComatic distinguishes somatic mutations from polymorphisms, RNA-editing events and artefacts using filters and statistical tests parameterized on non-neoplastic samples. Using >2.6 million single cells from 688 single-cell RNA-seq (scRNA-seq) and single-cell ATAC-seq (scATAC-seq) data sets spanning cancer and non-neoplastic samples, we show that SComatic detects mutations in single cells accurately, even in differentiated cells from polyclonal tissues that are not amenable to mutation detection using existing methods. Validated against matched genome sequencing and scRNA-seq data, SComatic achieves F1 scores between 0.6 and 0.7 across diverse data sets, compared with 0.2-0.4 for the second-best performing method. In summary, SComatic permits de novo mutational signature analysis and the study of clonal heterogeneity and mutational burdens at single-cell resolution.
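The F1 score used in this comparison is the harmonic mean of precision and recall, F1 = 2TP / (2TP + FP + FN). A minimal Python sketch, assuming true positives (TP), false positives (FP) and false negatives (FN) have already been counted against a validated truth set; the counts in the example are hypothetical and not taken from the study:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are undefined or zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 60 validated calls, 30 unsupported calls, 40 missed mutations.
print(round(f1_score(tp=60, fp=30, fn=40), 3))  # 0.632
```

Because it is a harmonic mean, F1 stays low unless precision and recall are both reasonably high, which suits a comparison in which callers can trade one for the other.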
2
Specialized replication mechanisms maintain genome stability at human centromeres. Mol Cell 2024; 84:1003-1020.e10. [PMID: 38359824 DOI: 10.1016/j.molcel.2024.01.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Revised: 12/12/2023] [Accepted: 01/19/2024] [Indexed: 02/17/2024]
Abstract
The high incidence of whole-arm chromosome aneuploidy and translocations in tumors suggests instability of centromeres, unique loci built on repetitive sequences and essential for chromosome separation. The causes of this fragility and the mechanisms preserving centromere integrity remain elusive. We show that replication stress, a hallmark of pre-cancerous lesions, promotes centromeric breakage in mitosis through spindle forces and endonuclease activities. Mechanistically, we uncover dynamics of the centromeric replisome that are distinct from those in the rest of the genome. Locus-specific proteomics identifies specialized DNA replication and repair proteins at centromeres, highlighting them as difficult-to-replicate regions. The translesion synthesis pathway, along with other factors, acts to sustain centromere replication and integrity. Prolonged stress causes centromeric alterations such as ruptures and translocations, as observed in ovarian cancer models experiencing replication stress. This study provides new insights into centromere replication and integrity and proposes mechanisms for the origins of the centromere alterations that lead to abnormal cancer karyotypes.
3
A comprehensive clinically informed map of dependencies in cancer cells and framework for target prioritization. Cancer Cell 2024; 42:301-316.e9. [PMID: 38215750 DOI: 10.1016/j.ccell.2023.12.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 10/20/2023] [Accepted: 12/15/2023] [Indexed: 01/14/2024]
Abstract
Genetic screens in cancer cell lines inform gene function and drug discovery. More comprehensive screen datasets with multi-omics data are needed to enhance opportunities to functionally map genetic vulnerabilities. Here, we construct a second-generation map of cancer dependencies by annotating 930 cancer cell lines with multi-omic data and analyze relationships between molecular markers and cancer dependencies derived from CRISPR-Cas9 screens. We identify dependency-associated gene expression markers beyond driver genes, and observe many gene addiction relationships driven by gain of function rather than synthetic lethal effects. By combining clinically informed dependency-marker associations with protein-protein interaction networks, we identify 370 anti-cancer priority targets for 27 cancer types, many of which have network-based evidence of a functional link with a marker in a cancer type. Mapping these targets to sequenced tumor cohorts identifies tractable targets in different cancer types. This target prioritization map enhances understanding of gene dependencies and identifies candidate anti-cancer targets for drug development.
4
Predicting response to immune checkpoint blockade therapy among mismatch repair-deficient patients using mutational signatures. medRxiv 2024:2024.01.19.24301236. [PMID: 38293061 PMCID: PMC10827269 DOI: 10.1101/2024.01.19.24301236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Despite the overall efficacy of immune checkpoint blockade (ICB) for mismatch repair deficiency (MMRD) across tumor types, a sizable fraction of patients with MMRD still do not respond to ICB. We performed mutational signature analysis of panel sequencing data (n = 95) from MMRD cases treated with ICB. We discover that T>C-rich single base substitution (SBS) signatures (SBS26 and SBS54 from the COSMIC Mutational Signatures catalog) identify MMRD patients with significantly shorter overall survival. Tumors with a high burden of SBS26 show over-expression and enriched mutations of genes involved in double-strand break repair and other DNA repair pathways. They also display chromosomal instability (CIN), likely related to replication fork instability, leading to copy number losses that trigger immune evasion. SBS54 is associated with transcriptional activity and not with CIN, defining a distinct subtype. Consistently, cancer cell lines with a high burden of SBS26 and SBS54 are sensitive to treatments targeting pathways related to their proposed etiology. Together, our analysis offers an explanation for the heterogeneous responses to ICB among MMRD patients and supports an SBS signature-based predictor as a prognostic biomarker for differential ICB response.
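Attributing a tumour's mutations to catalogued signatures such as SBS26 or SBS54 is commonly done by refitting: finding non-negative exposures whose weighted sum of signature profiles best reconstructs the observed mutation counts. The Python sketch below shows one generic way to do this, multiplicative non-negative least-squares updates (the Lee-Seung NMF rule with the signature matrix held fixed); it is not necessarily the method used in the study, and the four-channel toy profiles are hypothetical stand-ins for the 96 trinucleotide channels of real COSMIC SBS signatures.

```python
from typing import List, Sequence


def fit_exposures(signatures: Sequence[Sequence[float]],
                  counts: Sequence[float],
                  iterations: int = 500) -> List[float]:
    """Estimate exposures h >= 0 such that counts ≈ sum_k h[k] * signatures[k]."""
    k, n = len(signatures), len(counts)
    # W^T v: overlap of each signature profile with the observed counts
    wtv = [sum(signatures[a][j] * counts[j] for j in range(n)) for a in range(k)]
    # W^T W: pairwise overlaps between signature profiles
    wtw = [[sum(signatures[a][j] * signatures[b][j] for j in range(n)) for b in range(k)]
           for a in range(k)]
    h = [1.0] * k  # strictly positive start; an exposure that reaches 0 stays at 0
    for _ in range(iterations):
        # Simultaneous multiplicative update h <- h * (W^T v) / (W^T W h)
        denom = [sum(wtw[a][b] * h[b] for b in range(k)) for a in range(k)]
        h = [h[a] * wtv[a] / denom[a] if denom[a] > 0 else 0.0 for a in range(k)]
    return h


# Hypothetical toy profiles over four mutation channels (each sums to 1).
sig_a = [0.5, 0.5, 0.0, 0.0]
sig_b = [0.0, 0.25, 0.25, 0.5]
counts = [50, 60, 10, 20]  # exactly 100 * sig_a + 40 * sig_b
print([round(x, 3) for x in fit_exposures([sig_a, sig_b], counts)])  # [100.0, 40.0]
```

Because each update multiplies the current estimate by a non-negative ratio, exposures never become negative, which is the constraint that distinguishes refitting from ordinary least squares.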
5
The ALT pathway generates telomere fusions that can be detected in the blood of cancer patients. Nat Commun 2024; 15:82. [PMID: 38167290 PMCID: PMC10762111 DOI: 10.1038/s41467-023-44287-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Accepted: 12/07/2023] [Indexed: 01/05/2024]
Abstract
Telomere fusions (TFs) can trigger the accumulation of oncogenic alterations leading to malignant transformation and drug resistance. Despite their relevance in tumour evolution, our understanding of the patterns and consequences of TFs in human cancers remains limited. Here, we characterize the rates and spectrum of somatic TFs across >30 cancer types using whole-genome sequencing data. TFs are pervasive in human tumours with rates varying markedly across and within cancer types. In addition to end-to-end fusions, we find patterns of TFs that we mechanistically link to the activity of the alternative lengthening of telomeres (ALT) pathway. We show that TFs can be detected in the blood of cancer patients, which enables cancer detection with high specificity and sensitivity even for early-stage tumours and cancers of high unmet clinical need. Overall, we report a genomic footprint that enables characterization of the telomere maintenance mechanism of tumours and liquid biopsy analysis.
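An end-to-end fusion joins two chromosome ends head to head, so a read spanning the junction contains G-rich TTAGGG repeats followed by their reverse complement, CCCTAA repeats. The Python sketch below flags such reads with a regular expression; it is a simplified illustration under assumed, hypothetical thresholds (`min_repeats`, `max_junction`), not the detection pipeline used in the study, and it ignores the ALT-associated fusion patterns and sequencing errors within the repeats.

```python
import re


def fusion_pattern(min_repeats: int = 4, max_junction: int = 5) -> re.Pattern:
    """Regex: >= min_repeats TTAGGG units, a short optional junction, then >= min_repeats CCCTAA units."""
    return re.compile(
        f"(?:TTAGGG){{{min_repeats},}}"   # outward-facing (G-rich) repeats of one chromosome end
        f"[ACGT]{{0,{max_junction}}}"      # partial repeat units or junction bases
        f"(?:CCCTAA){{{min_repeats},}}"   # inward-facing (C-rich) repeats of the partner end
    )


def is_candidate_fusion(read: str, min_repeats: int = 4, max_junction: int = 5) -> bool:
    """True if the read switches from TTAGGG to CCCTAA repeats, as expected at an end-to-end fusion."""
    return fusion_pattern(min_repeats, max_junction).search(read.upper()) is not None


normal_telomere = "TTAGGG" * 10
fusion_read = "ACGT" + "TTAGGG" * 5 + "TT" + "CCCTAA" * 5 + "ACGT"
print(is_candidate_fusion(normal_telomere), is_candidate_fusion(fusion_read))  # False True
```

Ordinary telomeric reads contain repeats in only one orientation, so requiring the orientation switch is what separates fusion candidates from normal telomere ends.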
6
Mismatch repair deficiency is not sufficient to elicit tumor immunogenicity. Nat Genet 2023; 55:1686-1695. [PMID: 37709863 PMCID: PMC10562252 DOI: 10.1038/s41588-023-01499-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 03/09/2023] [Accepted: 08/07/2023] [Indexed: 09/16/2023]
Abstract
DNA mismatch repair deficiency (MMRd) is associated with a high tumor mutational burden (TMB) and sensitivity to immune checkpoint blockade (ICB) therapy. Nevertheless, most MMRd tumors do not durably respond to ICB and critical questions remain about immunosurveillance and TMB in these tumors. In the present study, we developed autochthonous mouse models of MMRd lung and colon cancer. Surprisingly, these models did not display increased T cell infiltration or ICB response, which we showed to be the result of substantial intratumor heterogeneity of mutations. Furthermore, we found that immunosurveillance shapes the clonal architecture but not the overall burden of neoantigens, and T cell responses against subclonal neoantigens are blunted. Finally, we showed that clonal, but not subclonal, neoantigen burden predicts ICB response in clinical trials of MMRd gastric and colorectal cancer. These results provide important context for understanding immune evasion in cancers with a high TMB and have major implications for therapies aimed at increasing TMB.
7
Profilin 1 deficiency drives mitotic defects and reduces genome stability. Commun Biol 2023; 6:9. [PMID: 36599901 PMCID: PMC9813376 DOI: 10.1038/s42003-022-04392-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 12/20/2022] [Indexed: 01/06/2023]
Abstract
Profilin 1, encoded by PFN1, is a small actin-binding protein with a tumour-suppressive role in various adenocarcinomas and pagetic osteosarcomas. However, its contribution to tumour development is not fully understood. Using fixed- and live-cell imaging, we report that Profilin 1 inactivation results in multiple mitotic defects, manifested prominently by anaphase bridges, multipolar spindles, misaligned and lagging chromosomes, and cytokinesis failures. Accordingly, next-generation sequencing showed that Profilin 1 knock-out cells display extensive copy-number alterations, which are associated with complex genome rearrangements and chromothripsis events in primary pagetic osteosarcomas with Profilin 1 inactivation. Mechanistically, we show that Profilin 1 is recruited to the spindle midzone at anaphase, and its deficiency reduces the supply of actin filaments to the cleavage furrow during cytokinesis. The mitotic defects are also observed in mouse embryonic fibroblasts and mesenchymal cells derived from a newly generated knock-in mouse model harbouring a Pfn1 loss-of-function mutation. Nuclear atypia is also detected in histological sections of mutant femurs. Thus, our results indicate that Profilin 1 has a role in regulating cell division, and its inactivation triggers mitotic defects, one of the major mechanisms through which tumour cells acquire chromosomal instability.
8
9
Author Correction: Genomic basis for RNA alterations in cancer. Nature 2023; 614:E37. [PMID: 36697831 PMCID: PMC9931574 DOI: 10.1038/s41586-022-05596-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
10
Mapping single-cell transcriptomes in the intra-tumoral and associated territories of kidney cancer. Cancer Cell 2022; 40:1583-1599.e10. [PMID: 36423636 PMCID: PMC9767677 DOI: 10.1016/j.ccell.2022.11.001] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 05/20/2022] [Revised: 08/12/2022] [Accepted: 11/04/2022] [Indexed: 11/24/2022]
Abstract
Tumor behavior is intricately dependent on the oncogenic properties of cancer cells and their multi-cellular interactions. To understand these dependencies within the wider microenvironment, we studied over 270,000 single-cell transcriptomes and 100 microdissected whole exomes from 12 patients with kidney tumors, prior to validation using spatial transcriptomics. Tissues were sampled from multiple regions of the tumor core, the tumor-normal interface, normal surrounding tissues, and peripheral blood. We find that the tissue-type location of CD8+ T cell clonotypes largely defines their exhaustion state with intra-tumoral spatial heterogeneity that is not well explained by somatic heterogeneity. De novo mutation calling from single-cell RNA-sequencing data allows us to broadly infer the clonality of stromal cells and lineage-trace myeloid cell development. We report six conserved meta-programs that distinguish tumor cell function, and find an epithelial-mesenchymal transition meta-program highly enriched at the tumor-normal interface that co-localizes with IL1B-expressing macrophages, offering a potential therapeutic target.
11
Efficient and flexible integration of variant characteristics in rare variant association studies using integrated nested Laplace approximation. PLoS Comput Biol 2021; 17:e1007784. [PMID: 33606672 PMCID: PMC7928502 DOI: 10.1371/journal.pcbi.1007784] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/09/2020] [Revised: 03/03/2021] [Accepted: 01/04/2021] [Indexed: 12/02/2022]
Abstract
Rare variants are thought to play an important role in the etiology of complex diseases and may explain a significant fraction of the missing heritability in genetic disease studies. Next-generation sequencing facilitates the association of rare variants in coding or regulatory regions with complex diseases in large cohorts at genome-wide scale. However, rare variant association studies (RVAS) still lack power when cohorts are small to medium-sized or when genetic variation explains a small fraction of phenotypic variance. Here we present a novel Bayesian rare variant Association Test using Integrated Nested Laplace Approximation (BATI). Unlike existing RVAS tests, BATI allows integration of individual or variant-specific features as covariates, while efficiently performing inference based on full model estimation. We demonstrate that BATI outperforms established RVAS methods on realistic, semi-synthetic whole-exome sequencing cohorts, especially when using meaningful biological context, such as functional annotation. We show that BATI achieves power above 70% in scenarios in which competing tests fail to identify risk genes, e.g. when risk variants together explain less than 0.5% of phenotypic variance. We have integrated BATI, together with five existing RVAS tests, into the 'Rare Variant Genome Wide Association Study' (rvGWAS) framework for data generated by whole-exome or whole-genome sequencing. rvGWAS supports rare variant association for genes or any other biological unit such as promoters, and provides essential functionality such as quality control and filtering. Applying rvGWAS to a chronic lymphocytic leukemia study, we identified eight candidate predisposition genes, including EHMT2 and COPS7A.
12
Dynamics of cell-free tumour DNA correlate with treatment response of head and neck cancer patients receiving radiochemotherapy. Radiother Oncol 2020; 151:182-189. [PMID: 32687856 DOI: 10.1016/j.radonc.2020.07.027] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 03/16/2020] [Revised: 07/09/2020] [Accepted: 07/12/2020] [Indexed: 12/28/2022]
Abstract
PURPOSE Definitive radiochemotherapy (RCTX) with curative intent is one of the standard treatment options for patients with locally advanced head and neck squamous cell carcinoma (HNSCC). Despite this intensive therapy protocol, disease recurrence remains an issue. Therefore, we tested the predictive capacity of liquid biopsies as a novel biomarker during RCTX in patients with HNSCC. MATERIAL AND METHODS We sequenced the tumour samples of 20 patients with locally advanced HNSCC to identify driver mutations. Subsequently, we performed a longitudinal analysis of circulating tumour DNA (ctDNA) dynamics during RCTX. Deep sequencing with UMI-based error suppression, used to identify driver mutations and HPV levels in the plasma, enabled treatment-response monitoring before, during and after RCTX. RESULTS ctDNA was detectable in 85% of all patients and correlated significantly with the gross tumour volume (p-value 0.032). Additionally, the tumour allele fraction in the plasma was negatively correlated with the course of treatment (p-value <0.05). Patients whose ctDNA was detectable at the first follow-up later experienced disease recurrence. Circulating HPV DNA (cvDNA) was detected at high levels in three patients; it showed dynamics similar to those of ctDNA throughout treatment and disappeared after treatment. CONCLUSIONS Monitoring RCTX treatment response using liquid biopsy in patients with locally advanced HNSCC is feasible. ctDNA can be regarded as a surrogate marker of disease burden, tightly correlating with the gross tumour volume before the start of treatment. The observed kinetics of ctDNA and cvDNA showed a negative correlation with time and treatment dosage in most patients.
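UMI-based error suppression exploits the fact that reads sharing a unique molecular identifier (UMI) derive from the same original DNA molecule, so an error present in only some of those reads can be outvoted. A minimal Python sketch at a single genomic position, assuming reads are already grouped by exact UMI; the thresholds `min_family_size` and `min_agreement` and both function names are hypothetical, and real pipelines also correct errors within the UMIs themselves:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def umi_consensus(observations: Iterable[Tuple[str, str]],
                  min_family_size: int = 3,
                  min_agreement: float = 0.7) -> Dict[str, str]:
    """Collapse (umi, base) observations into one consensus base per UMI family.

    Families with too few reads, or without a sufficiently dominant base, are discarded.
    """
    families = defaultdict(list)
    for umi, base in observations:
        families[umi].append(base)

    consensus = {}
    for umi, bases in families.items():
        base, count = Counter(bases).most_common(1)[0]
        if len(bases) >= min_family_size and count / len(bases) >= min_agreement:
            consensus[umi] = base
    return consensus


def molecule_allele_fraction(consensus: Dict[str, str], alt: str) -> float:
    """Fraction of consensus molecules carrying the alternative allele."""
    if not consensus:
        return 0.0
    return sum(base == alt for base in consensus.values()) / len(consensus)


reads = [("AAA", "T")] * 3 + [("CCC", "C")] * 3 + [("CCC", "T")] + [("GGG", "C"), ("GGG", "T")]
calls = umi_consensus(reads)
print(calls, molecule_allele_fraction(calls, alt="T"))  # {'AAA': 'T', 'CCC': 'C'} 0.5
```

Here the single T read in family CCC is treated as an error because the other three reads agree, family GGG is too small to trust, and all three reads in family AAA support T, so that molecule counts as a variant.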
13
The rate and spectrum of mosaic mutations during embryogenesis revealed by RNA sequencing of 49 tissues. Genome Med 2020; 12:49. [PMID: 32460841 PMCID: PMC7254727 DOI: 10.1186/s13073-020-00746-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 02/03/2020] [Accepted: 05/08/2020] [Indexed: 12/23/2022]
Abstract
BACKGROUND Mosaic mutations acquired during early embryogenesis can lead to severe early-onset genetic disorders and cancer predisposition, but are often undetectable in blood samples. The rate and mutational spectrum of embryonic mosaic mutations (EMMs) have been studied in only a few tissues, and their contribution to genetic disorders is unknown. Therefore, we investigated how frequently mosaic mutations occur during embryogenesis across all germ layers and tissues. METHODS Mosaic mutation detection in 49 normal tissues from 570 individuals (Genotype-Tissue Expression (GTEx) cohort) was performed using a newly developed multi-tissue, multi-individual variant calling approach for RNA-seq data. Our method allows for reliable identification of EMMs and the developmental stage during which they appeared. RESULTS The analysis of EMMs in 570 individuals revealed that newborns on average harbor 0.5-1 EMMs in the exome affecting multiple organs (1.3230 × 10⁻⁸ per nucleotide per individual), a frequency similar to that reported for germline de novo mutations. Our multi-tissue, multi-individual study design allowed us to distinguish mosaic mutations acquired during different stages of embryogenesis and adult life, as well as to provide insights into the rate and spectrum of mosaic mutations. We observed that EMMs are dominated by a mutational signature associated with spontaneous deamination of methylated cytosines and the number of cell divisions. After birth, cells continue to accumulate somatic mutations, which can lead to the development of cancer. Investigation of the mutational spectrum of the gastrointestinal tract revealed a mutational pattern associated with the food-borne carcinogen aflatoxin, a signature that has so far only been reported in liver cancer.
CONCLUSIONS In summary, our multi-tissue, multi-individual study reveals a surprisingly high number of embryonic mosaic mutations in coding regions, suggesting new hypotheses and diagnostic procedures for investigating genetic causes of disease and cancer predisposition.
14
Abstract
Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale [1-3]. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter [4]; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation [5,6]; analyses timings and patterns of tumour evolution [7]; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity [8,9]; and evaluates a range of more-specialized features of cancer genomes [8,10-18].
15
Single Molecule Molecular Inversion Probes for High Throughput Germline Screenings in Dystonia. Front Neurol 2020; 10:1332. [PMID: 31920950 PMCID: PMC6930228 DOI: 10.3389/fneur.2019.01332] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/05/2019] [Accepted: 12/02/2019] [Indexed: 11/26/2022]
Abstract
Background: This study aimed to screen a large cohort of dystonia patients for pathogenic and rare variants in the ATM gene, using a new, cost-efficient enrichment technology for NGS-based screening. Methods: Single molecule Molecular Inversion Probes (smMIPs) were used for targeted enrichment and sequencing of all protein-coding exons and exon-intron boundaries of the ATM gene in 373 dystonia patients and six positive controls with known ATM variants. Additionally, a rare-variant association study was performed. Results: One patient (0.3%) was compound heterozygous and 21 others were carriers of variants of unknown significance (VUS) in the ATM gene. Although mutations in sporadic dystonia patients are not common, excluding pathogenic variants is crucial for recognizing a potential tumor predisposition syndrome. smMIPs produced results similar to those of routinely used NGS-based approaches. Conclusion: Our results underline the importance of including ATM in the routine genetic testing of dystonia patients and confirm the reliability of smMIPs and their usability for germline screening in rare neurodegenerative conditions.
16
eDiVA-Classification and prioritization of pathogenic variants for clinical diagnostics. Hum Mutat 2019; 40:865-878. [PMID: 31026367 PMCID: PMC6767450 DOI: 10.1002/humu.23772] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/23/2019] [Revised: 04/17/2019] [Accepted: 04/24/2019] [Indexed: 01/06/2023]
Abstract
Mendelian diseases have been shown to be an efficient model for connecting genotypes to phenotypes and for elucidating the function of genes. Whole‐exome sequencing (WES) accelerated the study of rare Mendelian diseases in families, allowing rare causal mutations in genic regions to be pinpointed directly without the need for linkage analysis. However, the low diagnostic rates of 20–30% reported for multiple WES disease studies point to the need for improved variant pathogenicity classification and causal variant prioritization methods. Here, we present the exome Disease Variant Analysis (eDiVA; http://ediva.crg.eu), an automated computational framework for identification of causal genetic variants (coding/splicing single‐nucleotide variants and small insertions and deletions) for rare diseases using WES of families or parent–child trios. eDiVA combines next‐generation sequencing data analysis, comprehensive functional annotation, and causal variant prioritization optimized for familial genetic disease studies. eDiVA features a machine learning‐based variant pathogenicity predictor combining various genomic and evolutionary signatures. Clinical information, such as disease phenotype or mode of inheritance, is incorporated to improve the precision of the prioritization algorithm. Benchmarking against state‐of‐the‐art competitors demonstrates that eDiVA consistently performs as well as or better than existing approaches in terms of detection rate and precision. Moreover, we applied eDiVA to several familial disease cases to demonstrate its clinical applicability.
17
Allele balance bias identifies systematic genotyping errors and false disease associations. Hum Mutat 2018; 40:115-126. [PMID: 30353964 PMCID: PMC6587442 DOI: 10.1002/humu.23674] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 03/29/2018] [Revised: 09/17/2018] [Accepted: 10/20/2018] [Indexed: 12/13/2022]
Abstract
In recent years, next‐generation sequencing (NGS) has become a cornerstone of clinical genetics and diagnostics. Many clinical applications require high precision, especially if rare events such as somatic mutations in cancer or genetic variants causing rare diseases need to be identified. Although random sequencing errors can be modeled statistically and deep sequencing minimizes their impact, systematic errors remain a problem even at high depth of coverage. Understanding their source is crucial for increasing the precision of clinical NGS applications. In this work, we studied the relation between recurrent biases in allele balance (AB), systematic errors, and false positive variant calls across a large cohort of human samples analyzed by whole exome sequencing (WES). We modeled the AB distribution for biallelic genotypes in 987 WES samples in order to identify positions that recurrently deviate significantly from the expectation, a phenomenon we termed allele balance bias (ABB). Furthermore, we developed a genotype callability score based on ABB for all positions of the human exome, which detects false positive variant calls that passed state‐of‐the‐art filters. Finally, we demonstrate the use of ABB for detection of false associations proposed by rare variant association studies. Availability: https://github.com/Francesc-Muyas/ABB.
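For a true heterozygous genotype, the alternative-allele read count at a position is expected to follow roughly a binomial distribution with allele balance 0.5, and a position where many samples depart significantly from that expectation is suspect. The Python sketch below illustrates this idea for heterozygous calls only, using an exact two-sided binomial test; the null model, the significance level `alpha` and both function names are assumptions for illustration, not the published ABB model or its callability score.

```python
from math import comb
from typing import Sequence, Tuple


def het_balance_pvalue(alt: int, depth: int) -> float:
    """Exact two-sided binomial p-value for `alt` alternative reads out of `depth` under AB = 0.5."""
    k = min(alt, depth - alt)
    # P(X <= k) for X ~ Binomial(depth, 0.5); doubling the smaller tail is exact for this symmetric null
    tail = sum(comb(depth, i) for i in range(k + 1)) / 2 ** depth
    return min(1.0, 2 * tail)


def recurrent_bias_fraction(het_calls: Sequence[Tuple[int, int]], alpha: float = 0.01) -> float:
    """Fraction of samples whose heterozygous call at one position deviates significantly from AB = 0.5.

    het_calls: (alt_reads, total_depth) for each sample called heterozygous at this position.
    """
    if not het_calls:
        return 0.0
    flagged = sum(1 for alt, depth in het_calls if het_balance_pvalue(alt, depth) < alpha)
    return flagged / len(het_calls)


# Hypothetical position: two of four heterozygous calls show strongly skewed allele balance.
calls = [(2, 40), (3, 40), (20, 40), (19, 40)]
print(recurrent_bias_fraction(calls))  # 0.5
```

The study itself modeled all biallelic genotypes across 987 exomes; extending this sketch beyond heterozygotes would require a separate null expectation for each genotype.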
18
Abstract
Cactophilic Drosophila species provide a valuable model to study gene–environment interactions and ecological adaptation. Drosophila buzzatii and Drosophila mojavensis are two cactophilic species that belong to the repleta group, but have very different geographical distributions and primary host plants. To investigate the genomic basis of ecological adaptation, we sequenced the genome and developmental transcriptome of D. buzzatii and compared its gene content with that of D. mojavensis and two other noncactophilic Drosophila species in the same subgenus. The newly sequenced D. buzzatii genome (161.5 Mb) comprises 826 scaffolds (>3 kb) and contains 13,657 annotated protein-coding genes. Using RNA sequencing data from five life stages, we found expression of 15,026 genes, of which 80% are protein-coding genes and 20% noncoding RNA genes. In total, we detected 1,294 genes putatively under positive selection. Interestingly, among genes under positive selection in the D. mojavensis lineage, there is an excess of genes involved in metabolism of heterocyclic compounds that are abundant in Stenocereus cacti and toxic to nonresident Drosophila species. We found 117 orphan genes in the shared D. buzzatii–D. mojavensis lineage. In addition, gene duplication analysis identified lineage-specific expanded families with functional annotations associated with proteolysis, zinc ion binding, chitin binding, sensory perception, ethanol tolerance, immunity, physiology, and reproduction. In summary, we identified genetic signatures of adaptation in the shared D. buzzatii–D. mojavensis lineage, and in the two separate D. buzzatii and D. mojavensis lineages. Many of the novel lineage-specific genomic features are promising candidates for explaining the adaptation of these species to their distinct ecological niches.